Swapping a software RAID1 disk on Desktop Linux

A brief account of my own case:

System

  • openSUSE 13.2 with KDE (not that it matters)
  • Operating system & boot partition on /dev/sdc
  • 2 TB disks /dev/sda and /dev/sdb in Software RAID1 (mirrored, not striped) configuration.
  • Two partitions on the RAID disks, one mounted as /home, the other unmounted

Symptoms

  • All desktop programs became frequently unresponsive (lagged) for up to several seconds, then resumed.
  • Keystrokes & mouse actions were buffered.

Trouble-shooting

  • top showed I/O wait (the wa value) often in the range 24-74 % (or more), while the user CPU value (us) was around 0.1 %. In other words, the CPU was doing almost nothing except waiting for disk I/O.
  • iotop (run as superuser) did not point to any particular process as responsible for this.
  • iostat -x 1 (run as superuser) showed that /dev/sdb had await (average wait time) values about 100 times those of /dev/sda.

That pretty much nailed it: The second hard disk in the RAID1 array was getting old. Well, that’s what RAID1 is for.
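
For anyone who wants to double-check the await comparison without iostat, here is a minimal sketch that reads the kernel's per-disk counters from /sys/block/<dev>/stat. The function name avg_wait_ms is my own invention, and note that these counters are totals since boot, so the result is a long-term average rather than the per-second figure iostat prints. The device names sda and sdb are the ones from my system.

```shell
#!/bin/sh
# Print the average time per completed I/O (in ms) from a block-device
# stat file such as /sys/block/sda/stat.
# Fields used: 1 = reads completed, 4 = ms spent reading,
#              5 = writes completed, 8 = ms spent writing.
avg_wait_ms() {
    awk '{
        ios = $1 + $5
        if (ios == 0) { print "0.0"; exit }
        printf "%.1f\n", ($4 + $8) / ios
    }' "$1"
}

# Compare the two RAID members (device names are from this post; adjust them).
for dev in sda sdb; do
    stat_file="/sys/block/$dev/stat"
    if [ -r "$stat_file" ]; then
        echo "$dev: $(avg_wait_ms "$stat_file") ms per I/O since boot"
    fi
done
```

A healthy mirror should show roughly similar numbers for both members; a large gap points at the slow disk.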

Solution

  • Fortunately I had a spare 2 TB disk lying around, just waiting for this.
  • Power off, unplug sdb, and plug the new disk into the same port. (NOTE: physically labelling the disks “sda”/“sdb” with a marker would have saved a couple of reboots here…)
  • Boot. Go to superuser mode. N.B. Copying the following commands as-is WILL DESTROY your system if it – and your problem – is unlike mine. Change devices & partitions appropriately. (Also, if you have more than 4 partitions, check the behavior of sfdisk – copying logical partitions might differ, I don’t know.)
    sfdisk -d /dev/sda > part_table    # dump the partition table of sda to a file
    sfdisk /dev/sdb < part_table       # write that table to sdb
    mdadm /dev/md0 --add /dev/sdb2     # add partition sdb2 to the RAID1 array md0
  • Leave the computer on until the mirror has finished syncing onto the new disk.
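
To know when that wait is over, /proc/mdstat can be polled. The following is only a sketch: the function names md_sync_running and wait_for_md_sync are made up for this post, and the check assumes that a running rebuild shows up in /proc/mdstat as a line containing "recovery =" or "resync =". The optional file argument is there only so the functions can be tried against a saved copy of mdstat.

```shell
#!/bin/sh
# Succeeds (exit 0) while the given mdstat file reports a rebuild or resync.
# Defaults to /proc/mdstat; the optional argument exists for testing.
md_sync_running() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

# Poll once a minute until no rebuild is reported, then say so.
wait_for_md_sync() {
    while md_sync_running "$@"; do
        sleep 60
    done
    echo "no rebuild in progress"
}

# Example (not run automatically):
#   wait_for_md_sync && echo "mirror is in sync"
```

Be aware that the loop also ends if the file cannot be read at all, so have a look at cat /proc/mdstat yourself before powering off.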

Problem solved 🙂

I take no responsibility for the results, but I hope this helps someone 🙂

EDIT:

I was asked if I ran cat /proc/mdstat at some point. I did, at the end of the troubleshooting phase, to check the role of sdb in the RAID array.
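
If reading the raw mdstat output feels cryptic, the member list of one array can be pulled out with a few lines of awk. This is a rough sketch: md_members is a name I made up, and it assumes the usual layout where the array line looks like "md0 : active raid1 sdb2[1] sda2[0]" and a failed member carries an "(F)" suffix.

```shell
#!/bin/sh
# List the member devices of one md array, one per line, marking
# members flagged (F) as faulty.
# Usage: md_members md0 [mdstat-file]   (file defaults to /proc/mdstat)
md_members() {
    awk -v md="$1" '$1 == md && $2 == ":" {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {
                flag = ($i ~ /\(F\)/) ? " (faulty)" : ""
                name = $i
                sub(/\[.*/, "", name)   # drop the "[n]" slot number and any suffix
                print name flag
            }
        }
    }' "${2:-/proc/mdstat}"
}

# Example (not run automatically):
#   md_members md0
```

For the full picture, mdadm --detail /dev/md0 gives a more readable report than either of these.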

Also, it was pointed out to me that simply cloning the partition table would be dangerous if the replacement disk wasn’t identical. I had considered this when I bought the disk, and had picked a third disk identical to the other two.
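
A cheap sanity check before cloning the table is to compare the disk sizes the kernel reports. The sketch below uses a helper I named disk_fits, and it assumes /sys/block/<dev>/size holds the disk size in 512-byte sectors. It only tells you the new disk is not smaller; it says nothing about sector size or alignment differences.

```shell
#!/bin/sh
# Succeeds if the replacement disk has at least as many sectors as the source.
# Arguments are size files such as /sys/block/sda/size (512-byte sectors).
# Returns 2 if either file cannot be read.
disk_fits() {
    src=$(cat "$1") || return 2
    dst=$(cat "$2") || return 2
    [ "$dst" -ge "$src" ]
}

# Example (not run automatically; device names are from this post):
#   if disk_fits /sys/block/sda/size /sys/block/sdb/size; then
#       echo "sdb is large enough"
#   else
#       echo "sdb is too small or unreadable, do not clone the table"
#   fi
```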

Also, I had taken a full backup of /home before I started…

A friend of mine gave some more good advice:

One thing mdstat is useful for is showing when your sync is finished (and so when it’ll be safe to turn your computer off again). As for the partition table, I’d suggest just creating one from scratch according to the device’s requirements regardless of what the new disk is like. Cloning it doesn’t accomplish anything useful and can seriously mess up your data if you get it wrong.
