RAID-1 configuration and management

From PDP/Grid Wiki
Revision as of 14:55, 6 November 2006

Software RAID-1 configuration via Kickstart

Manually change the Kickstart file to define software RAID-1, which is also known as mirroring.

The following example uses two serial ATA disks (/dev/sda and /dev/sdb) with four partitions (/boot, /, swap and /tmp), each in software RAID-1 configuration:

part raid.01 --size=128  --ondisk=sda
part raid.02 --size=8192 --ondisk=sda
part raid.03 --size=3072 --ondisk=sda
part raid.04 --size=512  --ondisk=sda

part raid.05 --size=128  --ondisk=sdb
part raid.06 --size=8192 --ondisk=sdb
part raid.07 --size=3072 --ondisk=sdb
part raid.08 --size=512  --ondisk=sdb

raid /boot --level=RAID1 --device=md0 --fstype=ext2 raid.01 raid.05
raid /     --level=RAID1 --device=md1 --fstype=ext3 raid.02 raid.06
raid swap  --level=RAID1 --device=md2 --fstype=swap raid.03 raid.07
raid /tmp  --level=RAID1 --device=md3 --fstype=ext3 raid.04 raid.08

Note that the grub boot loader is installed only on the first disk. Install it manually on the second disk, as described in the section Restoring the boot loader.

Restoring data on a new disk

OK, so there are two disks in a RAID-1 (mirror) configuration. What to do if one of them dies?

Well, all data are still available on the other disk, so they can be restored when a new disk is placed. The following steps show how to restore the data:

0. Do not reboot the machine! A reboot may change the drive names (e.g., /dev/sdb becoming /dev/sda) and hang the machine if it cannot find the boot loader grub. If this happens, you may be saved by the section Restoring the boot loader.

1. Remove the partitions of the defective disk from the RAID configuration:

mdadm /dev/mdX -r /dev/sdYZ            (X=0,1,2,... Y=a,b,c,... Z=1,2,3,...)

For example,

mdadm /dev/md0 -r /dev/sdb1

to remove partition 1 on disk sdb from raid device md0.
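Applied to the four-partition layout from the Kickstart example, step 1 can be sketched as a loop. This is a dry run that only prints the commands; the mdX-to-partition mapping is an assumption taken from the example above, so verify it against /proc/mdstat and drop the echo to execute:

```shell
#!/bin/sh
# Dry run: print the removal command for every partition of the failed disk.
# The mdX -> partition mapping mirrors the Kickstart example above and is an
# assumption; verify it before executing.
FAILED=sdb
for pair in md0:1 md1:2 md2:3 md3:4; do
    md=${pair%%:*}
    part=${pair##*:}
    echo mdadm "/dev/$md" -r "/dev/$FAILED$part"
done
```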

2. Replace the bad disk with a fresh one. Do not reboot the machine!

3. Rescan the SCSI bus for new devices. Use the script

/usr/local/bin/rescan-scsi-bus.sh

(if installed from rpm), or download it from:

http://stal.nikhef.nl/scripts/rescan-scsi-bus.sh

4. Now the partition table must be created on the new disk. Use the following command to clone the partition table of the existing disk sdX to the new disk sdY:

sfdisk -d /dev/sdX | \
sed -e 's/sdX/sdY/' | \
sfdisk /dev/sdY

(using X=a,b,c,.. and Y=a,b,c,..., X different from Y).
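The sed stage of the pipeline simply rewrites the device name in each line of the sfdisk dump. A minimal illustration on a single invented dump line (the start/size numbers are made up for the example):

```shell
#!/bin/sh
# Rewrite sda -> sdb in a sample sfdisk dump line (numbers are illustrative).
SRC=sda
DST=sdb
echo '/dev/sda1 : start= 63, size= 257040, Id=fd' | sed -e "s/$SRC/$DST/"
# prints: /dev/sdb1 : start= 63, size= 257040, Id=fd
```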

5. Add all partitions of the new disk to the corresponding raid devices:

mdadm /dev/mdX -a /dev/sdYZ            (X=0,1,2,... Y=a,b,c,... Z=1,2,3,...)

For example,

mdadm /dev/md0 -a /dev/sdb1

to add partition 1 on disk sdb to raid device md0. This automatically triggers synchronization of the data on that partition from the surviving disk to the new one. The above command may be repeated immediately for all partitions; the actual synchronization takes place sequentially.
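Repeating the command for all partitions can be sketched as a dry run, analogous to the removal in step 1. The mdX-to-partition mapping is again an assumption taken from the Kickstart example; drop the echo to execute:

```shell
#!/bin/sh
# Dry run: print the add command for every partition of the new disk.
# The mdX -> partition mapping is an assumption; verify before executing.
NEW=sdb
for pair in md0:1 md1:2 md2:3 md3:4; do
    echo mdadm "/dev/${pair%%:*}" -a "/dev/$NEW${pair##*:}"
done
```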

6. The progress of the synchronization can be monitored via the following command:

cat /proc/mdstat

which produces output like:

Personalities : [raid1]
read_ahead 1024 sectors
Event: 23
md0 : active raid1 sda1[0] sdb1[1]
      128384 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[1]
      3148672 blocks [2/2] [UU]

md3 : active raid1 sda5[2] sdb5[1]
      521984 blocks [2/1] [_U]
      [==============>......]  recovery = 73.5% (384524/521984) finish=0.0min speed=54932K/sec
unused devices: <none>
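A degraded array shows an underscore in the [UU] field, as md3 does above. This check can be sketched in awk; the here-document below mimics the /proc/mdstat excerpt, and on a live system the same program would read /proc/mdstat directly:

```shell
#!/bin/sh
# Flag degraded arrays: an '_' in the [UU] status field means a missing mirror.
# The sample input mimics the /proc/mdstat excerpt above; on a real system run
# the same awk program on /proc/mdstat.
awk '/^md/ { dev = $1 }
     $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print dev " is degraded: " $NF }' <<'EOF'
md0 : active raid1 sda1[0] sdb1[1]
      128384 blocks [2/2] [UU]
md3 : active raid1 sda5[2] sdb5[1]
      521984 blocks [2/1] [_U]
EOF
# prints: md3 is degraded: [_U]
```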

Note: By default, the transfer rate for synchronizing data is ~10 MB/s. If the system is not running other processes or services, the rate may be increased. The following example sets the maximum speed to 100 MB/s:

echo 100000 > /proc/sys/dev/raid/speed_limit_max


The current status of a RAID device can be obtained with the following command:

mdadm -Q -D /dev/mdX                    (X=0,1,2,...)
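To check just the array state from that report, the State line can be filtered out. The echoed sample line below is invented for illustration, and the exact formatting varies between mdadm versions:

```shell
#!/bin/sh
# Extract the 'State' field from mdadm -Q -D style output.  The echoed line
# is an illustrative sample; on a live system pipe the real output instead:
#   mdadm -Q -D /dev/md0 | awk -F' : ' '/State :/ { print $2 }'
echo '          State : clean, degraded, recovering' |
    awk -F' : ' '/State :/ { print $2 }'
# prints: clean, degraded, recovering
```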

Restoring the boot loader

WARNING: The following has not been tested extensively. Your mileage may vary! There are probably better ways to do this. Please update this section if you know any.

The boot loader (i.e., grub) is not restored automatically on the new disk. If the new disk is the first one in the system, start /sbin/grub and give the following input at the prompt (first line split for readability):

install --stage2=/boot/grub/stage2 (hd0,0)/grub/stage1 (hd0) \
(hd0,0)/grub/stage2 p (hd0,0)/grub/grub.conf

quit

However, if the second disk is replaced, a similar command can be issued:

install --stage2=/boot/grub/stage2 (hd1,0)/grub/stage1 (hd1) \
(hd1,0)/grub/stage2 p (hd1,0)/grub/grub.conf

Unfortunately, this does not make the second disk boot by itself. If the first disk later crashes, and the system is rebooted (despite the warnings given above), the grub prompt may appear:

grub>

Entering the following may make the disk bootable and reboot the machine:

setup (hd0)

reboot

Note that there is now only one drive in the system, which is addressed as (hd0) instead of (hd1).