Difference between revisions of "RAID-1 configuration and management"

From PDP/Grid Wiki
Jump to navigationJump to search
Line 125: Line 125:
  
 
== Shuffling data and partitions around ==
 
== Shuffling data and partitions around ==
 +
 +
If, for some reason, you want to mess with the partitions on your machine (because your once so carefully chosen partitioning scheme doesn't fit the curren requirements) you can do this without needing to reinstall everything--if you're careful.
 +
 +
The RAID-1 configuration means having two identical copies of your data. What is there to stop you from:
 +
 +
1) breaking up your RAID set into individual disks
 +
2) using one disk to fashion up a new partitioning scheme (possibly destroying some data)
 +
3) recovering data from the second disk to the first disk
 +
4) matching the second disk's partitioning scheme with the first disk
 +
5) recreating a new RAID-1 set starting from the first.
 +
 +
This is not for the faint of heart.

Revision as of 08:18, 25 April 2007

Software RAID-1 configuration via Kickstart

Manually change the Kickstart file to define software RAID-1, which is also known as mirroring.

The following example uses two serial ATA disks (/dev/sda and /dev/sdb) with four partitions (/boot, /, swap and /tmp), each in software RAID-1 configuration:

part raid.01 --size=128  --ondisk=sda
part raid.02 --size=8192 --ondisk=sda
part raid.03 --size=3072 --ondisk=sda
part raid.04 --size=512  --ondisk=sda

part raid.05 --size=128  --ondisk=sdb
part raid.06 --size=8192 --ondisk=sdb
part raid.07 --size=3072 --ondisk=sdb
part raid.08 --size=512  --ondisk=sdb

raid /boot --level=RAID1 --device=md0 --fstype=ext2 raid.01 raid.05
raid /     --level=RAID1 --device=md1 --fstype=ext3 raid.02 raid.06
raid swap  --level=RAID1 --device=md2 --fstype=swap raid.03 raid.07
raid /tmp  --level=RAID1 --device=md3 --fstype=ext3 raid.04 raid.08

In addition, the clearpart entry in the kickstart file should clear all partitions (at least on CentOS 3):

clearpart --all

Clearing the partition tables on the individual drives (--drives=sda,sdb) leads to errors in Anaconda (the RedHat system installer).

Please note that the grub boot loader is only installed on the first disk. Manually install it on the second disk, as described in section Installing the boot loader.

Restoring data on a new disk

OK, so there are two disks in a RAID-1 (mirror) configuration. What to do if on of them dies?

Well, all data are still available on the other disk, so they can be restored when a new disk is placed. The following steps show how to restore the data:

0. Do not reboot the machine! A reboot may change the drive names (e.g., /dev/sdb becoming /dev/sda) and hang the machine if it cannot find the boot loader grub. If this happens, you may be saved by the section Restoring the boot loader.

1. Remove the partitions of the defect disk from the RAID configuration:

mdadm /dev/mdX -r /dev/sdYZ            (X=0,1,2,... Y=a,b,c,... Z=1,2,3,...)

For example,

mdadm /dev/md0 -r /dev/sdb1

to remove partition 1 on disk sdb from raid device md0.

2. Replace the bad disk by a fresh one. Do not reboot the machine!

3. Rescan the SCSI bus for new devices. Use the script

/usr/local/bin/rescan-sci-bus.sh

(if installed from rpm), or download it from:

http://stal.nikhef.nl/scripts/rescan-scsi-bus.sh

4. Now the partition table should be created on the new disk. Use the following command to clone the partition table of existing disk sdX to the new disk sdY:

sfdisk -d /dev/sdX | \
sed -e s/sdX/sdY/ | \
sdisk /dev/sdY

(using X=a,b,c,.. and Y=a,b,c,..., X different from Y).

5. Add all partitions of the new disk to the corresponding raid devices:

mdadm /dev/mdX -a /dev/sdYZ            (X=0,1,2,... Y=a,b,c,... Z=1,2,3,...)

For example,

mdadm /dev/md0 -a /dev/sdb1

to add partition 1 on disk sdb from raid device md0. This will automatically trigger the synchronization of the data on that partition to the one on the new disk. All above command may immediately be repeated for all partitions; the actual synchronization takes place sequentially.

6. The progress of the sycnhronization can be monitored via the following command:

cat /proc/mdstat

which produces output like:

Personalities : [raid1]
read_ahead 1024 sectors
Event: 23
md0 : active raid1 sda1[0] sdb1[1]
      128384 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[1]
      3148672 blocks [2/2] [UU]

md3 : active raid1 sda5[2] sdb5[1]
      521984 blocks [2/1] [_U]
      [==============>......]  recovery = 73.5% (384524/521984) finish=0.0min speed=54932K/sec
unused devices: <none>

Note: By default, the transfer rate for synchronizing data is ~10 MB/s. If the system is not running other processes or services, the rate may be increased. The following example sets the maximum speed to 100 MB/s:

echo 100000 > /proc/sys/dev/raid/speed_limit_max


The actual status of a RAID device can be obtained with the following command:

mdadm -Q -D /dev/mdX                    (X=0,1,2,...)


Finally, the boot loader should be installed on the new disk. This is described in the section Installing the bootloader.

Installing the boot loader

The steps described in the section Restoring data on a new disk do not install a boot loader (i.e., grub) on the new disk. If a system with a new disk is rebooted without installing the bootloader, the system may be unable to boot. Therefore, always

To install the bootloader on a disk /dev/sdX, use the procedure described below.

1. Start the utility /sbin/grub

2. At grub the prompt, issue the following commands:

device (hd0) /dev/sdX
root (hd0,0)
setup (hd0)
quit

A script that takes care of these steps is available via the following link:

http://stal.nikhef.nl/scripts/install_grub.sh

The script installs the boot loader on all partitions in RAID device /dev/md0 (which is assumed to be the boot device). Running this script more than once is harmless if no other means of configurating grub are used.


Shuffling data and partitions around

If, for some reason, you want to mess with the partitions on your machine (because your once so carefully chosen partitioning scheme doesn't fit the curren requirements) you can do this without needing to reinstall everything--if you're careful.

The RAID-1 configuration means having two identical copies of your data. What is there to stop you from:

1) breaking up your RAID set into individual disks 2) using one disk to fashion up a new partitioning scheme (possibly destroying some data) 3) recovering data from the second disk to the first disk 4) matching the second disk's partitioning scheme with the first disk 5) recreating a new RAID-1 set starting from the first.

This is not for the faint of heart.