SEE-GRID Guide on WN replication

From EGEE-see WIki

This guide describes how to replicate an existing, properly configured WN (worker node) onto other machines that have the same hardware configuration. It is based on the experience of Milos Ivanovic, site administrator of the Serbian AEGIS04-KG site.

The procedure relies on copying the entire hard disk (including MBR and partition table) using dd, gzip and ssh.

In cluster (grid) environments most machines, worker nodes in particular, have identical hardware. That makes whole-disk replication a very useful tool after disk corruption or an operating system failure, or when adding new computers. This guide describes how to image the entire disk of a working WN and copy that image to a dead WN.


1. Execute the following pipeline on a working, properly configured WN (the source WN):

dd if=/dev/hda bs=1k conv=sync,noerror | gzip -c | ssh -c blowfish user@hostname "dd of=hda.gz bs=1k"

The output file can be written to any disk with enough free space (perhaps your SE). Repeat the process for each hard disk device on the source WN.

conv=sync,noerror tells dd to carry on after a read error (noerror) and to pad every short or unreadable input block with zero bytes up to the full block size (sync), so that the data following a bad block stays at the correct offset in the image.
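The padding behaviour is easy to see without touching a real disk. The following is a minimal sketch that feeds dd a 3-byte input instead of a device; the variable name padded_size is only illustrative.

```shell
# Feed dd a 3-byte input with a 1 KiB block size. Because of conv=sync the
# short block is padded with NUL bytes, so one full 1024-byte block is written.
padded_size=$(printf 'abc' | dd bs=1k conv=sync,noerror 2>/dev/null | wc -c | tr -d ' ')
echo "bytes written: $padded_size"
```

The same padding is applied to a block that cannot be read from a failing disk, which is why the image keeps the size of the original disk.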

bs=1k sets the block size to 1k. It needs to be no larger than the block size of the disk, otherwise a bad block may mask the contents of a good one. 1k is a safe bet.

In the above example the output of dd is piped through gzip to compress it. The compressed data stream is then piped over an ssh connection to another Linux machine. To write straight to a local file instead, either add of=hda.raw to the first dd command (for an uncompressed image) or, for a compressed image, redirect the output of gzip to a file.
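The local, compressed variant described above could be wrapped as follows. This is only a sketch: the helper name image_to_file and the paths in the usage comment are assumptions, not part of the original procedure.

```shell
# image_to_file SRC DEST  (hypothetical helper)
# Read SRC in 1 KiB blocks, tolerating read errors, and store a
# gzip-compressed image of it in DEST.
image_to_file() {
    dd if="$1" bs=1k conv=sync,noerror | gzip -c > "$2"
}

# Usage on the source WN, as root (paths are assumptions):
#   image_to_file /dev/hda /mnt/backup/hda.gz
#   gzip -t /mnt/backup/hda.gz    # check that the archive is intact
```

Running gzip -t on the result is a cheap way to catch a truncated image before it is needed for a restore.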

The -c blowfish option to ssh selects blowfish encryption, which is much faster than the default (useful, since we are sending a large amount of data). Recent OpenSSH releases no longer include the blowfish cipher; on such systems omit the -c option or pick a cipher from the list printed by ssh -Q cipher. Finally, another dd command is invoked on the remote machine to read the data stream and write it to a file there. Alternatively, you could pipe the stream through gunzip -c and write it straight to a partition on the remote machine instead of to a file.


2. To restore the hard disk, boot the dead WN from a live distribution, since we need Linux with an OpenSSH server running on it, but DO NOT mount any hard disk partition. Then execute the following command on the machine that holds the HD image:

dd if=hda.gz | ssh -c blowfish root@deadhost "gunzip -c | dd of=/dev/hda bs=1k"

This process takes a bit more time than making the HD image, because dd now has to write the entire disk (including empty blocks). Repeat it for every hard disk device on the target WN for which an image was created on the source WN.
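Before rebooting, it may be worth checking that the restored disk really matches the image. The sketch below compares MD5 checksums; the helper name verify_restore is hypothetical, and it assumes that the target disk has exactly the same size as the source disk and that both the image and the device are readable from the machine where it runs.

```shell
# verify_restore IMAGE_GZ DEVICE  (hypothetical helper)
# Succeeds when the decompressed image and the device contents
# have the same MD5 checksum.
verify_restore() {
    image_sum=$(gunzip -c "$1" | md5sum | cut -d' ' -f1)
    disk_sum=$(dd if="$2" bs=1k 2>/dev/null | md5sum | cut -d' ' -f1)
    [ "$image_sum" = "$disk_sum" ]
}

# Usage (as root):
#   verify_restore hda.gz /dev/hda && echo "restore verified"
```

When the image and the disk live on different machines, as in step 2, the two checksums can instead be computed separately on each host and compared by eye.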


3. If your system is Scientific Linux or any other Red Hat derivative, only two things remain to be done after the hard disk replication:

(a) reconfigure your network card and

(b) change the host name in /etc/sysconfig/network.
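Both changes can be made with sed from the live system or after the first boot. The following is a sketch under stated assumptions: the helper names are hypothetical, the interface file is assumed to be /etc/sysconfig/network-scripts/ifcfg-eth0, and it is assumed to contain an HWADDR line carrying the MAC address of the source WN. Adjust it to your own network setup.

```shell
# set_hostname FILE NAME  (hypothetical helper)
# Rewrite the HOSTNAME= line of a Red Hat style network file.
set_hostname() {
    sed -i "s/^HOSTNAME=.*/HOSTNAME=$2/" "$1"
}

# drop_hwaddr FILE  (hypothetical helper)
# Remove the HWADDR= line copied from the source WN, so that the interface
# configuration is no longer tied to the MAC address of the old network card.
drop_hwaddr() {
    sed -i '/^HWADDR=/d' "$1"
}

# Usage (as root; host name and interface file are assumptions):
#   set_hostname /etc/sysconfig/network wn02.example.org
#   drop_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth0
```

If the nodes use static addresses, the IPADDR line in the interface file has to be edited in the same way.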


4. Reboot the machine and enjoy your new WN!
