A Backup Method for Linux Using Mirrored USB Drives

With the software mirroring built into Linux and the inherent hot-pluggability of USB drives, I decided it was time to look at a high-capacity backup system using a combination of the two.

Mirroring is hardly new, and the method I describe is based on work I did long before Linux was born. What is new is the ready availability of the components required, at a cost that makes it a reasonable option for the SOHO (Small Office/Home Office) market.

I'm using a relatively cheap server that came with 4 USB ports, but even if you only have two ports, you can get a USB hub for not much more than $10. The drives used are from LaCie, who make a variety of USB-connected drives up to 2TB. I've been using a LaCie Brick for the main backup server for a while now, and these tests were done with ATA drives, but as soon as I can get the shekels together the main backup is in for a facelift.

To summarise, the steps are:

1. Assemble the components

2. Build the RAID device under Linux

3. Build a filesystem over the RAID device

4. Mount the filesystem

Pretty simple really. So let's start.

1. Assembling

You'll need 3 drives of identical size for this exercise. Plug them in one at a time and check your /dev/ directory for sdX entries (e.g. sdb, sdc, etc.), noting which drive comes up as which node. Put a physical label on each one so you know which is which when it comes time to unmount a drive. In my case the primary drive is /dev/sda and plugged-in drives are numbered from there, so for this exercise we have sdb, sdc, and sdd.
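A low-tech way to confirm which node each drive received is to list the sd devices before and after plugging each one in; the new entry is that drive:

ls /dev/sd?

With the three drives from this example connected, that should show /dev/sda through /dev/sdd.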

2. Build the RAID device

It isn't all that clear initially, but you don't really have to prepare the disks for RAID usage: since we intend to use the entire drive, we don't care about the partition table. If you wanted to mirror partitions instead, you would first need to do some preliminary work, setting each partition's type to fd (Linux RAID autodetect), as sketched below.
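For completeness, here is roughly what that preliminary step would look like with fdisk, on a hypothetical first partition of /dev/sdb (inside fdisk: t changes a partition's type, fd is the Linux RAID autodetect type, and w writes the table and exits):

fdisk /dev/sdb

Since we're using whole drives none of that is needed, so all we have to do is build the mirror: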

mdadm --create /dev/md/d0 -n3 -l mirror /dev/sdb /dev/sdc /dev/sdd

OK, the above says to create a mirror called /dev/md/d0 with 3 parts (-n3), at RAID level mirror (-l), and names the devices to add into the mirror.
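At this point it's worth checking that the kernel agrees the array exists and that the initial sync has started:

cat /proc/mdstat

The first sync of large drives can take quite a while, but the array is usable while it runs.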

3. Build Filesystem

For this I went with LVM so I could play with partitioning, but it doesn't matter much: you can create a filesystem directly on the RAID device (or on a partition of it, such as /dev/md/d0p1) if you want to bypass LVM. Anyway, back to LVM.

pvcreate /dev/md/d0

This creates the Physical Volume, which is the basis for all LVM volumes. Next we need to create a Volume Group:

vgcreate MirrorVolume /dev/md/d0

You can call the Volume Group whatever you like, and you may want to stick with whatever convention is already in use by your system. E.g. in Fedora the volume groups are called VolGroup00, VolGroup01, etc.

Now we need to create the Logical Volume, and for this we need to specify a size (we could have given one for the two commands above too, but we defaulted to all available space).

lvcreate -n LV1 -L 460G MirrorVolume
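If you're not sure how much room the Volume Group actually has before picking a size, you can ask it:

vgdisplay MirrorVolume

The VG Size and Free PE / Size lines tell you what's available.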

In this case I'm using a 500G drive, but we all know that drive manufacturers report a gigabyte as 1,000,000,000 bytes, while the rest of the computing world knows it as 1024*1024*1024 = 1,073,741,824 bytes, leading to a 7 or so percent difference in available size. You will be pretty safe taking the stated value and multiplying by 0.92.
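To make that concrete for the 500G drive used here: 500,000,000,000 / 1,073,741,824 is roughly 465 "real" gigabytes, and 500 * 0.92 = 460, which is exactly where the -L 460G above comes from, with a little headroom left over.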

OK, after all that we still don't have a file system! Lets get that part done:

mke2fs -j /dev/MirrorVolume/LV1

Note that the device nodes are automagically generated for us.

4. Mount the Filesystem

Having a filesystem doesn't help much if the rest of the system doesn't know it exists, so we need to add it and a mount point. First, create a directory where you want this to be mounted. I'll create /backup for that:

mkdir /backup

Now we could just mount it, but when we reboot the system won't remember that, so we put it in the system mount table by adding a line to /etc/fstab like:

/dev/MirrorVolume/LV1 /backup ext3 defaults 0 2

To decipher: the first entry is the device (in our case the newly created file system in our logical volume); the next is where to mount it; the third is the file system type (ext3 is pretty much the standard file system type on Linux, although we could easily have created any other supported type). Next come the mount options, and we don't need anything other than the defaults. Finally we have the dump pass and the fsck pass. The first (the 0) identifies which dump cycle will be used for backup; since we are in fact using this for backup, it makes no sense to include it in another backup. The second one (the 2) tells the system to check the file system on startup, but in pass 2, i.e. after the root file system has been checked.

There is now one other thing we need to do before we can reboot, and that is to let the system know that the RAID devices exist, so they can be properly mapped at boot time. (The special config value partitions tells mdadm to scan every device listed in /proc/partitions.)

mdadm --examine --verbose --scan --config=partitions >> /etc/mdadm.conf

If you had a previous mdadm.conf file you may want to remove any duplicate entries; if not, this should get you out of trouble. You can also trim the generated lines down to just the device name and the UUID:

ARRAY /dev/md/d0 UUID=c44c4585:f8b3e0ba:bc171fb8:9618c603
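One other thing worth checking, and this may vary by distribution: mdadm.conf generally also wants a DEVICE line telling it which devices to scan for arrays. If your file doesn't already have one, a catch-all line at the top like this covers every device the kernel knows about:

DEVICE partitions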

OK, now we can manually mount, or reboot and see it all happen magically. But since this is Linux, we never reboot unless we need to, so:

mount /backup

Yep, because we have an fstab entry we don't need anything other than the mount directory (or the device name).

Managing the mirrors

Now we have the mirror set up, how do we manage it? The entire idea is to have a reliable backup with the ability to take a drive offsite, so let's drop a drive out of the mirror so we can take it away with us.

mdadm /dev/md/d0 --fail /dev/sdd

This marks the drive as failed, which we need to do before we can remove it. Now, the documentation says you can do the remove in the same command, but for me there appears to be a time lag between the two and you have to run the remove separately, so wait a few seconds and then:

mdadm /dev/md/d0 --remove /dev/sdd
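For reference, the single-command form that the documentation describes looks like this; if it works on your version of mdadm it saves a step:

mdadm /dev/md/d0 --fail /dev/sdd --remove /dev/sdd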

Voila! A drive with all of your data (well, if you had data on it) backed up and ready to be powered down and removed. Now comes the tricky bit: deciding which drive is which. You did mark them as they were plugged in, didn't you? Power down the drive and unplug it.

Remember that when a drive comes back online it may get a different /dev/sdX entry, but this doesn't matter, as mdadm identifies drives by a unique id it wrote to each one when it created the mirror.

So, you want to swap some drives around? Maybe it has been a week and you want to take a current snapshot offsite and replace it with the drive you took away last week. First plug in the drive and power it up, check what node it comes up as, and add it back into the mirror:

mdadm /dev/md/d0 --add /dev/sdd

Now you should check the mirror status:

mdadm -D /dev/md/d0

It should show that there are 3 Working Devices, 2 Active and 1 Spare, and the state is likely to be 'clean, degraded, recovering'.
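As before, /proc/mdstat is the place to watch the rebuild; it shows the recovery progress as a percentage along with an estimated finish time:

cat /proc/mdstat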

OK, now you can remove one of the other drives. Here you have to be a little careful as you can cause yourself a lot of grief if you get it wrong.

If you have rebooted in the meantime you may find that the drives have taken different nodes, depending on which one the system saw first. If you mark the wrong drive as failed and remove it, your mirror will be broken, and that is 7 years bad luck. A simple way to find out which is which is to do a little disk-intensive work while watching the drive activity lights:

dd if=/dev/sdb of=/dev/null count=100000

That will give somewhere between 5 and 10 seconds of heavy read activity (100,000 512-byte blocks, or about 50MB), so you should be able to identify the drive. Now just repeat the process as before to mark the drive as failed and remove it from the mirror before powering it down and removing it.
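Incidentally, if your distribution populates /dev/disk/by-id/, there is a less empirical option: the symlinks in that directory include each drive's model and serial number, so you can map sdX nodes to physical drives without generating any I/O:

ls -l /dev/disk/by-id/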

Backup regime

You really, really need to look at rsnapshot for backups. I use a daily, weekly and monthly cycle with coverage for a 6-month rolling backup.

This doesn't take a lot of space (unchanged files are hard-linked between snapshots, so space only increases by the size of changed files between backups) but provides a convenient format for recovery (just copy the directory back).
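As an illustration only (the paths here are hypothetical, the intervals are yours to tune, and rsnapshot insists on tabs, not spaces, between fields), a minimal rsnapshot.conf for a cycle like mine might look something like:

snapshot_root	/backup/snapshots/
interval	daily	7
interval	weekly	4
interval	monthly	6
backup	/home/	localhost/

The intervals are then driven from cron, e.g. running rsnapshot daily once a day and rsnapshot weekly once a week.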

Conclusions

Using USB drives (or other hot-plug devices) and Linux software mirroring can give you a decided edge in managing your data.

4 comments

Comment from: Ty Tower [Visitor]

My Mandriva 2007 does not have “mdadm”, so what is it and where do I get it? Is there an alternative command?

23/10/07 @ 10:27
Comment from: aj [Member]

mdadm is a standard Linux utility and from the man page:

The latest version of mdadm should always be available from

http://www.kernel.org/pub/linux/utils/raid/mdadm/

Please note that I haven’t had a chance to get back to this (see my later posts), so I haven’t overcome the reboot issue yet.

23/10/07 @ 10:30
Comment from: Ty Tower [Visitor]

Man - so I get mdadm and I use --build because you obviously refer to a much older version of mdadm. Get that done, then try
“pvcreate /dev/md/d0” and it looks at me stupidly!

Have to get this too I suspect; how long will this go on?
Nah, I’ll try something else

23/10/07 @ 11:01
Comment from: aj [Member]

I suspect you need to add some missing packages on your system. I’m not sure about Mandriva, but in Fedora the packages you need are mdadm, mdraid and lvm2 as a start.

For pv*, vg* and lv* commands you may need to prefix them with lvm, e.g.:

lvm pvcreate …

23/10/07 @ 11:09

