Tuesday, March 11, 2008

Poor-Man's Time-Machine

My previous experiments with Apple's Time-Machine online backup solution did not turn out as expected, but I really like the idea of continuous online backup, especially since the new 10.5 Leopard release does not seem as stable as my old 10.2 Panther release. So here is a poor man's version.

The basic idea is to use rsync, an efficient and robust way to synchronize two file-system trees over the network. This article, for example, explains in much detail how to use rsync and Unix file-system hard links to create multiple snapshots of a file-system tree over time while consuming disk space only for the files which have changed in the meantime. Minus the fancy GUI, that sounds a lot like what Time-Machine is trying to do...
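To make the hard-link trick a bit more concrete, here is a small sandbox sketch. It is not taken from the article mentioned above; the directory and file names are made up for illustration, and it assumes GNU cp (for the -al options) as found on a typical Linux box. It also assumes rsync's default behaviour of writing a changed file to a temporary copy and renaming it into place, which is why the snapshot keeps the old version.

```shell
#!/bin/sh
# Sandbox demo: a hard-linked snapshot keeps the old version of a file
# after the "current" tree receives a new version.
demo=$(mktemp -d)
mkdir "$demo/current"
echo "version 1" > "$demo/current/notes.txt"
echo "unchanged" > "$demo/current/photo.txt"

# Snapshot with hard links (GNU cp): no file data is copied.
cp -al "$demo/current" "$demo/snapshot.1"

# Mimic how rsync updates a changed file by default: write a new copy,
# then rename it over the old name (instead of editing in place).
echo "version 2" > "$demo/current/notes.txt.tmp"
mv "$demo/current/notes.txt.tmp" "$demo/current/notes.txt"

cat "$demo/snapshot.1/notes.txt"   # the snapshot still has the old content
cat "$demo/current/notes.txt"      # the current tree has the new content

# Unchanged files share one inode, so they are stored only once.
ls -i "$demo/current/photo.txt" "$demo/snapshot.1/photo.txt"
```

Only notes.txt takes up additional space after the change; photo.txt exists once on disk under two names.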

For my purpose, a few monthly snapshots are more than good enough, with the current one being kept reasonably well in sync: daily at least, as long as the laptop happens to be online long enough for the changes to be pushed over. Since the network is private, I am using a native rsync server directly instead of running over ssh, which should hopefully increase speed and reduce CPU consumption of the background backup task.
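For reference, a native rsync server needs a module definition for the client to push into. The following is only a sketch of what such a config might look like, not my actual file: the module name "backup" matches the tinylinux.local::backup target used by the client, while the path (guessed from the snapshot directory layout), the uid/gid and the address range in "hosts allow" are assumptions to be adapted. It writes the example to a scratch file rather than touching /etc directly.

```shell
#!/bin/sh
# Write an example rsync daemon config to a scratch file for review.
# Only the module name "backup" comes from the client command line;
# path, uid/gid and the allowed address range are assumptions.
conf="${TMPDIR:-/tmp}/rsyncd.conf.example"
cat > "$conf" <<'EOF'
[backup]
    path = /home/backup/powerbook/current
    comment = laptop backup target
    read only = false
    # root, so that ownership sent with -a --numeric-ids can be kept
    uid = root
    gid = root
    # assumed address range of the private home network
    hosts allow = 192.168.1.0/24
EOF
echo "wrote $conf"
```

After review, the file would go to /etc/rsyncd.conf, and the server is started with "rsync --daemon" (or through whatever service mechanism the distribution provides).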

On the Linux server, rotating the monthly snapshots from the current tree (using cp with the hard-link option...) is as simple as adding the following script to the monthly cron queue:

/etc/cron.monthly/rotate-snapshots.sh:

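#!/bin/sh
# Rotate the monthly snapshots; hard links keep unchanged files from using extra space.

# bail out instead of deleting things in the wrong directory
cd /home/backup/powerbook || exit 1
rm -rf snapshot.3
# older snapshots do not exist yet during the first months
[ -d snapshot.2 ] && mv snapshot.2 snapshot.3
[ -d snapshot.1 ] && mv snapshot.1 snapshot.2
cp -al current snapshot.1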


On the Mac side, the backup should only be triggered if the laptop is connected to the home network. Unfortunately, there does not seem to be an easy way to trigger an action whenever the wireless interface connects to a network, so we'll have to run a periodic job to check for it. Since the Linux server advertises itself through Bonjour, this could be done by detecting its presence, e.g. by pinging its local name, "tinylinux.local". Since this name is not very imaginative and somebody at work or on any other wi-fi network I might connect to could have a host with the same name, I check for the name of the wireless network instead to trigger the rsync backup:

#!/bin/bash

netname=<my>

sleep 30 # make sure wifi network is up and configured

# check if we are in home network; exit quietly if not
if ! system_profiler SPAirPortDataType | grep -q "$netname"
then
    exit 0
fi

echo "starting backup"
# push the home directory to the "backup" module of the rsync server
/sw/bin/rsync -vaHKL --numeric-ids --delete --progress \
    --exclude="*/Cache/" --exclude="*/.Trash" --exclude=".Spotlight-*/" \
    --exclude="*/Caches/" --exclude=".Trashes" --exclude="*.trindex" --exclude=".fseventsd" \
    /Users/<my>/ tinylinux.local::backup &> /tmp/backup.log


Launchd seems to be the recommended way to run periodic and background tasks on Mac OS now, so here is a user-specific launchd config for the backup service, which runs the above rsync-backup.sh script every 1800s (30min):

~/Library/LaunchAgents/rsync-backup.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>rsync-backup</string>
<key>Program</key>
<string>/Users/<my>/backup/rsync-backup.sh</string>
<key>StartInterval</key>
<integer>1800</integer>
</dict>
</plist>
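To activate the agent without logging out and back in, it can be loaded by hand. This is a minimal sketch, assuming the plist sits at the path given above; the guard around launchctl is only there so that the snippet does nothing harmful when pasted on a non-Mac machine.

```shell
#!/bin/sh
# Register the agent with launchd for the current user (Mac only).
plist="$HOME/Library/LaunchAgents/rsync-backup.plist"

if command -v launchctl >/dev/null 2>&1; then
    launchctl load "$plist"
    # The job should now be listed under its Label.
    launchctl list | grep rsync-backup
else
    echo "launchctl not found; this step only applies on the Mac"
fi
```

The script named under Program also has to be executable (chmod +x), otherwise launchd cannot start it.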


Unfortunately, StartInterval does not seem to take hibernation time into account. If it did, the job would most likely start right away whenever the laptop wakes up after sleeping for more than 30min.

Even if rsync is interrupted in the middle of a synchronization, it is smart enough to pick up where it left off the next time it is started. As long as the laptop is online on the home network for somewhat more than 30min, any changes made since the last run should be synchronized properly to the current snapshot on the Linux server.