Tuesday, March 11, 2008

Poor-Man's Time-Machine

My previous experiments with Apple's Time-Machine backup solution did not turn out as expected, but I really like the idea of continuous online backup, especially since the new 10.5 Leopard release does not seem as stable as my old 10.2 Panther release. So I set out to build something similar myself.

The basic idea is to use rsync, an efficient and robust way to synchronize two file-system trees over the network. This article, for example, explains in much detail how to use rsync and Unix file-system hard links to create multiple snapshots of a file-system tree over time while consuming disk space only for the files that have changed in the meantime. Minus the fancy GUI, that sounds a lot like what Time-Machine is trying to do...
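
To make the hard-link trick concrete, here is a small sketch that plays it through with throwaway files in a temporary directory. All file names are made up for illustration, and it assumes GNU cp as found on a typical Linux box:

```shell
#!/bin/bash
# Sketch with made-up file names: a hard-linked snapshot shares disk blocks
# with the current tree until a file in the current tree is replaced.
work=$(mktemp -d)
mkdir "$work/current"
echo "version 1" > "$work/current/notes.txt"
echo "never changes" > "$work/current/photo.jpg"

# Take a snapshot: cp -al (GNU cp) creates hard links instead of copying data.
cp -al "$work/current" "$work/snapshot.1"

# By default rsync writes an updated file under a temporary name and renames
# it into place, so the snapshot keeps the old contents. Simulated with mv:
echo "version 2" > "$work/current/notes.txt.new"
mv "$work/current/notes.txt.new" "$work/current/notes.txt"

# -ef is true when both names refer to the same file (same inode).
if [ "$work/current/photo.jpg" -ef "$work/snapshot.1/photo.jpg" ]; then
    echo "photo.jpg: still shared, no extra disk space used"
fi
echo "snapshot has: $(cat "$work/snapshot.1/notes.txt")"
echo "current has:  $(cat "$work/current/notes.txt")"
```

Note that editing a file in place (instead of replacing it) would change the snapshot as well, since both names point to the same data. This is why the snapshots should only ever be updated through rsync.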

For my purposes, a few monthly snapshots are more than good enough, with the current one being kept reasonably well in sync: daily at least, as long as the laptop happens to be online long enough for the changes to be pushed over. Since the network is private, I am using a native rsync server directly instead of running over ssh, which should hopefully increase speed and reduce CPU consumption of the background backup task.
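
The server-side configuration is not shown in this post, so the following is only a sketch of what a matching rsync daemon module might look like. The helper name, the module path and the subnet are assumptions; the module name is chosen to match the "::backup" destination used by the client script further down:

```shell
#!/bin/sh
# Hypothetical helper: write a minimal rsync daemon module definition
# to the file given as first argument (e.g. /etc/rsyncd.conf).
write_backup_module() {
    conf=$1
    cat > "$conf" <<'EOF'
# Module "backup", as addressed by tinylinux.local::backup on the client.
[backup]
# Assumed location of the "current" tree on the server.
path = /home/backup/powerbook/current
# Modules are read-only by default; the laptop needs to push changes.
read only = false
# Assumed home subnet; adjust to the real one.
hosts allow = 192.168.1.0/24
EOF
}
```

After writing the file, the daemon can be started with `rsync --daemon`. Ownership and permission handling (uid, gid) is left out here and would need to be adapted to the actual setup.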

On the Linux server, rotating the monthly snapshots from the current tree (using cp with the hard-link option...) is as simple as adding the following script to the monthly cron queue:

/etc/cron.monthly/rotate-snapshots.sh:

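#!/bin/sh
# Rotate the monthly snapshots; the oldest one (snapshot.3) is dropped.

# Bail out if the backup directory is missing, so rm -rf never runs elsewhere.
cd /home/backup/powerbook || exit 1
rm -rf snapshot.3
mv snapshot.2 snapshot.3
mv snapshot.1 snapshot.2
# Hard-link copy: unchanged files share disk space with "current".
cp -al current snapshot.1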
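
If more or fewer snapshots are wanted, the same rotation can be written as a loop. This is a minimal sketch rather than the script I actually run: the function name `rotate_snapshots` and its two arguments are made up for illustration, and it again assumes GNU cp:

```shell
#!/bin/sh
# rotate_snapshots DIR KEEP
# Keep up to KEEP hard-linked snapshots of DIR/current (snapshot.1 is the newest).
# The body runs in a subshell, so the caller's working directory is untouched.
rotate_snapshots() (
    dir=$1
    keep=$2
    cd "$dir" || exit 1
    [ -d current ] || exit 1

    # Drop the oldest snapshot, then shift the others up by one.
    rm -rf "snapshot.$keep"
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        # Tolerate gaps, e.g. during the first few months.
        if [ -d "snapshot.$prev" ]; then
            mv "snapshot.$prev" "snapshot.$i"
        fi
        i=$prev
    done

    # New snapshot: hard links only, no file data is copied.
    cp -al current snapshot.1
)
```

Calling `rotate_snapshots /home/backup/powerbook 3` would behave like the script above, except that it does not complain about snapshots that do not exist yet. One thing to watch out for: on Debian-style systems, run-parts skips file names containing a dot, so a script in /etc/cron.monthly may have to be installed without the .sh suffix to be picked up.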


On the Mac side, the backup should only be triggered if the laptop is connected to the home network. Unfortunately, there does not seem to be an easy way to trigger an action whenever the wireless interface connects to a network, so we'll have to run a periodic job to check for it. Since the Linux server advertises itself through Bonjour, this could be done by detecting its presence, e.g. by pinging its local name, "tinylinux.local". Since this name is not very imaginative and somebody at work or on any other Wi-Fi network I connect to could have a host with the same name, I check for the name of the wireless network instead to trigger the rsync backup:

#!/bin/bash

netname="<my>"

sleep 30 # make sure wifi network is up and configured

# check if we are in home network; exit quietly if not
if ! system_profiler SPAirPortDataType | grep -q "$netname"
then
    exit 0
fi

echo "starting backup"
# push the home directory to the "backup" module of the rsync server
/sw/bin/rsync -vaHKL --numeric-ids --delete --progress \
--exclude="*/Cache/" --exclude="*/.Trash" --exclude=".Spotlight-*/" \
--exclude="*/Caches/" --exclude=".Trashes" --exclude="*.trindex" --exclude=".fseventsd" \
"/Users/<my>/" tinylinux.local::backup &> /tmp/backup.log


Launchd seems to be the recommended way to run periodic and background tasks on Mac OS X now, so here is a user-specific launchd configuration for the backup service, which runs the above script (saved as rsync-backup.sh) every 1800 s (30 min):

~/Library/LaunchAgents/rsync-backup.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>rsync-backup</string>
<key>Program</key>
<string>/Users/<my>/backup/rsync-backup.sh</string>
<key>StartInterval</key>
<integer>1800</integer>
</dict>
</plist>
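
User agents in ~/Library/LaunchAgents are picked up at login, but the job can also be loaded by hand. The following is a sketch of how that might look; the helper name `load_backup_agent` is made up, while `plutil -lint` and `launchctl load` are the stock Mac OS X tools:

```shell
#!/bin/sh
# Hypothetical helper: syntax-check the plist and load it into launchd.
# Only meaningful on Mac OS X; on other systems it just reports an error.
load_backup_agent() {
    plist="$HOME/Library/LaunchAgents/rsync-backup.plist"

    if ! command -v launchctl >/dev/null 2>&1; then
        echo "launchctl not found; this only works on Mac OS X" >&2
        return 1
    fi

    # plutil -lint fails on malformed property lists.
    plutil -lint "$plist" || return 1

    # Register the job with the per-user launchd.
    launchctl load "$plist"
}
```

After loading, `launchctl list | grep rsync-backup` should show the job under the label given in the plist.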


Unfortunately, StartInterval does not seem to take time spent asleep into account; otherwise the job would most likely start right away whenever the laptop wakes up after sleeping for more than 30 min.

Even if rsync is interrupted in the middle of a synchronization, it is smart enough to pick up where it left off the next time it is started. As long as the laptop is online on the home network for somewhat more than 30 min, any changes made since the last run should be synchronized properly to the current snapshot on the Linux server.
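
Since interrupted runs are simply retried, it can be handy to know when the last complete run happened. This is not part of my script above, just a minimal sketch: the wrapper name `run_and_stamp` and the stamp file are made up for illustration, and it relies only on rsync exiting with status 0 after a fully successful transfer:

```shell
#!/bin/sh
# run_and_stamp STAMPFILE COMMAND [ARGS...]
# Run COMMAND and record the current date in STAMPFILE only if it succeeded.
# The exit status of COMMAND is passed through to the caller.
run_and_stamp() {
    stamp=$1
    shift

    "$@"
    status=$?

    if [ "$status" -eq 0 ]; then
        date > "$stamp"
    fi
    return "$status"
}
```

In the backup script, the rsync call could then be wrapped as `run_and_stamp /tmp/backup.stamp /sw/bin/rsync ...`, leaving the stamp file untouched whenever the laptop was closed in the middle of a run.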

Playing with Time-Machine

The latest version of Mac OS X comes with an automated backup system called Time-Machine, which, besides the cool GUI, basically takes periodic snapshots of all changes and saves them to an attached disk. Since my new 17" Powerbook is suspended most of the time and moves back and forth between home and office, any solution which assumes a static environment is going to be challenging.

I was hoping for a solution which would automatically back up any changes incrementally to my Linux server at home whenever the laptop finds itself on that network. This means the system would have to auto-discover its network environment and deal with interruptions, since I am not going to wait for an invisible backup job to complete before closing the laptop again.

Following these instructions, I created an AFP share on my Linux server, including a Bonjour zero-conf advertisement, so that it can easily be discovered in the network neighborhood and mounted as a share on the Mac. Despite Apple's stated commitment to zero-conf plug-and-play wireless networking, there does not seem to be a way for a share to be automatically re-mounted whenever it comes back within reach. Funnily enough, this seems to work only for AFP shares exported from Apple's own new Airport Extreme base-stations, which can double as a network share based on a USB-attached or built-in hard disk. Unfortunately, nobody seems to have reverse engineered yet how that is done in order to replicate it on Linux... Another interesting quirk is that Time-Machine does not work with any AFP shares other than those base-stations anyway, a restriction which can be circumvented pretty easily.

So far, I can at least activate Time-Machine on that network share to play with it, but the fact that it times out when disconnected and does not re-connect when back in range takes out most of the fun. In addition, Time-Machine seems to want to complete writing one of its snapshots in one go and doesn't retry incrementally, which means it may never finish a single one if my laptop never stays online long enough. Time-Machine also seems to have a tendency to fill up any available disk space, which is quite nasty on a shared disk; it really wants a dedicated one.

All in all, I don't seem to be able to get Time-Machine to do what I want, except maybe by spending another $300 on Apple's new Time-Capsule base-station with file server, which might get closer to a usable solution for a mobile host like my laptop. Time-Machine seems very rigid and not very well thought out yet, but I like the basic concept, so maybe it is time to build something myself...