Backing up all your files

If you’ve followed along with the recommendations of earlier articles in this thread, you will have identified “sets” containing your data files. Grouping them together makes it easier to manage the files on your system, and by extension makes it easier to manage your backups.
I’ll get back to sets further below, but first a look at backups in general (hopefully you’ve already read the start of this series of posts).

“Imaging” your hard drive

[Photo: Stonington Sunset, Antarctica. EOS 5DmkII, 24-105mm/4 (A2_010459)]
The simplest form of backup is to get a second drive big enough to store all the data on your main drive, and make a copy. Software such as SuperDuper! and Carbon Copy Cloner can make bootable copies of OS X drives, even onto drives of different sizes.
Imaging a Microsoft Windows system drive is a bit more complex, and most users have never managed to do it. Acronis True Image and Symantec Ghost both claim to be able to back up system drives. My own attempt with Ghost on Windows XP in 2008 was a complete failure, but I hear good things about Acronis’ product.
Even if there’s no special Windows software available, it’s possible to use a Linux or BSD bootable CD (e.g. Knoppix) and use that operating system to make a block-level copy of any drive on your system.
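To make “block-level copy” a little more concrete, here is a minimal Python sketch of the idea: read the source in fixed-size blocks and write each block straight to the destination, without caring what files the blocks belong to. The function name copy_blocks and the 1 MB block size are just illustrative choices of mine. As written it works on ordinary files; pointing it at a raw device would need administrator rights and an unmounted drive, and in practice you would more likely reach for a ready-made tool on the live CD.

```python
def copy_blocks(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path one block at a time; return bytes copied.

    Hypothetical helper for illustration only. The copy is "dumb": it
    knows nothing about files or folders, only about raw blocks of bytes.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:          # an empty read means end of source
                break
            dst.write(block)
            copied += len(block)
    return copied
```

Because every block is copied whether or not it holds useful data, the time taken depends on the size of the drive rather than on how full it is.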
Of course any operation like this requires some “down time”, with the system idle (or shut down) while the copy is being made. The hassle involved usually means that backups made this way happen infrequently, so it’s not a perfect solution. But if your boot drive fails it can be nice to be able to boot off the backup drive and know you have a complete working system.

Time Machine

If you’re using OS X 10.5 Leopard you may already be using Time Machine to keep your files backed up. Time Machine is a nice implementation of backups for most “normal” files: it’s usually configured to back up to a dedicated USB or Firewire disk directly attached to the computer. Every hour it makes copies of any new/modified files, and preserves “snapshots” of the system for each backup. It keeps the last day’s worth of hourly backups, the first backup of each day for the last month, and as many weekly backups as will fit on the backup disk. Note that it expects to be able to fill up the backup volume. You can put other files onto the volume, but any available space will eventually be used up by Time Machine.
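The thinning rule described above can be sketched in a few lines of Python. This is only my reading of the behaviour, not Apple’s code: the function name snapshots_to_keep is made up, “the last month” is approximated as 30 days, weeks are grouped by ISO week number, and the “as many as will fit” limit on weekly backups is not modelled at all.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now):
    """Return the snapshot times that survive thinning, oldest first.

    Assumed rules: keep everything from the last 24 hours, the first
    snapshot of each day for the last 30 days, and the first snapshot
    of each ISO week for anything older.
    """
    keep = set()
    first_of_day = {}
    first_of_week = {}
    for snap in sorted(snapshots):          # oldest first
        age = now - snap
        if age <= timedelta(hours=24):
            keep.add(snap)
        elif age <= timedelta(days=30):
            # setdefault only stores the earliest snapshot for that day
            first_of_day.setdefault(snap.date(), snap)
        else:
            iso = snap.isocalendar()
            first_of_week.setdefault((iso[0], iso[1]), snap)
    keep.update(first_of_day.values())
    keep.update(first_of_week.values())
    return sorted(keep)
```

Feeding it three days of hourly snapshots leaves all of the most recent day’s snapshots plus one per earlier day, which is why the history gets coarser the further back you look.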
If you use a GUID partition table and copy the contents of the OS X install DVD onto the backup partition (using Disk Utility’s Restore function) the disk can be used as a standalone boot/restore disk to completely restore your machine to the state of the last Time Machine backup. I have used this successfully on several machines to recover from a system drive crash (even onto a complete replacement machine, which is not so easy to do with a Microsoft operating system).
As well as complete restores, Time Machine has a flashy interface that lets you explore back through time (thus the name) and recover specific old versions of any file or folder it backed up.
However, Time Machine doesn’t cope well with all files. When it decides that any part of a file has been updated, it simply makes a new copy of the entire file. If the file is a large Lightroom .lrcat (my largest is over 1.2GB) or a virtual disk image for VMware or Parallels (my Parallels disk image is 30GB), Time Machine will spend a long time copying it. Not only will a new copy of the file every hour quickly fill up your backup disk (causing the automatic deletion of older snapshots), but it’s likely that the file will still be being modified while it is copied (resulting in a useless, corrupted backup).
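A quick back-of-the-envelope calculation shows how fast this adds up. The sketch below assumes the worst case, where the file changes before every hourly backup so that 24 full copies are made per day; the function names and the 500 GB backup disk are hypothetical figures of mine, used only for illustration.

```python
def daily_growth_gb(file_size_gb, copies_per_day=24):
    """Worst case: the whole file is re-copied at every hourly backup."""
    return file_size_gb * copies_per_day


def days_until_full(free_space_gb, file_size_gb, copies_per_day=24):
    """How long free space lasts if nothing older is thinned out."""
    return free_space_gb / daily_growth_gb(file_size_gb, copies_per_day)


catalog_per_day = daily_growth_gb(1.2)    # the 1.2 GB Lightroom catalog
image_per_day = daily_growth_gb(30)       # the 30 GB Parallels disk image

print(f"catalog: about {catalog_per_day:.1f} GB per day")
print(f"disk image: about {image_per_day:.0f} GB per day")
print(f"a 500 GB disk lasts about {days_until_full(500, 30):.1f} days")
```

In other words, a catalog that is open all day could account for nearly 29 GB of backup space per day, and a busy 30 GB disk image for 720 GB.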

You can turn Time Machine backups off via System Preferences while using those large files (of course, it’s easy to forget to turn the backups on again!) or you can configure Time Machine to always ignore specific folders. If you’ve consolidated your catalogs into one folder (see the earlier introduction of “sets”), that folder is a perfect candidate for Time Machine to ignore. Of course you need to be sure you’re making backups of those files some other way! Note that by default Time Machine only backs up internal drives (not removable USB/Firewire drives).
Time Machine does a good job of making backups, but it only backs up to one drive. Not only does that drive need to be larger than the data on your internal drives (especially if you want to have some history available), but if that drive fails you will lose your protection (as well as any history of file changes). The way I cope with this risk is to make weekly DMG images of the Time Machine disk onto one of my backup drives (using Disk Utility). The DMG can’t easily be used to restore from directly, but after a disaster I can use another machine to extract the DMG contents onto another drive.
Incidentally, the internals of Time Machine may get an overhaul in OS X 10.6 with the introduction of the ZFS filesystem. I guess we’ll find out soon.

Lightroom backups

If you’re using Adobe Lightroom you will have seen the regular request to “back up” your catalog. Hopefully you’ve realised that this just makes a copy of the catalog file (after doing some integrity checks of course) and doesn’t actually do anything about backing up your image files. The default place it puts the backup files is a sub-folder of the catalog folder, but it’s easy to set it to use a different physical disk if that suits your configuration.
Backing up the catalog is important, even if you have it set to write metadata out as XMP to the image files. As well as the image metadata and CameraRaw edits, the catalog contains lots of data that doesn’t get saved to XMP: virtual copies, collections, flags, stacks, develop history, and any plugin-specific metadata. That’s typically data you don’t want to throw away.

[Photo: Lounging Leopard, Antarctica. EOS 5DmkII, 24-105mm/4 (A2_015729)]
Lightroom also keeps a Previews.lrdata folder/package for each catalog, containing the preview images and thumbnails. This speeds up Lightroom as well as allowing you to see the images even if the drive containing the actual image files is not mounted at the time. This Previews database is not critical to back up, as the previews will be regenerated as required (as long as you have the catalog and the image files).
When Lightroom backs up a catalog it doesn’t bother making a backup copy of the Previews database: it can be huge (at last glance the Previews database for my main catalog is 45 GB!), it’s made up of lots of small files (so a copy would be very slow), and it’s rebuildable.
Mind you, sometimes you may wish to make a backup of the Previews database: that 45 GB database could take days to regenerate fully.
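If you copy a catalog folder yourself, the same trade-off can be made explicit. The following is a rough Python sketch, not what Lightroom does internally: the function name backup_catalog_folder is hypothetical, and it assumes the previews live in a folder whose name ends in “Previews.lrdata” alongside the catalog. The destination folder must not already exist, and Lightroom should be closed so the catalog isn’t changing mid-copy.

```python
import shutil


def backup_catalog_folder(catalog_dir, backup_dir, include_previews=False):
    """Copy a catalog folder, skipping preview packages unless asked.

    Hypothetical helper: anything named "*Previews.lrdata" is treated
    as rebuildable and left out by default.
    """
    if include_previews:
        ignore = None
    else:
        ignore = shutil.ignore_patterns("*Previews.lrdata")
    # copytree creates backup_dir and raises an error if it already exists
    shutil.copytree(catalog_dir, backup_dir, ignore=ignore)
```

Passing include_previews=True covers the case where regenerating the previews would take longer than you’re willing to wait.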

“Manual” backups

So, you’ve got Time Machine (or some other software) backing up your boot drive and all your normal files including applications, email, etc. But you’ve told it to ignore the set containing your Lightroom catalogs, and possibly the sets containing your photo files. For small collections you might let Time Machine take care of your RAW files (they tend not to change, unless they’re in DNG format and get the XMP re-written) but as your collection grows you’ll probably find that Time Machine’s single backup disk doesn’t suit any more.
How can you make backups of these “sets”? The simplest approach (to comprehend) is just to make a copy via Finder/Explorer onto external drives which you can then disconnect from the system. If you need to access the copied files then you can just open them as normal files on whatever computer you connect the backup drive to. There are ways we can optimise the copy function, but before that it will help to introduce some more terminology:
So far, where we’ve talked about a “set” of files, we’ve really meant just the “primary member” of that set. Any backup copy of the primary can be referred to as a “secondary” member of the set.
If you have the primaries of an Images1 set and a Catalogs set on your computer’s internal hard drive, and the primary of an Images2 set on an external drive, you might decide to store secondary copies of all three sets on an external 1TB or 2TB drive. Or you could split the secondaries across multiple drives.
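For readers who think in code, the terminology maps onto a very small data structure. This Python sketch is purely illustrative: the class name BackupSet and all of the folder paths are invented, and it isn’t taken from any particular backup program.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackupSet:
    """A named set with one primary member and any number of secondaries."""
    name: str
    primary: str                                   # where the working copy lives
    secondaries: List[str] = field(default_factory=list)

    def add_secondary(self, path: str) -> None:
        if path == self.primary:
            raise ValueError("a secondary must not be the primary itself")
        if path not in self.secondaries:           # ignore duplicates
            self.secondaries.append(path)

    def members(self) -> List[str]:
        """Primary first, then every secondary."""
        return [self.primary] + self.secondaries


# The example from the text, with made-up paths: two primaries on the
# internal drive, one on an external drive, all secondaries on one big drive.
images1 = BackupSet("Images1", "/Users/me/Pictures/Images1")
catalogs = BackupSet("Catalogs", "/Users/me/Pictures/Catalogs")
images2 = BackupSet("Images2", "/Volumes/Photos2/Images2")
for s in (images1, catalogs, images2):
    s.add_secondary("/Volumes/Backup1TB/" + s.name)
```

Splitting the secondaries across multiple drives is then just a matter of giving each set a different secondary path.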

If you can set up a backup system where updating these secondary copies is quick and easy, it’s easily extended so you can maintain multiple secondary copies. In my own system I have three secondaries for each set: one on a permanently-connected drive that is updated daily, and two on removable drives. One of the removable copies is updated weekly and stored on-site, and once a month it is swapped with the other, which lives in a group of drives stored off-site.
This whole concept is fundamentally “low-tech”. We make a copy of the files onto another disk. We don’t automatically have a history of versions available (other than having extra secondaries that haven’t been updated for a while). But it works, and can be used on systems ranging from a few megabytes to many terabytes.

File Synchronisation

Where the magic starts to come in is in automating the update process, so it’s time to talk about file synchronisation. Rather than simply dragging/dropping a folder from one disk to another (and thus re-copying all the files) it’s much easier if we can run a synchronisation program to only copy new or updated files (and possibly delete removed files) without wasting time on the unchanged files. Luckily these programs are available.
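To show what such a program is doing under the hood, here is a deliberately simple one-way synchronisation sketch in Python. It is not how any of the products mentioned in this post work internally; the function names and the two-second timestamp slack (to allow for coarse timestamps on some filesystems) are my own assumptions. A file counts as “updated” if its size differs or the source is noticeably newer than the copy.

```python
import os
import shutil

MTIME_SLACK_SECONDS = 2.0   # assumed tolerance for coarse file timestamps


def needs_copy(src_file, dst_file):
    """True if dst_file is missing, a different size, or older than src_file."""
    if not os.path.exists(dst_file):
        return True
    s, d = os.stat(src_file), os.stat(dst_file)
    return s.st_size != d.st_size or s.st_mtime - d.st_mtime > MTIME_SLACK_SECONDS


def sync_folder(src, dst, delete_removed=False):
    """Bring dst up to date with src; return (copied, deleted) relative paths."""
    copied, deleted = [], []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dst_file = os.path.join(target_dir, name)
            if needs_copy(src_file, dst_file):
                shutil.copy2(src_file, dst_file)    # copy2 keeps timestamps
                copied.append(os.path.normpath(os.path.join(rel, name)))
    if delete_removed:
        # Remove files that no longer exist in the source.
        # Empty folders left behind are not cleaned up in this sketch.
        for root, _dirs, files in os.walk(dst):
            rel = os.path.relpath(root, dst)
            for name in files:
                if not os.path.exists(os.path.join(src, rel, name)):
                    os.remove(os.path.join(root, name))
                    deleted.append(os.path.normpath(os.path.join(rel, name)))
    return copied, deleted
```

Running it a second time with nothing changed copies nothing, which is exactly the time saving we’re after. Deleting is off by default because a mistaken deletion in the source would otherwise be faithfully repeated on the backup.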
Microsoft offers the free SyncToy to do file synchronisation. 2BrightSparks offers the free SyncBack. Commercial offerings for OS X include ChronoSync. Some of these allow you to set up presets of a source/destination folder pair, although extending this to groups of secondaries that might not all be connected at once is sometimes not straightforward. Built in to the base OS X is rsync (also available for other Unix-style OSes and for Windows) although to operate it you really need to set up command-line scripts.
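As a rough idea of what such a script involves, the Python sketch below just assembles an rsync command line and hands it to the system. The helper names are hypothetical and so are any paths you pass in; the rsync options used are -a (archive mode, preserving timestamps and permissions), --delete (remove files from the destination that have gone from the source) and --dry-run (report what would happen without changing anything).

```python
import subprocess


def rsync_command(src, dst, delete_removed=False, dry_run=False):
    """Build the argument list for a one-way rsync of src into dst."""
    cmd = ["rsync", "-a"]
    if delete_removed:
        cmd.append("--delete")
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source means "the contents of this folder"
    cmd += [src.rstrip("/") + "/", dst]
    return cmd


def run_sync(src, dst, delete_removed=False, dry_run=False):
    """Run rsync and raise an error if it reports a failure."""
    subprocess.run(rsync_command(src, dst, delete_removed, dry_run), check=True)
```

It’s worth trying any new source/destination pair with dry_run=True first, especially when delete_removed is switched on.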
My own PteroFile program for OS X uses rsync to do the underlying file synchronisation, and builds on top of it with a configuration database and many photographer-specific workflow optimisations (including Expression Media and Lightroom plug-ins). PteroFile has been in heavy use by a small number of testers since mid-2008, and we’re going to make it available to a wider audience of Mac users in June 2009. In the current version the user interface is a bit crude: it needs the command line for configuration, although there are several GUI front-ends for people who prefer them for tasks such as initiating synchronisation.
I won’t go on about PteroFile in this post, but you’ll hear more about it soon!

[Photo: Shooting the Midnight Sun, Antarctica. EOS 5DmkII, 24-105mm/4 (A2_010513)]
My incentive for writing this series of articles was to introduce the underlying issues of backups and the concept of sets and members as a prelude to the release of PteroFile, but I’m sure the concepts will also be useful to many other people who might not be able to use PteroFile itself.
There’s still lots more to talk about, including issues of backup media (hard disks, DVDs, “cloud storage”, etc.), backup verification (both that it can be restored and that the restored images are “correct”), and more. So keep checking back here.
— David
