Guidelines for image storage

Today, when the master copies of our images are usually digital (even slide workers often put a lot of work into digitising their transparencies), it’s easier to store and organise thousands of files than it was to do the same with physical prints, negatives, or transparencies. But it’s also easier to lose everything when you make a mistake (or a disk crashes, or some other disaster happens).
Many of my students (and I’m sure many other professional photographers) start off struggling to store their photographs reliably. But just a bit of organisation (which you also need once your image collection grows to many thousands of images) will make things a lot easier. Since the late 1980s I’ve been involved in designing and operating large computer systems. Although it’s been a few years since I did that for a living (I’ve been working as a photographer since 2002), the things I learnt in that work transfer very well to the world of digital photography. The gigabytes and terabytes of data involved in today’s photography aren’t really much bigger than the datasets we dealt with “back in the day”.
I will write more on this in future posts, but wanted to start with this list of points to consider. Some of this is about data backups in general, although some of it is specific to organising photo files.

Backups are essential. OK, this sounds obvious, but too many people just hope that disaster won’t happen to them. Eventually it will: all disk drives will fail at some point. All of them.
How paranoid do I have to be with my backups? That’s up to you: it’s all about mitigating risk. How much is it going to hurt you if you lose your files? A useful equation to keep in mind is:
Risk = probability of failure * cost of failure
If it’s only going to take you 5 minutes to re-create some files, then it’s only going to hurt you if they get lost frequently. If the files are the result of expensive trips, or opportunities that won’t return (photos of an alien landing, or even just of your family) the risk could be regarded as high even if there’s a very slight probability of failure.
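To make the equation concrete, here is a minimal sketch in Python. The probabilities and costs are made-up numbers for illustration only, and the function name is my own choice.

```python
def risk(probability_of_failure: float, cost_of_failure: float) -> float:
    """Risk = probability of failure * cost of failure."""
    if not 0.0 <= probability_of_failure <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability_of_failure * cost_of_failure


# Hypothetical figures: cost is in whatever unit matters to you (money, hours).
scratch_files = risk(0.20, 5)      # often lost, trivial to re-create
trip_photos = risk(0.01, 20_000)   # rarely lost, very expensive to replace

print(scratch_files, trip_photos)
```

Even with a much smaller probability of failure, the trip photos come out as the far bigger risk, which is the point of the paragraph above.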
A good backup system does not have to be incredibly complex to be effective, but it’s worth applying a little thought to the risks that your data is exposed to. If you make backup copies to another folder on the source drive, that’s not providing a lot of protection. If you make backup copies to another drive that’s better, but if it’s in the same location and always connected to the computer then it’s still at the same risk of corruption, fire, theft, lightning strike, etc. When establishing your own backup system, you’ll make your own decisions about how much is enough.

EOS 5DmkII, 180mm macro (A2_020109)
RAID is not backup. It’s just increasing the reliability of the storage device. It’s still just one copy of the data. If you delete or corrupt a file, that deletion or corruption is stored reliably.
Backups must include more than just your data files. To use your files once they’re restored you’ll probably have to use specific software (e.g. Photoshop, Lightroom). So you need to make sure you’ll have that backed up and restorable also.
Backups that are in “normal” format are usually best. For example, a copy of the files onto another drive (be it optical, hard drive, or even tape media). If you need special restore software to recover files, you need to be sure to have that backed up too (and have a computer capable of running it, which can be a problem 5 years down the track)!
More than one stage of backup can be good. If your backup operation updates your data to multiple places at once, that can be as bad as RAID at “backing-up” corrupted data. Better systems provide access to “yesterday’s” data separately from “last week’s” data.
Backups need to be operated simply and regularly. Automatic backups can be good, but a manual process can work well: as long as you don’t have to think too much about what to do.
Fancy schemes such as those that involve rotating backups manually usually have lots of opportunity for errors, from “I forgot” through to “I didn’t think of that”.
No backup media will last forever, whether it’s CD-R, DVD-R, magnetic tape, or hard drive. Some media will fail over time (optical media are examples of this, especially if not stored carefully) whereas some will become obsolete (can you read those Jaz drives you backed up to years ago?). Any good backup system will evolve and will allow you to migrate backup data to new media, whether by copying files from 7 CD-Rs to a new DVD-R, from 10 DVD-Rs to Blu-ray, or from an old hard drive to a new one.
Restoration of backups needs to be tested regularly. There’s no point making backups regularly if you only find out that they failed when you’re trying to recover after a disaster.
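One simple way to test a restore is to restore the backup to a scratch folder and compare it file by file against the originals. The sketch below is one possible approach, not a full backup tool: the function names are hypothetical, and it assumes both folders are reachable as ordinary paths.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_root: Path, restored_root: Path) -> list[str]:
    """List files that are missing from, or differ in, the restored copy."""
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        restored = restored_root / rel
        if not restored.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(restored):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every source file was found in the restored copy with identical contents. It does not prove that the software needed to open those files is restorable too.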

EOS 40D, 24-105mm (A2_020132)
Give your drives sensible and unique names. OS X identifies drives by their volume name (unlike Windows, where the drive letter can change when the drive is connected). Windows drives can have a volume name too, although it’s not as central to accessing the data on the drive.
Make top-level folders on drives to store any data. Don’t just scatter data across the “root” folders of the drives. For example you might have a folder on one drive called “Images-A” containing photo files (with appropriate sub-folders of course). If you need to start using a second drive, you could give it a folder called “Images-B”. This will simplify later rearrangements when you move data from one drive to another (e.g. consolidating data from multiple smaller drives to fewer larger drives).
Backup drives should have at least as much space as the drives being backed up. That is, they should be big enough to hold as much as the source can (not just how much data is there now). Otherwise you will eventually find yourself putting data onto your system that doesn’t fit onto the backup, and you won’t necessarily realise immediately that your system is failing to protect your data.
Sometimes this means that if you’ve bought a big new drive, that has to be used as a backup drive. Sometimes it means that you need to buy drives in pairs (at least).
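The capacity rule is easy to check with a few lines of Python. This is a small sketch using the standard library’s shutil.disk_usage; the function name and the example volume paths are placeholders of my own.

```python
import shutil


def backup_is_big_enough(source_path: str, backup_path: str) -> bool:
    """True if the backup volume's total capacity is at least the source's.

    Compares total capacity, not the space currently in use, so the answer
    stays valid as the source drive fills up.
    """
    source_total = shutil.disk_usage(source_path).total
    backup_total = shutil.disk_usage(backup_path).total
    return backup_total >= source_total


# Hypothetical volume paths:
# backup_is_big_enough("/Volumes/Images-A", "/Volumes/Backup-A")
```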
You should always have at least 3 copies of your data. Ideally at least one of them should be offline (disconnected, where it can’t be affected by accidental deletes/formats/etc) at any one time. Sometimes 2 copies is enough, but you are still at much higher risk than when you have 3 copies.
This includes when copying from your camera’s flash cards. Ideally you should only format the cards when you know the data’s been copied to at least 2 drives. Sometimes that’s awkward, but if you only have your data on one drive you should be aware that you are tempting Professor Murphy.
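As an illustration of the “two drives before you format” habit, here is a rough sketch that copies a card to several destinations and compares each copy byte for byte. The function name is hypothetical, and it assumes the destinations are not folders on the card itself.

```python
import filecmp
import shutil
from pathlib import Path


def copy_card(card_root: Path, destinations: list[Path]) -> bool:
    """Copy every file on the card to each destination and verify each copy.

    Returns True only if at least two destinations were given and every
    copied file compares equal to its original.
    """
    if len(destinations) < 2:
        return False
    for src in sorted(card_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(card_root)
        for dest_root in destinations:
            target = dest_root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also keeps file timestamps
            if not filecmp.cmp(src, target, shallow=False):
                return False
    return True
```

Only consider formatting the card once this returns True. Bear in mind that a comparison made straight after copying may be answered from the operating system’s cache rather than the disk, so it is a sanity check and not a guarantee.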

EOS 5DmkII, 180mm macro (A2_020030)
Files in your storage system should all have unique names.
For example you should be able to later reorganise your files and never have to worry about overwriting a different file that happens to have a duplicate name.
The best time to rename files (to achieve the above uniqueness) is as they’re added to your system (e.g. copied from flash cards). Keeping files with names such as DSC_1234.NEF is usually not sensible. A range of options are available, including combining the camera filename with the date/time of the photo. Most photo management software includes functions to automate this.
Some people put their name at the beginning of the filenames so they can give files to clients and have it obvious where the file came from. Some people only give out files that have been exported with simplified names (keeping details like the source ID in the image metadata).
Long filenames don’t matter (within limits). When was the last time you used something other than pointing and clicking to open a photo file? Having unique filenames does matter.
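Most photo management software will do the renaming for you, but the idea is simple enough to sketch. The scheme below (capture date and time, then the camera’s frame number) is just one possible convention, and the function name is my own.

```python
import re
from datetime import datetime


def unique_name(camera_filename: str, taken: datetime) -> str:
    """Combine the capture time with the frame number from the camera's filename."""
    match = re.fullmatch(r".*?(\d+)\.(\w+)", camera_filename)
    if match is None:
        raise ValueError(f"unrecognised camera filename: {camera_filename}")
    frame, extension = match.groups()
    return f"{taken:%Y%m%d%H%M}_{frame}.{extension}"


print(unique_name("DSC_1234.NEF", datetime(2009, 1, 20, 12, 45)))
# prints 200901201245_1234.NEF
```

If you shoot with two cameras at once, this scheme could still collide, so you might add a camera identifier to the name as well.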
Files in your storage system should not be renamed without good reason. Consider having a source file called 200901201245_1234.NEF (depending on your naming scheme, this might be a photo taken at 12:45PM on the 20th of January 2009 and named DSC_1234.NEF by the camera) and you’ve generated the file 200901201245_1234-edited.psd from it. If you’re working on a PSD file in the future and decide you want to find the matching RAW image, if the PSD file was named TwoBirdsOnBeach.psd you wouldn’t know what to search for. By keeping the filenames the same you should be able to find the file even if it’s on an old backup copy.
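When derived files keep the source name as a prefix, finding the original can be automated. This sketch assumes edited files are named by appending a “-something” suffix to the source name, as in the example above; the function names are hypothetical.

```python
from pathlib import Path


def source_stem(derived_name: str) -> str:
    """Strip the extension and any '-suffix' to recover the source base name."""
    stem = Path(derived_name).stem
    return stem.split("-", 1)[0]


def find_sources(derived_name: str, search_root: Path) -> list[Path]:
    """Find files under search_root whose name is the source base name plus an extension."""
    stem = source_stem(derived_name)
    return sorted(p for p in search_root.rglob(f"{stem}.*") if p.is_file())
```

Called with 200901201245_1234-edited.psd, it searches for 200901201245_1234 with any extension, which works just as well on an old backup drive as on your working drive.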
Photo filenames within your storage system should not contain details of subject matter. The filenames just need to be unique. Using software such as Aperture, Lightroom, Expression Media, and even Bridge, you can easily use metadata fields such as keywords to quickly find relevant images even when your image collection grows to hundreds of thousands of images. You can of course use folders to impose some subject-matter structure on your files if you want (as long as you know the filenames will be unique).
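If you want to confirm that the filenames in an existing folder tree really are unique, a short script can report any clashes. This is a sketch with a function name of my own choosing. It compares names case-insensitively, on the assumption that your drives use a case-insensitive file system (the usual default on OS X and Windows).

```python
from collections import defaultdict
from pathlib import Path


def duplicate_names(root: Path) -> dict[str, list[Path]]:
    """Map each filename used more than once under root to all of its locations."""
    seen = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            seen[path.name.lower()].append(path)
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

An empty result means you can reorganise the folders freely without one file overwriting another.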

EOS 10D, 28-135mm (A1_2F12)
That’s more than enough for now. Give all that some thought, and we will soon discuss some tools to help implement data backups for photographers.
— David
