Protecting Your Digital Research Data and Documents

One of the most frequent questions I am asked by student and veteran researchers alike is “How do I protect my digital data and documents?” When I dig a little, I find that most people are making irregular backups of their computer files to flash memory drives or USB hard drives. They have an intuitive sense of the value of making backup copies. What they are looking for from me is the assurance that they are doing an adequate job of protecting their valuable research products. With this reassurance they can breathe easier and concentrate on the real job of doing science. In this briefing I will outline my Rule of Three for protecting your data. I will not discuss long-term archiving or tell you how to back up and restore your computer’s operating system after a disk crash. Instead, I will focus on the daily protection of your data and research works in progress.

Disk drives are cheap, but the cost of replacing one can be enormous if sudden failure forever separates you from your data and documents. A reliable average failure rate for an ordinary computer hard disk is hard to come by and depends on a number of variables, but in general, for disk drives two years old or older, the annual failure rate is somewhere between 7 and 10%. That is, 1 out of every 10 to 14 disks will fail within the year (flash drive failure rates are even higher). These probabilities are not to be trifled with. There are also the other inherent dangers of loss, theft, or damage to computers or storage devices.
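To put rough numbers on these rates, here is a small sketch in Python. It converts the 7 to 10% annual failure figures into "one disk in every N" and then estimates how unlikely it is to lose three separate copies in the same year. The function names are made up for illustration, and the three-copy estimate assumes the copies fail independently of one another. That assumption is a simplification: a fire or theft can take out two copies that sit side by side, which is one reason to keep the copies in physically separate places.

```python
def one_in_n(annual_failure_rate: float) -> float:
    """Express an annual failure probability as 'one disk in every N'."""
    return 1 / annual_failure_rate


def chance_all_fail(annual_failure_rate: float, copies: int) -> float:
    """Chance that every copy fails within the same year.

    Assumes the copies fail independently of each other. This is an
    illustrative simplification, not a measured result.
    """
    return annual_failure_rate ** copies


for rate in (0.07, 0.10):
    all_three = chance_all_fail(rate, 3)
    print(
        f"{rate:.0%} per year: about 1 disk in {one_in_n(rate):.1f}; "
        f"losing all three copies: about 1 chance in {1 / all_three:,.0f}"
    )
```

Under that independence assumption, the chance of losing everything shrinks from roughly one in ten to something on the order of one in a thousand or better.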

The best analogy that comes to my mind for the heart-gripping anxiety that occurs at the thought of losing data is that of losing light deep in a cave. Over the course of my life I have spent a lot of time in caves. To avoid anxiety and achieve peace of mind, cavers find it helpful, necessary, and required to carry at least 3 sources of light. These have to be reliable light sources suitable to the task: a high-quality primary with at least two backups. Many times, upon some failure of bulb, battery, case, or wiring, I have had to go to the first backup, but I have yet to have to go to the second. The Rule of Three provides the breathing room I need to function.

Let’s apply this Rule of Three to backing up your research data. What you ultimately want is to have your data stored in three separate physical places. I call this Brunt’s Axiom of Here, Near, and There - or, more lyrically in Spanish, acá, allí, y allá. I will define the Here, Near, and There tactics that will cover several common work strategies.

1. Here - Here is wherever you are generating your content, be it a laptop or desktop computer, a portable hard disk, or the Web. This is the source of your digital information, where your work gets done. For most researchers Here is a single computer; for a small but growing number Here is a set of web-based applications like Google Docs. The latter individuals carry their research lives around with them on a USB hard drive and use a variety of available computers. Wherever Here is for you, it’s the place you store and manipulate your data and documents and, like my primary cave light, it should be reliable and secure.

Organize your files Here so that they fit neatly in one folder. The design after that can be hierarchical or flat to suit your style, but keeping everything in one folder makes protecting your data simpler. If you live in the Microsoft PC world this could be your “My Documents” folder, or it could be any folder anywhere on your computer. Mine is on my desktop, and it’s called simply “work”. I’m not going to moralize in this article about file naming strategies - find something that works for you and gives you comfort.

2. Near - If you have access to an institutional file server and procedures for storing data and documents, use them. You are doubling or perhaps tripling your protection, depending on local practice. If not, this is where a nice fat USB hard drive comes in handy. These range in capacity up to terabytes, and prices are dropping all the time. Near is where you have to make some decisions. You can use the operating system’s backup tools to write regular backups to your USB or network drives - Mac OS X has a particularly nice one. WinSCP is a particularly powerful tool for MS Windows users that can synchronize folders between your computer and a file server. Or you can do manual, drag-and-drop backups and store your files in a way that makes them easy to recover. In general, do not make a flash drive your Near data store. Grade A and B flash drives can be safely used for transport and for extra copies as long as you are not abusing them too much, e.g., by carrying them around in a backpack that sometimes sits in a hot car. I want my Near data copies close, like my secondary cave light, and easily accessible ‘when’ I make mistakes.

3. There - This is the part that makes living in 2010 fun. In the not too distant past, having copies of data There meant sending boxes of disks or tapes to your mother. Software companies now offer online storage solutions that are cheap, fast, reliable, and intuitive. Some examples are SugarSync, Mozy, Dropbox, Syncplicity, Fasmule, and OpenDrive, to name a few. My personal favorite for ease of use and multi-platform support is Dropbox. Dropbox is still in beta, but then so are many Google applications. These services all work in basically the same way: they provide you with a certain amount of disk space out There somewhere in the ‘cloud’ (in the case of Dropbox, this storage lives on Amazon S3) and give you access to it through the web and through a variety of desktop applications. This is where these services begin to sort themselves out for you, by offering slightly different features and by supporting different platforms. Dropbox supports web access as well as access to online storage from Mac, PC, Linux, and smartphone platforms, and it uses the folder structure on your platform, so there’s no application to open. Dropbox automatically copies any files put into the Dropbox folder to the online storage. I keep my work folder inside the Dropbox folder. If I create a Dropbox folder on another computer, my files are synced there too. Unlike with my third cave light, the convenience of having my third data copy online encourages its use.
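For the Near copy, the manual drag-and-drop backup mentioned under item 2 can also be scripted. What follows is only a minimal sketch in Python, not a recommended tool: the function name backup_to_near and the folder names are hypothetical, and the demonstration uses throwaway temporary folders in place of a real work folder and USB drive. It copies the whole work folder into a new, dated folder each time, so earlier backups are left alone and are easy to find again.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_to_near(work_dir: Path, near_root: Path) -> Path:
    """Copy the whole work folder into a new dated folder on the Near drive."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = near_root / f"{work_dir.name}_{stamp}"
    # copytree refuses to write into an existing folder, so an earlier
    # backup is never silently overwritten.
    shutil.copytree(work_dir, target)
    return target


# Demonstration with throwaway folders standing in for the real ones.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    near = Path(tmp) / "near_drive"
    work.mkdir()
    near.mkdir()
    (work / "field_notes.txt").write_text("site A, plot 3")
    made = backup_to_near(work, near)
    print("backed up to", made.name)
```

In real use the two arguments would be your own work folder and the mount point of your USB or network drive. Dated full copies use more disk space than a true synchronization tool would, but they are simple to reason about and simple to recover from.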

Now that you’ve solved your data security problem, having three or more copies of your data creates a data management problem. How do you keep all these copies in sync? In the past, this was my biggest headache and resulted in some elaborate programming solutions. Today, technology has caught up with demand, and keeping files in sync is getting easier all the time. When I say synchronized, I don’t just mean “copied” but “copied with a record of changes.” Most of the online services mentioned above take care of versioning files for you. If I copy a file to my USB drive to replace one that already exists, the old version is overwritten. If I copy a file to my Dropbox, the program replaces the file while saving the old version.
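The difference between “copied” and “copied with a record of changes” can be illustrated with a short Python sketch. This is not how Dropbox or any other service works internally; it is a hypothetical helper (copy_keeping_old_version, with an assumed "_old_versions" subfolder) that shows the idea on a local drive: before a file is replaced, the existing copy is set aside under a timestamped name.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def copy_keeping_old_version(source: Path, dest_dir: Path) -> Path:
    """Copy source into dest_dir, setting aside any file it would replace."""
    dest = dest_dir / source.name
    if dest.exists():
        archive = dest_dir / "_old_versions"  # assumed folder name
        archive.mkdir(exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
        # Move the existing copy aside instead of overwriting it.
        dest.rename(archive / f"{dest.stem}.{stamp}{dest.suffix}")
    shutil.copy2(source, dest)  # copy2 also preserves modification times
    return dest


# Demonstration with throwaway folders: save a file twice, keep both versions.
with tempfile.TemporaryDirectory() as tmp:
    here = Path(tmp) / "here"
    near = Path(tmp) / "near"
    here.mkdir()
    near.mkdir()
    data = here / "counts.csv"
    data.write_text("first draft")
    copy_keeping_old_version(data, near)
    data.write_text("second draft")
    copy_keeping_old_version(data, near)
    kept = sorted(p.name for p in (near / "_old_versions").iterdir())
    print("current:", (near / "counts.csv").read_text())
    print("older versions kept:", len(kept))
```

A plain copy would have left only the second draft on the Near drive; here the first draft survives alongside it.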

Alternate Lifestyles

If you live in Google Docs, that’s your “Here”, not your “There”, so you have to export your files to a local drive and make copies. Syncplicity has a Google Docs syncing facility in addition to Windows and Mac support, which makes it a more desirable service for this work scenario.

If you are going to be incommunicado from the Internet while collecting data all summer in Patagonia, the Brooks Range, or New York City, then you’ll need to adopt a different approach to the “There” solution. Keep two “Near” copies, and launch an occasional copy back to your home or institution whenever an opportunity presents itself. You can use high-quality USB flash drives for this purpose.

Here, Near, and There - Acá, Allí, y Allá: If you have three, you can breathe easier in the dark.

Copyright 2010 James W Brunt