Easing file recovery

By Paul Murphy, author of The Unix Guide to Defenestration

If you're a typical Solaris sysadmin, you just live for those phone calls in which some user tries desperately to move his or her emergency file recovery to the absolute top of the data center's priority list. Well, maybe not - but there is something you can do here to help both the user and yourself.

Take a long hard look at the non-PC file recovery requests you get over a month or two and you'll probably notice a pattern: 70 to 95% of the requests come from 10% of the requestors. In most cases they involve the same systems too - most commonly a file that gets deleted automatically when some process completes. Many companies, for example, have general ledger (GL) or other financial application update processes that start with the daily transfer of a text file from another company or division. Since those files are usually erased a day or two after the process completes, a user who wants to verify a result or rerun an update has to call you and ask for a file recovery.

It's easy to make this process hard enough on users that they stop calling so often, but what they tend to do instead is copy the files to their PCs - and then cause absolute chaos when they update the system after uploading the wrong recovery files. The smarter thing to do, therefore, is to find ways of making recovery quick and easy.

My strategy for this is to set up a directory hierarchy reflecting the applications to which file transfers apply, and then modify the delete step in the run time process to squirrel away a copy of the file into the right spot in this hierarchy, with its original name modified to add the date and time. A shipping information file, for example, might go into

/murphbk/orafin/inv/joe_s_trucking/pickups_ThuJan812:03:052004.gz
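
The mechanics are easy to sketch. Something along these lines - the script name and arguments are hypothetical, and you'd adapt the paths to your own tree - replaces the bare rm at the end of the run:

  #!/bin/sh
  # stash_and_delete.sh - hypothetical wrapper for the end of an update run:
  # instead of just deleting the transfer file, compress a timestamped copy
  # into the /murphbk tree first, then remove the original.
  FILE="$1"    # the transferred file, e.g. the Joe's Trucking pickups file
  DEST="$2"    # its slot in the hierarchy, e.g. /murphbk/orafin/inv/joe_s_trucking
  STAMP=`date '+%a%b%e%H:%M:%S%Y' | tr -d ' '`   # e.g. ThuJan812:03:052004
  gzip -c "$FILE" > "$DEST/`basename "$FILE"`_${STAMP}.gz" && rm "$FILE"

The update job would then call something like 'stash_and_delete.sh pickups /murphbk/orafin/inv/joe_s_trucking' where it used to call 'rm pickups'.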

Once you've got this working and have built up a backlog of files, you can usually deal with a user's file recovery request while the user is still on the phone with you.
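
With the application, the requestor, and the date all encoded in the path, serving a request is usually just an ls and a gunzip. A hypothetical session against the example hierarchy above:

  $ ls /murphbk/orafin/inv/joe_s_trucking
  pickups_ThuJan812:03:052004.gz
  $ gunzip -c /murphbk/orafin/inv/joe_s_trucking/pickups_ThuJan812:03:052004.gz \
      > /tmp/pickups.restored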

There are some housekeeping issues to watch out for:

  1. Don't stop making and testing the normal backups!

  2. Make sure your copies don't change permissions. The behavior of 'umask 0000' differs by Unix variant, so test to be sure; but generally cp -p, or a plain cp into a directory tree created with umask set to 0000, will preserve permissions.

  3. Although these are normally text files that compress very well, you'll still need to do cleanups. A daily cron job along the lines of 'find /murphbk -type f -ctime +70 -exec rm {} \;' will do nicely (see the crontab sketch after this list). Note that 70 days is just a bit over two months - because the week after month end is often a hot time for these requests.

  4. In some cases users will want to manipulate the file before uploading it to the application itself. For those files it's usually best to store two copies: one immediately after the file arrives, as part of the transfer process, and one after the upload completes.
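
To automate the cleanup in item 3, the job can live in root's crontab. A minimal sketch - the 2:10 a.m. schedule is just an assumption, and the find arguments are the corrected ones from above:

  # remove stash copies more than 70 days old, every night at 2:10
  10 2 * * * find /murphbk -type f -ctime +70 -exec rm {} \;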

You will always get outlier requests that send you to the tape vaults, but automating the maintenance of these on-line backups and indexing them by application name generally makes the whole thing a lot less painful for you, and a lot more effective for the users.