My S3 Backup System
Everyone's all excited about S3 from Amazon, and for good reason. It's a reasonably priced offsite storage mechanism.
I've been meaning to write some tools for myself to use this service, but it's been kind of lower on a really long priority list. I was pushed over the edge tonight by Jeremy Zawodny's excellent post on his analysis of building vs. outsourcing a storage server. Tonight I put some work into building tools to let me use my backup stuff with S3. Basically, there are two tools:
- A tool that reconstructs my backup directory tree (date/db.schema.table/hardlinkseqfile) into a zipfile per day
- A tool that will synchronize a directory tree with S3
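The first tool could look roughly like this, a minimal sketch using only the standard library. The function name `zip_backup_days` and the output layout are my own assumptions, not the actual tool:

```python
import os
import zipfile

def zip_backup_days(backup_root, out_dir):
    """Create one zipfile per date directory under backup_root.

    Assumes a layout of date/db.schema.table/file, as described above.
    Hypothetical sketch; names are illustrative only.
    """
    os.makedirs(out_dir, exist_ok=True)
    for day in sorted(os.listdir(backup_root)):
        day_dir = os.path.join(backup_root, day)
        if not os.path.isdir(day_dir):
            continue
        zip_path = os.path.join(out_dir, day + ".zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(day_dir):
                for name in files:
                    full = os.path.join(root, name)
                    # store paths relative to the backup root so the
                    # archive reproduces date/db.schema.table/file
                    zf.write(full, os.path.relpath(full, backup_root))
```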
The first tool is really specific to what I'm doing, but the second tool can probably be applied to a lot of problems. Basically, invocation looks like this:
s3sync.py /directory/tree/to/sync/ s3bucketname $S3ID $S3PW
First, it asks Amazon for a list of everything already in that bucket, then starts uploading anything it finds in the filesystem that isn't in the bucket. If the directory contains a file named a/b/c/x.txt, it'll be stored in the S3 bucket as a/b/c/x.txt.
Once all files are synced up, it computes a set difference between what it knows to be in the bucket and what it found in the filesystem, and removes from the bucket everything that isn't in the local fs. Easy enough.
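The core of that logic can be sketched as a pure "plan" step: walk the local tree, compare key names against the bucket listing, and decide what to upload and what to delete. This is my own reconstruction, not the actual s3sync.py; `sync_plan` is a hypothetical name, and the actual transfers (listing, PUTs, DELETEs) would go through an S3 library on top of it:

```python
import os

def sync_plan(local_root, remote_keys):
    """Compare a local directory tree against the set of keys already
    in the bucket and decide what to transfer.

    remote_keys: set of key strings from the bucket listing.
    Returns (to_upload, to_delete). Presence is judged by key name only;
    file contents are not validated, matching the behaviour described here.
    """
    local_keys = set()
    for root, _dirs, files in os.walk(local_root):
        for name in files:
            full = os.path.join(root, name)
            # the S3 key is the path relative to the sync root,
            # e.g. a/b/c/x.txt
            local_keys.add(
                os.path.relpath(full, local_root).replace(os.sep, "/"))
    to_upload = local_keys - remote_keys
    to_delete = remote_keys - local_keys
    return to_upload, to_delete
```

Keeping the decision separate from the transfer makes the interesting part easy to test without touching the network.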
One thing it does not do, however, is validate the contents of the files. For my use, files don't change, so if S3 has heard of a file, it's got the right one.
Now I just have to wait a really long time to get all my data uploaded.