Back up an S3 bucket to another S3 bucket using s3s3mirror

June 24, 2013   |   Tech
Agile League

If you’ve got millions of objects sitting in an S3 bucket, backing them up to another bucket is a difficult task. The web console is woefully inadequate at that scale, and the handful of s3cmd or s3sync scripts floating around are horrendously slow. Luckily, someone finally put together a good solution to this problem: s3s3mirror.
Here I will give you a quick rundown of how to use s3s3mirror to back up from one bucket to another using an EC2 node. It will take about 20 minutes to set up, and the running time depends on your bucket size. In my case, I copied 1.5 million objects in about 2 hours, significantly faster than any of the other solutions I’ve attempted.

No Warranty!

First off, a disclaimer: I offer no warranties that this works. Try this procedure at your own risk. If something goes wrong and your data is lost, you’re on your own. I didn’t write s3s3mirror, but I’m guessing the author also offers no warranties.

Restricted User

After looking through the s3s3mirror code, I couldn’t find anything sloppy or malicious, but in general it’s a good idea to be suspicious of these kinds of scripts. You’re probably backing up the bucket to protect against a script gone awry, so it would be tragic if the backup script itself went awry and destroyed your bucket.
One easy way to prevent this is to create an AWS user with specific read/write permissions. Specifically, it should only have write permissions on the exact bucket you wish to copy to. Everything else should be read-only.
Here’s an example IAM policy that should give you the rights you need. It allows reads on everything and full access only to the objects in my-backup-bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }
  ]
}
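If you would rather keep the policy in a file than paste it into the console, something like the following should work. This is only a sketch: the file name s3-backup-policy.json, the user name s3-backup, and the policy name in the comment are placeholders I made up, and it assumes python3 is available for the JSON check.

```shell
#!/bin/sh
# Hypothetical names: change these to match your own setup.
BACKUP_BUCKET="my-backup-bucket"
POLICY_FILE="s3-backup-policy.json"

# Write the policy from above with the backup bucket name filled in.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::${BACKUP_BUCKET}/*"
    }
  ]
}
EOF

# Fail early if the file is not valid JSON.
python3 -m json.tool "$POLICY_FILE" > /dev/null && echo "policy OK: $POLICY_FILE"

# To attach it with the AWS CLI (assumes the CLI is installed and configured,
# and that an IAM user named s3-backup already exists):
# aws iam put-user-policy --user-name s3-backup \
#   --policy-name s3-backup-policy --policy-document "file://$POLICY_FILE"
```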

Spin up the EC2 node

I used an m1.small node and it took about 2 hours to copy 1.5 million objects. At one point, top said machine load was over 40, so it really stresses a small node. Since s3s3mirror is highly threaded, my guess is that more CPU power translates to faster copying, but I could be wrong there.
For the following instructions to work, you will need an Ubuntu node. I used the AWS console wizard to spin up a 13.04 node, but I’m guessing this will work on many different versions of Ubuntu.

Install Maven and Java

s3s3mirror is written in Java and built with Maven. So, you’ll need to install those tools to build it.

sudo apt-get install maven
sudo apt-get install default-jdk
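Before moving on, it can help to confirm that both tools actually landed on the PATH. A minimal sketch; require_cmd is a helper name I made up for illustration:

```shell
# Report whether a command is on the PATH; returns non-zero if it is missing.
require_cmd() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the two tools the build needs, without aborting on the first miss.
for tool in java mvn; do
  require_cmd "$tool" || true
done
```

If either one shows up as missing, fix that before trying to build. Running java -version and mvn -version will also tell you which versions you ended up with.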

Note: I had some problems installing Maven, but Googling the error message gave me a fix pretty quickly.

Create your .s3cfg config

s3s3mirror actually has some tests as part of the build process, and these assume you will have some AWS settings at ~/.s3cfg. Here is an example file:

[default]
access_key = ABCD1234
secret_key = 1234ABCD

Install and build s3s3mirror

Download, unzip, and build the s3s3mirror package:

wget -O s3s3mirror-master.zip https://github.com/cobbzilla/s3s3mirror/archive/master.zip
unzip s3s3mirror-master.zip
cd s3s3mirror-master
mvn package

Run…and wait…

You should be all set to go at this point.

./s3s3mirror.sh my-original-bucket my-backup-bucket

A few parting tips

Double check permissions

You may want to double-double-check the permissions of your created user. A good way to do this is to create another bucket with a single test item in it (call it test-bucket-a). Try using s3s3mirror to copy test-bucket-a to some-random-bucket. This should fail with permission denied errors. Then try copying test-bucket-a to my-backup-bucket. This should succeed. Finally, try copying test-bucket-a to my-original-bucket and make sure that fails. At this point, you should be fairly confident that your user has read-only permissions on your original bucket and write permissions on the backup bucket.

Dry run it, just to be sure

s3s3mirror supports a dry run mode, and it’s worth watching the output just to be sure you’re 100% correct.

./s3s3mirror.sh --verbose --dry-run my-original-bucket my-backup-bucket

Run in screen

Since this could take a looong time, I recommend running it inside screen in case your ssh connection is severed.
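If you haven’t used screen much, here is a rough sketch of one way to do it. The session name s3backup, the log file name, and the start_backup helper are all things I made up for this example. It assumes screen is installed and that you are in the s3s3mirror-master directory.

```shell
# Hypothetical session name and log file.
SESSION="s3backup"
LOG="s3s3mirror.log"

# start_backup SOURCE_BUCKET DEST_BUCKET
# Launches the copy in a detached screen session and logs its output.
start_backup() {
  screen -dmS "$SESSION" sh -c "./s3s3mirror.sh '$1' '$2' 2>&1 | tee '$LOG'"
}

# Example:
# start_backup my-original-bucket my-backup-bucket
# screen -r s3backup     # reattach; press Ctrl-a then d to detach again
# screen -ls             # list running sessions
```

Starting the session detached means a dropped ssh connection never touches the copy, and the log file gives you something to check even if you never reattach.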