Amazon S3 Storage
Amazon has begun offering a new service that many businesses are starting to use. Unfortunately, the brains behind the service didn’t put much thought into good documentation. I’m going to write down some of the things I have learned while using this service, in the hope that someone out there won’t have to go through the mess that I did.
Concept: The general concept behind their storage service is that you can host files on their servers for a price. Let’s say you have a site that gets lots of traffic and stores lots of images and videos. Since the bandwidth used by the images and videos far outweighs the bandwidth used by the pages themselves, moving the images and videos to Amazon S3 means that traffic is served from Amazon’s servers instead of your own host, and you pay Amazon’s transfer rates for it instead. This might not be feasible for everyone, but for others it is a great way to save a lot on bandwidth costs.
Overview: When you set up your storage, you create a “bucket”. Your bucket must have a unique name (unique across all of S3, not just your account), and that name identifies your storage space. Once you have a bucket, you can upload files to it. If a file is publicly readable, you can then access it from anywhere on the net with a URL like this: http://bucketname.s3.amazonaws.com/filename.txt
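For illustration, here is a minimal Python sketch of that URL pattern. The function name `public_url` is hypothetical, and it assumes the object is publicly readable and the bucket name is DNS-compatible (lowercase, no underscores) so it can be used as a subdomain.

```python
from urllib.parse import quote


def public_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style URL for a publicly readable object (hypothetical helper)."""
    # Keep "/" so a key like "videos/intro.mp4" maps onto path segments,
    # but percent-encode spaces and other reserved characters.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


print(public_url("bucketname", "filename.txt"))
# http://bucketname.s3.amazonaws.com/filename.txt
```

Private objects will not be served through this plain URL; they need the signed form described next.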
Sounds simple enough, right? Well, what if you want to control access to your files because you don’t want them publicly available? There are a couple of solutions.
- Encrypt your files before you upload them to Amazon, and decrypt them after you download them. If you encrypt them properly, then even though the files are public, no one without your key can decrypt them.
- Use Amazon’s built-in security access methods.
At first glance it might seem that the second option is the easiest route. HOLD ON! Using their security access method turns out to be quite an ordeal. I will try to explain the security process as best I understand it. Authenticated requests can be made in a couple of different ways (POST or GET); I’m going to focus on the GET method. This involves passing the necessary security information in the same URL that you use to access the file, in the following format:
http://$bucket.s3.amazonaws.com/$filename?AWSAccessKeyId=$keyId&Expires=$expirationdate&Signature=$signature
That url is referencing the following variables:
- $bucket – the name of your unique bucket
- $filename – the name of the file you’re trying to access
- $keyId – your Access Key ID, which you obtained when you signed up for the service
- $expirationdate – the time at which the link expires, given as a Unix timestamp (seconds since January 1, 1970, UTC). This is kind of cool: even if someone intercepts the link, they can’t access the file after the expiration date (Amazon sends back an XML error saying the request has expired).
- $signature – this is what gave me the most trouble, and I go into detail on it next (you really need to pay attention to this part)
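As a rough sketch, the pieces above can be assembled in Python like this. The helper names (`expires_in`, `expires_at`, `signed_url`) are hypothetical, and the signature is taken as an already-computed Base64 string (computing it is covered next); the value used below is a placeholder, not a real signature.

```python
import time
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def expires_in(seconds: int) -> int:
    """Unix timestamp `seconds` from now (hypothetical helper)."""
    return int(time.time()) + seconds


def expires_at(when: datetime) -> int:
    """Unix timestamp for a timezone-aware datetime (hypothetical helper)."""
    return int(when.timestamp())


def signed_url(bucket: str, key: str, key_id: str, expires: int, signature: str) -> str:
    """Assemble the query-string-authenticated URL (hypothetical helper).

    `signature` is assumed to be the raw Base64 string; urlencode escapes
    its '+', '/' and '=' characters as %2B, %2F and %3D.
    """
    query = urlencode({
        "AWSAccessKeyId": key_id,
        "Expires": expires,
        "Signature": signature,
    })
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}?{query}"


# Placeholder values only; a real signature must be computed from your secret key.
print(signed_url("bucketname", "filename.txt", "AKIDEXAMPLE",
                 expires_at(datetime(2012, 1, 1, tzinfo=timezone.utc)),
                 "ab+c/d="))
# http://bucketname.s3.amazonaws.com/filename.txt?AWSAccessKeyId=AKIDEXAMPLE&Expires=1325376000&Signature=ab%2Bc%2Fd%3D
```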
A couple of resources were instrumental in my breakthrough: a PHP class designed to interact with Amazon S3 using REST and cURL (as opposed to PEAR, which usually doesn’t come with a standard PHP installation), and a tool to help debug your signature.
Amazon Developer Resource Center
Amazon S3 Detailed Documentation
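The $signature variable is not really an encrypted string; it is a keyed hash (an HMAC) that lets Amazon verify that the request was created by someone holding your secret key and that it hasn’t been tampered with. It is computed from the following pieces of information:
- request method (“GET”)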
- expiration date (unix time stamp)
- bucket name
- file name
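Each value must be followed by a newline (“\n”) character, except the last one. I have left out some values that aren’t needed for a simple GET (the Content-MD5 and Content-Type fields), but their newline separators still have to be there, which is why there are three “\n” in a row. So this is the signature string BEFORE it gets signed (with variables for the bucket name and file name, and an expiration of 1325376000, which is midnight UTC on January 1, 2012):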
GET\n\n\n1325376000\n/$bucket/$filename
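In Python, building that string is a single format call. This is a sketch: `string_to_sign` is a hypothetical name, and it assumes a plain GET with no `x-amz-*` headers and an object key that needs no URL-encoding.

```python
def string_to_sign(bucket: str, key: str, expires: int, method: str = "GET") -> str:
    """Build the text that gets signed for a query-string-authenticated request."""
    content_md5 = ""   # unused for a simple GET, but its newline must remain
    content_type = ""  # unused for a simple GET, but its newline must remain
    # method, Content-MD5, Content-Type and Expires each end with "\n";
    # the resource path comes last, with no trailing newline.
    return f"{method}\n{content_md5}\n{content_type}\n{expires}\n/{bucket}/{key}"


print(repr(string_to_sign("bucketname", "filename.txt", 1325376000)))
# 'GET\n\n\n1325376000\n/bucketname/filename.txt'
```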
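Once you have that string, it needs to be signed with the HMAC-SHA1 algorithm, using the secret access key you received when you signed up for the service as the key. The resulting binary hash is then Base64-encoded, and since Base64 output can contain “+”, “/” and “=”, it must also be URL-encoded before it goes into the Signature parameter.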
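Here is a minimal Python sketch of that signing step, using only the standard library (`hmac`, `hashlib`, `base64`). The function name `sign` and the secret shown are placeholders; the real secret key should never appear in client-side code.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign(secret_key: str, string_to_sign: str) -> str:
    """Return the Base64-encoded HMAC-SHA1 of string_to_sign, keyed with secret_key."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()          # 20 raw bytes
    return base64.b64encode(digest).decode("ascii")   # 28 Base64 characters


sig = sign("my-secret-key", "GET\n\n\n1325376000\n/bucketname/filename.txt")
# URL-encode before using it as the Signature= value ('+', '/', '=' must be escaped).
print(quote(sig, safe=""))
```

If Amazon answers with a SignatureDoesNotMatch error, the usual culprits are a string-to-sign that differs by a single character (a missing “\n”, a different expiration value) or a signature that was not URL-encoded.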
So that’s how you can get access to your files if they are secured by Amazon (and you’ve set up the security properly, which I’m not going to go into).
One question I was asked is: if you had 100 sites, how could you move all of their audio and video over to Amazon S3 and keep it protected?
Well, the short answer is that it’s going to have to be done manually, it’s going to take a lot of man-hours, and it’s going to be painful. The biggest problem is the “protected” part. There are two solutions:
- Create the signed URL with a one-time script, with the expiration date set several years in the future. You can then use this link in a static HTML page and never have to update it again. Of course, this presents a problem: the full link (including your Access Key ID, though not your secret key) is exposed in the page, and people can grab it and reuse it until it expires, so you’re no better off than if your files were publicly available to begin with.
- Each time the web page requests the file, generate the signed URL dynamically. This requires your server to run some server-side technology like ASP, PHP, Java, etc.
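A rough Python sketch of the second option, for a server that runs Python rather than one of the technologies listed above. `presign_get` is a hypothetical name; it assumes the GET query-string format described earlier, that the object key is encoded identically in the signed resource and in the URL, and that the secret key lives only on the server. The optional `now` parameter exists only to make the output reproducible.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote, urlencode


def presign_get(bucket: str, key: str, access_key_id: str, secret_key: str,
                ttl_seconds: int = 300, now: Optional[int] = None) -> str:
    """Build a signed GET URL that stays valid for ttl_seconds (hypothetical helper)."""
    if now is None:
        now = int(time.time())
    expires = now + ttl_seconds
    path = "/" + quote(key, safe="/")  # e.g. /videos/intro.mp4
    # GET, empty Content-MD5, empty Content-Type, Expires, then /bucket/key
    to_sign = f"GET\n\n\n{expires}\n/{bucket}{path}"
    digest = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    query = urlencode({"AWSAccessKeyId": access_key_id,
                       "Expires": expires,
                       "Signature": signature})
    return f"http://{bucket}.s3.amazonaws.com{path}?{query}"


# Called while rendering each page, so every visitor gets a fresh,
# short-lived link. Placeholder credentials shown here.
print(presign_get("bucketname", "videos/intro.mp4", "AKIDEXAMPLE", "secret",
                  ttl_seconds=600))
```

Passing a very large `ttl_seconds` to the same function gives you the first option (a long-lived static link), with the drawback described above; keeping the TTL short limits how long a copied link stays useful.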
So there really is no easy answer. That said, if all of those websites already had the audio and video stored on them, people could already grab those files, so there is no real reason they would need to be protected through Amazon.