Amazon S3 Pitfalls: How to innocuously rack up a $1,797 bill restoring 1TB of data from Amazon S3 Glacier
The title sounds preposterous, as if this couldn’t possibly happen to any well-meaning user just toying around with a pet-project (albeit a data intensive one :P).
However, a close reading of AWS’s terms, coupled with some horror stories around the internet, reveals that it’s (IMHO) ridiculously easy to end up with a bill in the several-thousand-USD range, and if you are not careful, it may well happen to you. Here is how:
Suppose you have 1TB of data archived to Glacier, and using their API, decide to restore it all at once to regular S3 storage.
You may do this because you are experimenting with your startup idea, want to do some benchmarking, and a quick read of their pricing model says that restore pricing “starts at $.01 / GB.” Heh, $0.01 / GB * 1TB = $10, right? So how bad could the final bill be? $20, $30, $100?? Wrong, wrong, and wrong!
Congrats “mr / mrs / ms move fast and break things” :P, you’ve just incurred a $1,797 (USD) bill.
If you read their terms and examples carefully (notably, all of their examples are constructed to result in small charges in the $10–$20 range, which is somewhat deceptive IMHO) and apply them to this specific situation (ie: by following their “Learn more” link), the following happens:
In any given day you are allowed to restore a maximum of 1/30th of 5% of your total Glacier storage (ie: 5% per month but prorated daily). If you exceed that limit in ANY day, Amazon will calculate a charge based on the following formula:
Your Bill = (PeakHourlyRate – MaxFreeHourlyRate) * 720 hours * $0.01 / GB
The 720 hours times the PeakHourlyRate is the gotcha. Basically, even though your restore is assumed to take 4 hours, they will charge you as if you were running a sequence of restores at the PeakHourlyRate for all 720 hours that exist in an average month.
For 1TB, your PeakHourlyRate will be:
PeakHourlyRate = 1TB / 4 hours = 250 GB / hour.
(Amazon will assume that the restore operation is completed in a max of 4 hours, regardless of whether this is so).
Your MaxFreeHourlyRate will be calculated as:
MaxFreeHourlyRate = 0.05 / 30 * 1TB / 4 hours = 0.417 GB / hour.
Eh, that’s not gonna help you much here. Thus your bill will be:
YourBill = (250.00 – 0.417) GB/hour * 720 hours * $0.01 / GB = $1,797 (rounded to the nearest dollar).
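To sanity-check the arithmetic, here is a minimal Python sketch of the fee formula as I’ve reconstructed it above. The constants (1TB = 1000 GB, the 4-hour restore window, 720 hours per month) are my own assumptions taken from this example, not anything official from Amazon:

```python
# Sketch of the Glacier retrieval-fee formula as described above.
# All constants are assumptions from this article's worked example.
TOTAL_STORED_GB = 1000.0    # total data archived in Glacier (1TB)
RESTORE_GB = 1000.0         # amount restored in one go
RESTORE_WINDOW_HOURS = 4.0  # Amazon assumes restores complete in 4 hours
HOURS_PER_MONTH = 720.0     # 30 days * 24 hours
RATE_PER_GB = 0.01          # $0.01 per GB

# Peak hourly rate: the whole restore crammed into the 4-hour window.
peak_hourly_rate = RESTORE_GB / RESTORE_WINDOW_HOURS  # 250 GB/hour

# Free allowance: 5% per month, prorated daily, over the same 4-hour window.
daily_free_gb = 0.05 / 30 * TOTAL_STORED_GB
max_free_hourly_rate = daily_free_gb / RESTORE_WINDOW_HOURS  # ~0.417 GB/hour

# The gotcha: the peak rate is billed for all 720 hours of the month.
bill = (peak_hourly_rate - max_free_hourly_rate) * HOURS_PER_MONTH * RATE_PER_GB
print(f"${bill:,.0f}")  # prints $1,797
```

Note how `RESTORE_GB` (what you pull in one day) drives the peak rate, while only `TOTAL_STORED_GB` drives the free allowance — restoring everything at once maximizes the gap between the two.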
Moral of the story: Glacier is not for those who lack a thorough and exact understanding of what they are doing and why. Especially so if there is a more-than-remote possibility of initiating a restore above the daily quota of 1/30th of 5% of your stored data.
Below are a few more observations / thoughts on Glacier.
Restoring your data from Glacier for free will take about 2 years
Well, you only get to restore 5% per month (pro-rated daily) for free. So assuming you restore your data at a steady rate of 5% per month, this will take:
100 / 5 = 20 months
which is just shy of 2 years.
This assumes you don’t delete data from Glacier as you restore it. Deleting as you go simultaneously shrinks your free monthly quota (which is calculated on the GB you still have stored), producing an exponential-decay cycle that may technically never terminate: the amount of data left in Glacier becomes vanishingly small, but so does your quota. In practice, after 20 or so iterations you will have recovered the vast majority of your data, and depending on how the quota is calculated (ie: in real time or at the start of the month), you may or may not be able to withdraw everything for free in a finite number of iterations :P. Even if not, there will come a point where it is cheap to bite the bullet and download that last chunk of data, quota be damned. Your ability to do this without incurring charges will also depend on the size of your objects, etc.
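The decay cycle above can be made concrete with a toy simulation (my own, assuming an idealized quota of exactly 5% of whatever currently remains, recalculated each month):

```python
# Toy simulation: restore-and-delete, where each month's free allowance
# is 5% of the data *still remaining* in Glacier. Idealized assumption;
# ignores prorating, object sizes, and quota-calculation timing.
remaining = 1000.0  # GB still in Glacier (1TB to start)
restored = 0.0      # GB recovered so far
months = 0

while restored < 999.0:  # stop once 99.9% is recovered
    pull = 0.05 * remaining  # this month's allowance shrinks every cycle
    remaining -= pull
    restored += pull
    months += 1

print(months)  # prints 135 -- over 11 years to recover 99.9%
```

Compare that to the 20 months you’d need if the quota stayed pinned to the original 1TB — deleting as you restore makes the “free” path dramatically slower.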
So when should I use Glacier?
Basically, Glacier is for data you are:
- Almost 100% sure you will never need to access again (but need to keep around just in case, or for legal reasons, etc.)
- Don’t mind vendor lock-in, ie: you are happy to keep this data in AWS Glacier for a de facto eternity (because there is no easy way to switch providers without incurring potentially large data-migration costs; see the cost calculations / reasoning above)
This means Glacier fits a very tiny use case for most start-ups or pet projects. So much so that it basically translates to: don’t ever use Glacier… unless you have very specific and thoroughly researched reasons for why, and even then, the de facto vendor lock-in should make you look closely at the other options (ie: S3 Infrequent Access or Google Nearline), which are more expensive on a per-GB/month basis but provide much cheaper (and much more rapid) restores.
What to do if you incur a large bill accidentally
There are rumours online of this happening to unsuspecting users at these and even larger amounts (think $17K). Luckily, it seems that if you are a first-time offender and a small fish (ie: hobbyist, small company, etc), the nice people at Amazon may forgive your sins if you call them and are persistent but polite :P.
However, I’m going to be coy and recommend not having it happen to you in the first place…
Experience with AWS has taught me that pricing is really quite tricky, and the final bill is rarely what you expect it to be at first (and rarely in your favor, to say the least). So always move with caution when using AWS: gauge the actual costs you are likely to incur with a service before scaling up to the next step.
More general takeaway: AWS is not Amazon retail, where pricing is simple and it’s difficult to make costly mistakes. This is enterprise, compadre.
I have found my AWS experience to be notably different from my Amazon retail experience. In the latter, everything is simple and transparent, there are basically never gimmicks or surprises, and it is quite hard to make costly mistakes. You can also buy with confidence that you are getting the best deal out there for your specific use-case. AWS, on the other hand, seems to be rife with complicated, obscure and at times deceptive (intentional, or just incompetence of the explainers?) pricing models and claims thereof.
In addition, my feeling is that many users also misunderstand the true use-cases of the services, thinking AWS is a one-stop shop (and the only shop worth considering) for all their IT needs. While it is true that AWS can’t be beat in many areas, IMHO its use cases are far more specialized and far less applicable to the typical start-up / small business than one would guess by reading around online. For conventional needs (such as, say, hosting your core servers and databases, or even running some load balancers), you will often get cheaper, faster and better service via dedicated server providers.
Note: The analysis contained here is based on my and others’ readings of AWS’s terms (and as of this writing only). I don’t represent Amazon so my analysis is quite unofficial and should be taken as such. Do be careful to check their terms for yourself in full detail if you are considering using their services. If anything, that is the main takeaway of this article. And yes, I am covering my ass a bit by putting in this disclaimer :P.
Note: Unless specified otherwise, all currency amounts are quoted in USD.