When you should (and should not) think about using Amazon EC2

The Amazon AWS team has done it again, and EC2 is generating quite the talk. Perhaps I’ve not been watching the blogosphere closely enough about anything in particular until now (very likely), but I’ve not really seen this much general excitement before. The fervor I see going around is a lot like a kid at Christmas. You unwrap your present. IT’S A REMOTE CONTROL CAR. WOW! How cool! All of a sudden you have visions of chasing the neighborhood cats and drag racing your friends on the neighborhood sidewalks. After you open it (and the general euphoria of the ideas starts to fade) you realize: this is one of those cars that only turns one direction… And you just *know* that the next time you meet with your best friend Bobby he will have a car that turns left *and* right.

I expect we will see some of this… A lot of the talk around the good old sphere is that AWS will be putting smaller hosting companies out of business. But that’s not going to happen unless they change their pricing model, which I doubt they will.

But before you all go getting your panties in a bunch when EC2 only turns left… Remember that EC2 is a tool. And just like you wouldn’t use a hammer to cut cleanly through a board, EC2 is not meant for all purposes… The trick to making genuinely good use of EC2 will be in playing to its strengths… and avoiding its weaknesses.

Let’s face it… The Achilles heel of all the rampant early bird speculation is that the price of bandwidth for EC2 is rather high. Most hosting companies give you (with a low-end plan) 1000 GB of transfer per month. Amazon charges $200 per month for that much transfer, whereas you can find low-end hosting for $60 and mid-range hosting for $150. Clearly this is not where EC2 excels. And I don’t think that the AWS team intended for it to excel here. How big of a headache would it be to run the servers which host every web site on the planet? A very big one.
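For anyone who wants to poke at the numbers, here is a rough sketch of that arithmetic. The per-GB rate is not quoted from Amazon; it is simply what $200 for 1000 GB implies, and the function names are made up for illustration.

```python
# Figures from the paragraph above. The per-GB rate is derived from them,
# not taken from an Amazon price sheet.
PLAN_TRANSFER_GB = 1000       # monthly transfer on a typical low-end plan
EC2_COST_FOR_PLAN = 200.00    # what that much transfer costs on EC2
LOW_END_HOSTING = 60.00       # whole low-end plan, transfer included
MID_RANGE_HOSTING = 150.00    # whole mid-range plan, transfer included


def implied_rate_per_gb(total_cost, gigabytes):
    """Dollars per GB implied by a total cost for a given amount of transfer."""
    return total_cost / gigabytes


def transfer_cost(gigabytes, rate_per_gb):
    """Cost of moving `gigabytes` at a flat per-GB rate."""
    return gigabytes * rate_per_gb


if __name__ == "__main__":
    rate = implied_rate_per_gb(EC2_COST_FOR_PLAN, PLAN_TRANSFER_GB)
    ec2 = transfer_cost(PLAN_TRANSFER_GB, rate)
    print(f"Implied EC2 rate:   ${rate:.2f}/GB")
    print(f"EC2 transfer alone: ${ec2:.2f}")
    print(f"vs. low-end plan:   {ec2 / LOW_END_HOSTING:.1f}x")
    print(f"vs. mid-range plan: {ec2 / MID_RANGE_HOSTING:.1f}x")
```

On those figures the transfer alone costs more than three times the entire low-end plan, and that is before you have paid for a single hour of server time.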

What you *do* get at a *great* price is horsepower. For a mere $74.40/month (assuming 31 days) you get the equivalent of a 1.75 GHz Xeon with 1.75 GB of RAM. That’s not bad!
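If you are wondering where $74.40 comes from, it is just hours in the month times an hourly rate. Here is a small sketch; the $0.10/hour figure is what $74.40 over 31 days works out to, not a number I am quoting from anywhere, and the helper names are my own.

```python
HOURS_PER_DAY = 24


def instance_hours(days, instances=1):
    """Total billable hours for `instances` machines left on for `days` days."""
    return days * HOURS_PER_DAY * instances


def implied_hourly_rate(monthly_cost, days):
    """Hourly rate implied by a monthly bill for one always-on instance."""
    return monthly_cost / instance_hours(days)


def monthly_cost(hourly_rate, days, instances=1):
    """Bill for keeping `instances` machines on for the whole period."""
    return hourly_rate * instance_hours(days, instances)


if __name__ == "__main__":
    rate = implied_hourly_rate(74.40, 31)
    print(f"Hours in a 31-day month: {instance_hours(31)}")
    print(f"Implied hourly rate:     ${rate:.2f}")
    print(f"Ten instances, 31 days:  ${monthly_cost(rate, 31, instances=10):.2f}")
```

The useful part is the `instances` knob: the bill scales with machine-hours, so ten machines for a month and one machine for ten months cost about the same.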

But the real thrill comes with the understanding that additional servers can talk to each other over the network… for free. There is a private network (or QV) which you can make use of. This turns into a utility computing atom bomb. If you can minimize the amount of bandwidth used getting data to and from the machine, while maximizing its CPU and RAM utilization, then you have a winning combination which can take full advantage of the EC2 architecture. And if your setup is already using Amazon’s S3 storage solution… Well… Gravy.
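One way to see the “winning combination” is to ask what fraction of the bill goes to bandwidth. The sketch below is only that, a sketch: the two rates are the ones implied by the figures earlier in this post, the two workloads are invented, and the class and field names are mine.

```python
from dataclasses import dataclass

# Rates implied by this post's own figures ($74.40 for a 31-day month,
# $200 for 1000 GB). Treat them as assumptions, not a price sheet.
HOURLY_RATE = 0.10
EXTERNAL_RATE_PER_GB = 0.20
INTERNAL_RATE_PER_GB = 0.00   # traffic between instances is described as free


@dataclass
class Workload:
    name: str
    instance_hours: float   # total hours across all instances
    external_gb: float      # traffic to and from the outside world
    internal_gb: float      # traffic between instances

    def compute_cost(self):
        return self.instance_hours * HOURLY_RATE

    def bandwidth_cost(self):
        return (self.external_gb * EXTERNAL_RATE_PER_GB
                + self.internal_gb * INTERNAL_RATE_PER_GB)

    def total_cost(self):
        return self.compute_cost() + self.bandwidth_cost()

    def bandwidth_share(self):
        """Fraction of the bill spent on bandwidth (0.0 for an empty bill)."""
        total = self.total_cost()
        return self.bandwidth_cost() / total if total else 0.0


# Two invented workloads: a chatty ten-machine compute farm that moves little
# data in or out, and a single web host pushing 1000 GB to the world.
CRUNCHER = Workload("compute farm", instance_hours=10 * 744,
                    external_gb=50, internal_gb=5000)
WEB_HOST = Workload("web host", instance_hours=744,
                    external_gb=1000, internal_gb=0)

if __name__ == "__main__":
    for w in (CRUNCHER, WEB_HOST):
        print(f"{w.name}: ${w.total_cost():.2f} total, "
              f"{w.bandwidth_share():.0%} of it bandwidth")
```

The compute farm shuffles 5000 GB between its own machines and pays nothing for it; the web host has one tenth of the machines and still spends most of its bill on transfer.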

Imagine running a site like, say, YouTube on EC2. The bill would kill you. The simple fact of the matter is that YouTube uses too much bandwidth in receiving and serving its users’ files. I would have to imagine that the numbers for its bandwidth usage per month are staggering! But let’s break out the things that YouTube has to manage, and where it might be able to best utilize EC2 in its infrastructure.

YouTube gets files from its users, converts those files into FLVs, and then makes those FLVs available via the internet. You therefore have 3 main actions that are performed: A) HTTP PUT, B) video conversion, and C) HTTP GET. If I were there, and in a position of evaluating where EC2 might prove useful to me, I would probably be recommending the following changes to how things work:

First, all incoming files will be uploaded directly to web servers running on EC2 instances. There’s no reason a file should be uploaded to a datacenter, then re-uploaded to EC2, and then sent back down to the datacenter; that makes no sense. So users upload to EC2 servers.

Second, the EC2 compute farm is in charge of all video conversion. Video conversion is, typically, a high-memory and high-CPU process (as any video editor will tell you). And when they built their datacenter I can assure you that this weighed heavily on their minds. You don’t want to buy too many servers. You pay for them up front, and you pay for them on the back end as well. Not only do you purchase X number of servers for your compute farm, but you have to be able to run them, and that means rack space and power. Let me tell you that those two commodities are not cheap in datacenters. You do not want servers sitting around doing nothing unless you have to! So how many servers they purchase and provision every quarter has a lot to do with their expected usage. If they don’t purchase enough, then the user has to wait a long time for his requests to complete. Too many and you’re throwing away your investors’ money (which they don’t particularly like). So the ability to turn servers in a compute farm on and off only when they are needed (and better yet: to only pay for them when they’re on) is a godsend. This will save oodles of cash in the long run.
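To put a shape on “too many servers versus too few,” here is a rough sketch that counts server-hours rather than dollars, since I have no idea what anyone’s hardware actually costs. The demand curve is invented, the function names are mine, and it assumes servers can be switched on and off exactly on the hour.

```python
def fixed_fleet_server_hours(demand):
    """Server-hours you pay for when the fleet is sized for the busiest hour."""
    return max(demand) * len(demand) if demand else 0


def on_demand_server_hours(demand):
    """Server-hours you pay for when servers run only while they are needed."""
    return sum(demand)


def utilization(demand):
    """Share of a peak-sized fleet's hours that do useful work."""
    fixed = fixed_fleet_server_hours(demand)
    return on_demand_server_hours(demand) / fixed if fixed else 0.0


# Hypothetical day: servers needed in each of 24 hours
# (quiet night, steady daytime, evening rush, wind-down).
HYPOTHETICAL_DAY = [2] * 8 + [6] * 8 + [20] * 4 + [6] * 4

if __name__ == "__main__":
    print("Peak-sized fleet:", fixed_fleet_server_hours(HYPOTHETICAL_DAY), "server-hours")
    print("Pay-as-you-go:   ", on_demand_server_hours(HYPOTHETICAL_DAY), "server-hours")
    print(f"Fleet utilization: {utilization(HYPOTHETICAL_DAY):.0%}")
```

With a spiky curve like this one, a fleet bought for the evening rush spends most of the day idle, which is exactly the waste that switching servers off is supposed to remove.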

At this point, as a side note, I would also be advising keeping long-term backups of content in the S3 service, as well as removing rarely viewed content and storing it in S3 only. This would reduce the amount of space that is needed at any one time in the real physical datacenter. Disks take up lots of power and lots of space. You don’t want to have to pay for storage you don’t actually need. The tradeoff here is that transferring the content from S3 back to the DC will cost some money, so the cost of that versus the cost of running the storage hardware (or servers) yourselves ends up being the deciding factor. I would caution that you can move from S3 to a SAN, but moving from a SAN to S3 leaves you with a piece of junk which costs more than your house did ;D.

Third, the EC2 servers upload the converted video file and thumbnails to the primary (and real) datacenter. And it’s from here that the YouTube viewers would be downloading the actual content.

That setup would be when you *DO* use Amazon’s new EC2 service. You’ve used the strengths of EC2 (unlimited horsepower at a very acceptable price) while avoiding its weaknesses (expensive bandwidth, and paying for long-term storage, unless S3 ended up being economical for what you do).
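Here is a back-of-the-envelope sketch of why that split matters, counting only the bytes that cross the EC2 boundary for a single video. Everything in it is an assumption: the file sizes and view count are invented, thumbnails are ignored, the $0.20/GB rate is the one implied by the $200 for 1000 GB figure earlier, and I am assuming inbound and outbound traffic are billed alike.

```python
RATE_PER_GB = 0.20   # assumption: implied by $200 for 1000 GB of transfer


def hybrid_ec2_transfer_gb(upload_gb, converted_gb):
    """EC2 receives the upload and ships the converted file out exactly once."""
    return upload_gb + converted_gb


def all_ec2_transfer_gb(upload_gb, converted_gb, views):
    """EC2 receives the upload and also serves the converted file on every view."""
    return upload_gb + converted_gb * views


def transfer_cost(gigabytes, rate_per_gb=RATE_PER_GB):
    """Cost of moving `gigabytes` across the EC2 boundary at a flat rate."""
    return gigabytes * rate_per_gb


if __name__ == "__main__":
    # Hypothetical video: 50 MB upload, 20 MB FLV, watched 1000 times.
    upload_gb, converted_gb, views = 0.05, 0.02, 1000

    hybrid = hybrid_ec2_transfer_gb(upload_gb, converted_gb)
    everything = all_ec2_transfer_gb(upload_gb, converted_gb, views)

    print(f"Convert on EC2, serve from the datacenter: {hybrid:.2f} GB, "
          f"${transfer_cost(hybrid):.3f}")
    print(f"Convert and serve on EC2:                  {everything:.2f} GB, "
          f"${transfer_cost(everything):.2f}")
```

In the hybrid setup the EC2 transfer per video stays the same no matter how popular the video gets. Serve the views from EC2 as well and the bill grows with every single play.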

That said… There are plenty of places where you wouldn’t want to use EC2 in a project. Any time you’ll be generating excessive amounts of traffic… you’re losing money compared to a physical hosting solution.

In the end there is a lot of hype, and there’s a lot of room for FUD / uninformed opinions (this blog post, for example, is an uninformed opinion; I’ve never used the service personally), and what people need to keep in mind is that not every problem needs this solution. I would argue that it’s very likely that any organization could find one or (probably) more very good uses for EC2. But hosting your static content is not one of them. God help the first totally hosted EC2 user who gets majorly slashdotted ;).

I hope you found my uninformed public service announcement very informative. Remember to vote for me in the next election 😉

cheers
Apok
