When you should (and should not) think about using Amazon EC2

The Amazon AWS team has done it again, and EC2 is generating quite a lot of talk. Perhaps I just haven't been watching the blogosphere closely enough until now (very likely), but I've not really seen this much general excitement. The fervor going around is a lot like a kid at Christmas. You unwrap your present. IT'S A REMOTE CONTROL CAR. WOW! How cool! All of a sudden you have visions of chasing the neighborhood cats and drag racing your friends on the neighborhood sidewalks. Then, after you open it (and the general euphoria of those ideas starts to fade), you realize: this is one of those cars that only turns in one direction… and you just *know* that the next time you meet your best friend Bobby, he'll have a car that turns left *and* right.

I expect we will see some of this… A lot of the talk around the good old blogosphere is that AWS will put smaller hosting companies out of business. But that's not going to happen unless they change their pricing model, which I doubt they will.

But before you all go getting your panties in a bunch when EC2 only turns left… remember that EC2 is a tool. And just like you wouldn't use a hammer to cut cleanly through a board, EC2 is not meant for every purpose. The trick to making genuinely good use of EC2 will be playing to its strengths… and avoiding its weaknesses.

Let's face it… the Achilles' heel of all the rampant early-bird speculation is that bandwidth on EC2 is rather expensive. Most hosting companies give you (with a low-end plan) 1000GB of transfer per month. Amazon charges $200 per month for that much transfer, whereas you can find low-end hosting for $60 and mid-range hosting for $150. Clearly this is not where EC2 excels, and I don't think the AWS team intended it to excel here. Running the servers that host every web site on the planet would be one giant headache, and I doubt that's the business Amazon is after.

What you *do* get at a *great* price is horsepower. For a mere $74.40/month (assuming 31 days) you get the equivalent of a 1.7GHz Xeon with 1.75GB of RAM. That's not bad!

But the real thrill comes with the understanding that these servers can talk to each other over the network… for free. There is a private network (or the equivalent) which you can make use of. This turns into a utility-computing atom bomb. If you can minimize the amount of bandwidth used getting data to and from the machines, while maximizing their CPU and RAM utilization, then you have a winning combination which takes full advantage of the EC2 architecture. And if your setup is already using Amazon's S3 storage service… well… gravy.

Imagine running a site like, say, YouTube on EC2. The bill alone would kill you. The simple fact of the matter is that YouTube uses too much bandwidth receiving and serving its users' files. I would have to imagine that its bandwidth numbers per month are staggering! But let's break out the things that YouTube has to manage, and where it might best be able to use EC2 in its infrastructure.

YouTube receives files from its users, converts those files into FLVs, and then makes those FLVs available over the Internet. You therefore have three main actions being performed: A) HTTP PUT, B) video conversion, and C) HTTP GET. If I were there, in a position to evaluate where EC2 might prove useful, I would probably recommend the following changes to how things work:

First, all incoming files get uploaded directly to web servers running on EC2 AMIs. There's no reason a file should be uploaded to the datacenter, re-uploaded to EC2, and then sent back down to the datacenter; that makes no sense. So users upload to EC2 servers.

Second, the EC2 compute farm is in charge of all video conversion. Video conversion is, typically, a memory- and CPU-hungry process (as any video editor will tell you), and when YouTube built their datacenter I can assure you that this weighed heavily on their minds. You don't want to buy too many servers. You pay for them up front, and you keep paying for them afterwards: not only do you purchase X number of servers for your compute farm, you have to be able to run them, and that means rack space and power. Let me tell you, those two commodities are not cheap in a datacenter. You do not want servers sitting around doing nothing unless you have to! So how many servers they purchase and provision every quarter has a lot to do with their expected usage. If they don't purchase enough, users wait a long time for their requests to complete. Too many, and you're throwing away your investors' money (which they don't particularly like). So the ability to turn servers in a compute farm on and off only when they're needed (and better yet, to pay for them only when they're on) is a godsend. This will save oodles of cash in the long run.
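Just to make that concrete, here's the sort of thing I imagine a cron job could do with Amazon's EC2 command-line API tools. This is strictly a back-of-the-napkin sketch: the pending_jobs command, the AMI ID, the keypair name, and the scaling thresholds are all made up for illustration, and I'm assuming the ec2-run-instances / ec2-describe-instances / ec2-terminate-instances tools behave roughly as documented.

#!/bin/bash
# Hypothetical auto-scaling sketch: only pay for conversion workers while
# there is work for them to do. Run from cron every few minutes.
AMI="ami-XXXXXXXX"            # conversion-worker image (placeholder)
QUEUE=$(pending_jobs)         # however you count waiting conversions (placeholder)
RUNNING=$(ec2-describe-instances | grep -c running)

if [ "$QUEUE" -gt $((RUNNING * 10)) ]; then
  # Backlog is piling up: start another worker.
  ec2-run-instances "$AMI" -n 1 -k my-keypair
elif [ "$QUEUE" -eq 0 ] && [ "$RUNNING" -gt 1 ]; then
  # Nothing left to convert: shut one worker down.
  IDLE=$(ec2-describe-instances | grep running | awk '{print $2}' | tail -1)
  ec2-terminate-instances "$IDLE"
fi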

At this point, as a side note, I would also advise keeping long-term backups of content in the S3 service, as well as removing rarely viewed content and storing it in S3 only. This would reduce the amount of space needed at any one time in the real, physical datacenter. Disks take up lots of power and lots of space, and you don't want to pay for storage you don't actually need. The tradeoff is that transferring content from S3 back to the datacenter costs money, so it comes down to the cost of those transfers versus the cost of running the storage hardware (or servers) yourself. I would caution that you can move from S3 to a SAN, but moving from a SAN to S3 leaves you with a piece of junk which cost more than your house did ;D.

Third, the EC2 servers upload the converted video files and thumbnails to the primary (and real) datacenter, and it's from there that YouTube viewers download the actual content.
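The conversion workers themselves wouldn't need to be anything fancy, either. Here's a rough sketch of what one job might look like; I have no idea what YouTube actually converts with, so I'm assuming ffmpeg, and the paths and datacenter hostname are placeholders.

#!/bin/bash
# Sketch of one conversion job on an EC2 worker: transcode the uploaded file
# to FLV, pull a thumbnail frame, ship both down to the datacenter, clean up.
SRC="$1"                     # file the user uploaded to this instance
BASE=$(basename "$SRC")
OUT="/tmp/${BASE%.*}.flv"
THUMB="/tmp/${BASE%.*}.jpg"

ffmpeg -i "$SRC" "$OUT"                            # convert to FLV
ffmpeg -i "$SRC" -vframes 1 -f image2 "$THUMB"     # grab one frame as the thumbnail

# Push the results to the real datacenter (SSH key auth assumed), then delete
# the local copies so we aren't paying EC2 to store them.
scp "$OUT" "$THUMB" converted@dc.example.com:/incoming/ && rm -f "$SRC" "$OUT" "$THUMB"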

That setup is when you *DO* use Amazon's new EC2 service. You've used the strengths of EC2 (lots of horsepower on demand at a very acceptable price) while avoiding its weaknesses (expensive bandwidth, and paying for long-term storage (unless S3 ends up being economical for what you do)).

That said… there are plenty of places where you wouldn't want to use EC2 in a project. Any time you'll be generating excessive amounts of traffic, you're losing money compared to a physical hosting solution.

In the end there is a lot of hype, and there's a lot of room for FUD and uninformed opinions (this blog post, for example, is an uninformed opinion; I've never used the service personally), and what people need to keep in mind is that not every problem needs this solution. I would argue that it's very likely that any organization could find one or (probably) more very good uses for EC2. But hosting your static content is not one of them. God help the first totally-hosted-on-EC2 user who gets majorly slashdotted ;).

I hope you found my uninformed public service announcement very informative. Remember to vote for me in the next election 😉

cheers
Apok

As a side note…

I've also subscribed to an Amazon EC2 search at PubSub, Technorati, and Feedster. I'll be watching the three to see which yields the most A) unique and B) valuable results over time.

At the time of subscription:
PubSub: 0 New Hits, 0 Good, 0 Bad
Technorati: 20 New Hits, 15 Good, 5 Bad
Feedster: 10 New Hits, 9 Good, 1 Bad

Here are some links which, at a glance, appeared interesting:

Over at Maluke Co., the server specs are reported as being "an instance is roughly equivalent to a system with a 1.7GHz Xeon CPU, 1.75GB of RAM, 160GB of local disk, and 250Mb/s of network bandwidth (bursting to 1Gb)". I have not verified this, but it seems to be corroborated by other blogs. That's an interesting find.

Oh… And…

I would also expect that, to save money for their users, you will be seeing FreeBSD, Gentoo, Red Hat, Fedora, Ubuntu, and Debian mirrors run on EC2 and for EC2 (i.e., no traffic allowed except inside the datacenter). And if Amazon is *SMART*, they will provide this service to the distributions at no charge. This will, almost assuredly, spark a large amount of interest, and probably guarantee more users.

If not, the person who mirrors all these distros and lets people use them for $5.00 per month will make a buttload of cash 🙂 (hmm… that could be me! :))

Time will tell.

The Internet is all abuzz about Amazon

And rightly so! Amazon has, quite simply, outdone themselves this time. However, before we go running through the streets, let's make sure the emperor really is wearing clothes.

A very common server configuration offered by hosting companies is: 1 server, 1GB RAM, 80GB HDD, 1000GB transfer. This runs $70-$80 per month for low-end hosting, and $150-$200 per month for middle-of-the-road hosting. For the sake of our first comparison, let's assume you use every last GB of your bandwidth every month, and not a drop more.

Amazon EC2


1 Server @ $0.10/hr (31 * 24 * 0.10) = $74.40/mo
80GB HDD @ $0.15/GB (80 * 0.15) = $12.00/mo
1000GB Xfer @ $0.20/GB (1000 * 0.20) = $200.00/mo


Total Cost Per Month: $286.40
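The arithmetic is simple enough to script if you want to play with your own numbers. Here's a quick throwaway calculator using the prices above (which will obviously drift if Amazon changes them):

#!/bin/bash
# Rough EC2 monthly cost: instance-hours, GB of disk, GB of transfer.
# Usage: ./ec2cost.sh [hours] [disk_gb] [transfer_gb]
HOURS=${1:-744}       # default: 31 days * 24 hours
DISK=${2:-80}         # default: 80GB
XFER=${3:-1000}       # default: 1000GB
awk -v h="$HOURS" -v d="$DISK" -v x="$XFER" 'BEGIN {
  printf "Instance: $%.2f  Disk: $%.2f  Transfer: $%.2f  Total: $%.2f\n",
         h * 0.10, d * 0.15, x * 0.20, h * 0.10 + d * 0.15 + x * 0.20
}'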

However you run the numbers, that total is a bit more than you're used to paying, to be sure. Obviously, the less bandwidth you use, the less you pay. Where the *real* benefit of the EC2 service comes into play is in environments which are not bandwidth-intensive but CPU-intensive. How much of a benefit this provides remains to be seen: I am unaware of the stats on these "machines", and the CPU horsepower and RAM (as well as RAM speed) available will determine how useful the service is. If CPU and RAM aren't directly applicable in the "MHz" and "GB" terms we're used to dealing with, there will have to be some measure other than "feels pretty fast."

I predict that there will be several strategies, in the near future, for limiting the amount of bandwidth these services use (as the main cost of the service is in raw bandwidth used). Storing static content elsewhere will, I'm sure, be a key ingredient in those strategies.

I also predict that it will be a relatively short amount of time before we see some sort of Beowulf- or OpenMosix-style implementation geared directly at this service. Mark my words 🙂

More to come when I find out more. I'm *severely* interested in this… and I'm sorely disappointed that I missed the beta (probably by only a couple of hours, too…)

Bash wizardry: Command Line Switches

If you're like me (and God help you if you are), you write a lot of bash scripts. When something comes up, bash is a VERY handy language to use because it's a) portable (between almost all *nixes), b) lightweight, and c) flexible (thanks to the plethora of Unix commands which can be piped together). One big reason people prefer Perl (or some other language) is that it's more flexible, and one of those cases is processing command-line switches. Commonly, bash scripts are coded in a way which requires each switch to be given as a specific positional argument to the script. This makes the script brittle: you CANNOT leave out switch $2 if you plan to use switch $3. Allow me to help you get around this rather nasty little inconvenience! (Note: this deals with SWITCHES ONLY, *NOT* switches that take arguments!)

check_c_arg() {
  # Usage: check_c_arg SWITCH ARG...
  # Returns 1 if SWITCH appears among the remaining args, 0 otherwise.
  # (Yes, that's backwards from the usual shell convention; the callers
  # below test $? against 1.)
  count=0
  for i in "$@"
    do
      if [ $count -eq 0 ]
        then
          # Skip the first argument: it's the switch we're searching for.
          count=1
        else
          if [ "$i" = "$1" ]
            then
              return 1
          fi
      fi
  done
  return 0
}

This beautiful little bit of code will allow you to take switches in ANY order. Simply set up a script like this:

#!/bin/bash
host="$1"

check_c_arg() {
  # Returns 1 if the switch given as $1 appears among the remaining args.
  count=0
  for i in "$@"
    do
      if [ $count -eq 0 ]
        then
          count=1
        else
          if [ "$i" = "$1" ]
            then
              return 1
          fi
      fi
  done
  return 0
}

check_c_arg "-v" $@
cfg_verbose=$?
check_c_arg "-d" $@
cfg_dry_run=$?
check_c_arg "-h" $@
cfg_help=$?


if [ $cfg_help -eq 1 ]
  then
    echo -e "Usage: $0 [-v] [-h]"
    echo -e "\t-v\tVerbose Mode"
    echo -e "\t-d\tDry run (echo command, do not run it)"
    echo -e "\t-h\tPrint this help message"
    exit 1
fi

if [ $cfg_dry_run -eq 1 ]
  then
    echo "ping -c 4 $host"
  else
    if [ $cfg_verbose -eq 1 ]
      then
        ping -c 4 $host
      else
        ping -c 4 $host 1>/dev/null 2>/dev/null
    fi
fi

In the above all of the following are valid:

  • 127.0.0.1 -v -d
  • 127.0.0.1 -d -v
  • 127.0.0.1 -v
  • 127.0.0.1 -d
  • 127.0.0.1 -h
  • 127.0.0.1 -h -v -d
  • 127.0.0.1 -h -d -v
  • 127.0.0.1 -h -v
  • 127.0.0.1 -h -d
  • 127.0.0.1 -v -h -d
  • 127.0.0.1 -d -h -v
  • 127.0.0.1 -v -h
  • 127.0.0.1 -d -h
  • 127.0.0.1 -v -d -h
  • 127.0.0.1 -d -v -h

I hope this helps inspire people to take the easy (and oftentimes more correct) path when faced with a problem which requires a solution, but not necessarily a terribly complex one.

Cheers!
DK

The web already has the next office suite. It's called a blog.

I'm really surprised that everybody is trying to create the "web-based office" from scratch. We have a huge elephant in the room: the weblog. Scrap the idea that we have to have a desktop app for the next generation of desktop tools. Scrap the idea that we have to make something which is a word processor. Take your heads out of your butts and realize: we don't need another word processor! We need a REPLACEMENT for the word processor.

Take a blogging tool. Create a content divide: public/private. Add save and e-mail features (think PDF/RTF/ODF). Add some calendaring, a generic interface to MySQL as a DB, and some spreadsheet functionality, and BAM. There you are. You have your new e-mail client, your new word processor, your collaboration space, your new calendaring tool. You've got your 1:1, your 1:many, and your 1:*.

Software companies need to realize that the blog is the next desktop. We're quickly entering the point in the evolution of the Internet where the user *is* represented in real time by a digital avatar. Sure, it's not the 3D one we saw in those terrible movies from the '80s (no Lawnmower Man for you!), but we're HERE. It's NOW. People are living a connected life, and the gap between pushing, sharing, and publishing is shrinking every day.

The future is still tomorrow. But change is HERE.

Pet Project

I've started a pet project. It's a remote tripwire-like program. I'm doing it in Python (largely because I want to learn Python better). I prototyped it in bash (heheh, so it can't be that hard a project to make), but I'm moving it to something more real (and using SQLite instead of my "cat + grep" database).
The idea is to store all the files needed to actually perform the integrity checking locally, and then upload them to the remote server at the time the scan is run. It's a pretty simple combination of find + md5sum + OpenSSH + RSA/DSA authentication. Add storage, comparison, and updating of the locally archived checksums, tack on an alerting feature, run it via cron… and you have the idea.
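For the curious, the bash prototype really is about that simple. Here's a bare-bones sketch of the core idea; the watched directory and baseline path are placeholders, and the remote-upload and alerting pieces are left as comments.

#!/bin/bash
# Minimal integrity check: checksum a tree with find + md5sum, diff against
# the stored baseline, then update the baseline.
WATCH_DIR="/etc"                              # tree to watch (placeholder)
BASELINE="/var/lib/integrity/baseline.md5"    # stored checksums (placeholder)
CURRENT=$(mktemp)

find "$WATCH_DIR" -type f -exec md5sum {} + | sort -k 2 > "$CURRENT"

if [ -f "$BASELINE" ] && ! diff -u "$BASELINE" "$CURRENT"; then
  echo "Integrity check: changes detected in $WATCH_DIR" >&2
  # real version: mail an alert, and push the fresh checksums to the
  # remote server over ssh (RSA/DSA key auth)
fi

mkdir -p "$(dirname "$BASELINE")"
mv "$CURRENT" "$BASELINE"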

If anyone a) reads this and b) wants in on it, drop me a line and I'll put up an svn repo for it.

Cheers