MySQL on Amazon EC2 (my thoughts)

Who this document is for: people looking to house large MySQL data-sets on Amazon’s EC2 service, people looking for the best all-in-EC2 solution (that I’ve found) for fault tolerance and data retention, and people looking for maximum availability.

Who this document is not for: people who are looking for something EASY (this isn’t it), people who have a small data-set which lends itself to just being copied around, and people to whom 100% uptime isn’t an issue. For all of you there are easier ways!

The problem (overview): The EC2 service is a wonderful thing, and it changes the nature of IT, which is no small feat! But there are a couple of problems with the service which make it less than ideal for some situations. To be more clear: there is a mismatch between the way people are used to doing things and the way things ought to be done on a service (or platform) like EC2. So, as I’ve advocated before, we’re going to look at shifting how *YOU* think about your databases… And as with all change, I promise this will sound bizarre and *BE* painful. Hang in there. You can do it!

The problem (in specific): There are two things that an EC2 AMI (which is what the Amazon virtual machines are called) lacks. The first and most direct of the two is that EC2 lacks persistent storage. At this point I would like to point out two things: A) EC2 is still in *BETA*, so let’s not be too critical of the product until it hits prime time, okay guys? And B) the AWS team is working on a persistent storage system to connect to EC2 (so sayeth the forum mods). The lack of persistent storage means this: you turn your machine on, you download and install all available security fixes, and you turn it off to play with later. When you turn it back on, your machine needs all of those security fixes again; everything you do with your machine during runtime is LOST when the machine shuts down. You then boot, again, from a clean copy of your AMI image. The second problem is that you are not given a static IP for use with your AMI machine. Though this is the lesser of the two issues, it’s more insidious. The two “issues” above lend themselves well to setting up a true compute cluster… but they don’t lend themselves at all to setting up a database cluster.

While discussing solutions for these problems, let me lay the ground rules bare here. I will be discussing how to work inside the limitations of the EC2 environment. There are better solutions than those I’m going to be touching on, but I’m not a kernel hacker. I’ll be discussing things that you can do through architecture and normal system administration which will help you leverage EC2 in a consistent manner. We’ll also be assuming here that EC2 is a trustworthy service (i.e. if something breaks it’s your fault, and if it’s Amazon’s fault, then no more than one of your servers goes down at a time). The method here is a lot like taking your dog to obedience class. The teacher at this class trains the dog owners… not the dog. Once the dog owners understand how to train the dog, the “problem” solves itself.

Step #1: Drop the term (and the idea of) monolithic databases from your brain. Don’t think it. Don’t utter it. I’ve touched on this briefly before (and if I haven’t, I will in later posts). As you design your database, make sure that it can be split into as many databases as is needed in the future. And, if at all humanly possible, split by hash rather than by range! This not only keeps your database instances under control; in the long run it also carries your good performance a long long LONG way. You can control your size by splitting on ranges (e.g. records 1-1,000,000 in A, 1,000,001-2,000,000 in B, 2,000,001-3,000,000 in C), but that limits your speed on any given segment to the performance of the single instance housing it; don’t do that to yourself (you’ll regret it in the morning!). If, instead, you put all records ending in 0 in A, ending in 1 in B, ending in 2 in C (and so on), then not only is each database footprint 1/10th the disk size it would have been monolithically, you also get roughly 10x the performance once the databases sit on 10 different machines (later on). And the beauty is that this scheme extends very well to even larger data-sets: use suffixes 00, 01, 02… for 1/100th, or 001, 002, 003… for 1/1000th (and beyond; you get the idea). These don’t all have to be housed on different servers to start off with. It’s good enough that the databases be set up properly to support this in the future.
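To make the modulo scheme concrete, here’s a minimal sketch of the routing logic in bash (the shard names, and the assumption that your record IDs are plain integers, are mine, not gospel):

#!/bin/bash
# Hypothetical sketch: route a record ID to one of ten databases by its
# last digit. Database names (shard_0 .. shard_9) are made up.

id="$1"                     # e.g. 1234567
shard=$(( 10#$id % 10 ))    # 10# forces base ten in case of leading zeros
db="shard_${shard}"

# All ten databases can live on one server today and be spread across
# ten servers later without changing this routing logic at all.
echo "record $id lives in database $db"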

The standard mechanism for fault tolerance in MySQL is replication. But there are a couple of things that people seldom realize about replication, and I’ll lay them bare here.

The first thing that people don’t understand is that you cannot keep your binary logs forever. OK, that’s not entirely true: you can if you don’t write to the db very often. But if you have a write-intensive database you will eventually run out of storage; it’s just no fun to keep 900GB of binary logs handy! It also becomes impractical, at some point, to create a new database instance by re-reading all of the binary logs from a master. Processing all of your 75 billion inserts sequentially when you need a server up… NOW… is not fun at all! Not to mention the fact that if you, at some point, ran a query which broke replication, you’ll find that your rebuild hangs at that point and won’t progress any further without manual intervention.
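For example, once every slave has read past a given log file, the old logs on the master can be purged rather than hoarded. A hedged sketch (the log file name and credentials are placeholders):

#!/bin/bash
# Sketch: reclaim master disk space by purging binary logs that every
# slave has already read. Check each slave's SHOW SLAVE STATUS output
# (Master_Log_File) first; purging a log a slave still needs breaks
# replication.
mysql -u root -p"$ROOT_PW" -e "SHOW MASTER LOGS;"
mysql -u root -p"$ROOT_PW" -e "PURGE MASTER LOGS TO 'mysql-bin.000042';"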

The other thing that people don’t realize is that repairing a broken (or installing a new) database instance means that you have to take an instance down. Imagine the scenario: you have two db servers, a master and a slave. The hard drives on the slave give out. You get replacements and pop them in. Now it’s time to copy the data back over to the slave. Your options? A) run a mysqldump, bringing your master to a crawling halt for the 8 hours it takes, or B) turn the master off and copy the data manually, taking much less time but bringing everything to a complete halt. The answer, of course, is to have at least one spare db instance which you can shut down safely while still remaining operational.

Step #2: I’m half the instance I used to be! With each AMI you get 160GB of (non-persistent) disk space, almost 2GB of RAM, and the equivalent of a Xeon 1.75GHz processor. Now divide that, roughly, in half. You’ve done that little math exercise because your one AMI is going to act as two. That’s right: I’m recommending running two separate instances of MySQL on a single server.

Before you start shouting at the heretic, hear me out!

+-----------+   +-----------+
| Server A  |   | Server B  |
+-----------+   +-----------+
| My  |  My |   | My  |  My |
| sQ  |  sQ |   | sQ  |  sQ |
| l   |  l  |   | l   |  l  |
|     |     |   |     |     |
| #2<=== #1 <===> #1 ===>#2 |
|     |     |   |     |     |
+ - - - - - +   + - - - - - +

On each of our servers, MySQL #1 and #2 each occupy a max of 70GB of space. The MySQL #1 instances of all the servers are set up in a master-master topology. And each #2 instance is set up as a slave only of the #1 instance on the same server; so on server A, MySQL #2 is a (one-way) copy of #1 on server A.
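A minimal sketch of what the two instances might look like on one AMI: separate config files with distinct ports, sockets, data directories, and server-ids, each started with its own mysqld_safe. All paths, ports, and IDs here are assumptions, not a tested configuration.

#!/bin/bash
# Sketch: run two MySQL instances side by side on one server.
for n in 1 2; do
  cat > /etc/mysql-${n}.cnf <<EOF
[mysqld]
datadir   = /var/lib/mysql${n}
socket    = /var/lib/mysql${n}/mysql.sock
port      = 330${n}
server-id = ${n}
log-bin   = mysql-bin
EOF
  mysqld_safe --defaults-file=/etc/mysql-${n}.cnf &
done

(The server-id values here are placeholders; as discussed further down, they must be unique across the whole cluster, not just within one machine.)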

With the above setup, *if* server B were to get restarted for some reason, you could: shut down MySQL instance #2 on server A; copy that #2 data over to both slots on server B; bring up #1 on server B (there should be no need to reconfigure its replication relationship, because the #2 copy was already pointed at #1 on server A); then bring up #2 on server B and reconfigure its replication to pull from #1 on server B. This whole time, #1 on server A never went down. Your services were never disrupted.
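In rough shell terms (hostnames, paths, and credentials below are hypothetical), the rebuild might look like this:

#!/bin/bash
# Sketch of rebuilding server B from the spare #2 instance on server A.

# 1. Cleanly stop the sacrificial #2 instance on server A.
mysqladmin --socket=/var/lib/mysql2/mysql.sock -u root -p"$PW" shutdown

# 2. Copy its data directory into BOTH slots on server B.
rsync -a /var/lib/mysql2/ serverB:/var/lib/mysql1/
rsync -a /var/lib/mysql2/ serverB:/var/lib/mysql2/

# 3. Restart #2 on server A, bring up #1 and then #2 on server B, and
#    point B's #2 at B's #1 with CHANGE MASTER TO. Server A's #1 kept
#    serving traffic the whole time.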

Also with the setup above, it is possible (and advised) to regularly shut down #2 and copy it into S3. This gives you one more layer of fault tolerance (and, I might add, the ability to take backups without going down).
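A sketch of that backup, assuming an S3 command-line tool such as s3cmd (the bucket name and paths are made up):

#!/bin/bash
# Sketch: back up the #2 instance to S3 without touching #1.
stamp=$(date +%Y%m%d)
mysqladmin --socket=/var/lib/mysql2/mysql.sock -u root -p"$PW" shutdown
tar czf /tmp/mysql2-${stamp}.tar.gz /var/lib/mysql2
s3cmd put /tmp/mysql2-${stamp}.tar.gz s3://my-db-backups/
# Bring #2 back up; its slave thread resumes where it left off.
mysqld_safe --defaults-file=/etc/mysql-2.cnf &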

  • Why can we do this, and why would we do this?

    We CAN do this for two reasons: first, because MySQL supports running multiple database servers on the same machine (thankfully); and second, because we’ve set up our database schema in such a way that we can easily limit the space requirements of any given database instance, allowing us to remain, no matter what, under the 70GB mark on all of our database servers.

  • Why WOULD we do this? For a couple of reasons; let me address the specific questions individually.

    Why would we reduce our performance by putting two MySQL instances on one AMI? Because you’re a poor startup, and it’s the best alternative to paying for four or more instances that run only MySQL. You could increase performance by paying for one AMI per database instance and keeping the topology the same, and I expect that once you CAN do this, you WILL. But likely the reason you’re using EC2 is to avoid spending much capital up front until you make it big with some real money. So I’ve slanted this hypothesis with that end in mind.

  • Why would we do something so complicated?

    MySQL replication is complicated. It’s error prone. It’s harder (in the long run) than it looks. We use it, and this entire method of managing MySQL on AMIs, because it’s what we have available to us at our budget. Are there better overall solutions? Without the limitations that I’m constrained to here: yes! But again: we’re working solely inside the EC2 framework…

  • Why would we do something so susceptible to human error?

    You’ve obviously never had someone place a hard copy in the wrong file folder. Or type reboot on the wrong machine. Or delete the wrong file on your hard drive. If you think that operations (on a small scale) is any less error prone, you’re fooling yourself! If you’re looking for speed and agility from your OPS team, you have to trust them to do the best they can with the resources given. If you’re stuck *having* to use EC2, it’s likely because of budget, and we’re looking at a circular set of circumstances. Make some good money and then hire a big ops team so that they can set in place a swamp of processes; the theory being that the slower they have to move, the more chance they get to notice something is wrong 🙂

  • What would you recommend to make this solution more robust if you were able to spend a *little* more money?

    I would put one instance from each replication cluster on a machine you actually own. Just in case we’re looking at an act-of-god style catastrophe at Amazon, you’ll still have your data. This costs you A) a server per cluster, and B) the bandwidth to support replication.

And finally: what problems will arise that I’m not yet aware of?

A couple that I haven’t touched on, actually.

  • First: MySQL replication requires that the server-id be a unique number for each instance in a cluster. And each machine here is running two instances of MySQL (meaning two unique server IDs per AMI). The reason this is an issue is that every time you start your AMI instance, the original my.cnf files will be there again, and without intervention all of your servers would end up having the same server IDs, and replication would be so broken it would take you years to piece your data back together!

    The easy way to circumvent this issue is to build a specific custom AMI for each of your servers.

    The elegant long-term solution is to devise a way, programmatically (possibly using DNS, or even the Amazon SQS service), to obtain two unique server IDs before starting MySQL (see the sketch after this list).

  • Second: without static IP addresses from the EC2 service, your AMIs will have a new IP every time the server boots.

    This can be dealt with either manually or programmatically (possibly via DNS registration, and some scripts resetting MySQL permissions); the sketch after this list shows one way.
  • Third: if, rather like the nursery rhyme which taught children to deal with death by plague in medieval Europe, “ashes, ashes, they all fall down”… what do you do?

    Well, hopefully they never “all fall down,” because resetting a cluster from scratch is tedious work. But if they do, you had better hope you took one of my two backup options seriously.

    Either you have a copy of a somewhat recent data-set in S3, or you have an offsite replication slave which can be used for just this circumstance.

    Or you’re back to square one…
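To make the first two fixes concrete, here is one possible boot-time script. It derives two cluster-unique server-ids from the instance’s private IP and then publishes that IP in DNS. The metadata URL, config paths, and DNS zone are assumptions; treat this as a sketch, not a recipe.

#!/bin/bash
# Run once at boot, before starting either MySQL instance.

ip=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Pack the low three octets into one integer; unique so long as all
# your instances live in a single /8.
IFS=. read -r a b c d <<< "$ip"
base=$(( (b << 16) + (c << 8) + d ))

# Instance #1 gets an even server-id, instance #2 the next odd one.
sed -i "s/^server-id.*/server-id = $(( base * 2 ))/"     /etc/mysql-1.cnf
sed -i "s/^server-id.*/server-id = $(( base * 2 + 1 ))/" /etc/mysql-2.cnf

# Publish our new IP under a stable name so the other servers can find
# us (requires a DNS server willing to accept dynamic updates from us).
nsupdate <<EOF
update delete db1.example.com A
update add db1.example.com 60 A $ip
send
EOF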

There is a real issue to be discussed…

Amazon’s EC2 service is, by all accounts, brilliant. But one of the things that it lacks is any sort of assurance regarding data permanence. What I mean is: each machine that you turn on has 160GB of storage, but if that server instance is ever shut off, the data is *lost* (not corrupted, but GONE), and the next time you start that server instance it is back to the base image. You cannot save data on EC2 between reboots.

This is not an issue for everyone. But for people looking to run services on EC2 which require some sort of permanent storage solution (such as databases like MySQL or PostgreSQL), this is a big show stopper. I’ll be putting some real thought in the next few days into how to sidestep this pitfall. I’m sure that it can be done, and I even have some ideas on how to do it. But I want to think them through and do a little research before I go blurting them out and making (more of) an ass out of myself 🙂

So… More on this later.

DK

When you should (and should not) think about using Amazon EC2

The Amazon AWS team has done it again, and EC2 is generating quite the talk. Perhaps I’ve not been watching the blogosphere closely enough about anything in particular until now (very likely), but I’ve not really seen this much general excitement. The fervor I see going around is a lot like a kid at Christmas. You unwrap your present. IT’S A REMOTE CONTROL CAR. WOW! How cool! All of a sudden you have visions of chasing the neighborhood cats and drag racing your friends on the neighborhood sidewalks. After you open it (and the general euphoria of the ideas starts to fade) you realize: this is one of those cars that only turns one direction… And you just *know* that the next time you meet with your best friend Bobby, he will have a car that turns left *and* right.

I expect we will see some of this… A lot of the talk around the good old sphere is that AWS will be putting smaller hosting companies out of business. But that’s not going to happen unless they change their pricing model, which I doubt they will.

But before you all go getting your panties in a bunch when EC2 only turns left… remember that EC2 is a tool. Just like you wouldn’t use a hammer to cut cleanly through a board, EC2 is not meant for all purposes. The trick to making genuinely good use of EC2 will be in playing off of its strengths… and avoiding its weaknesses.

Let’s face it… the Achilles’ heel of all the rampant early-bird speculation is that the price of bandwidth on EC2 is rather high. Most hosting companies get you (with a low-end plan) 1000GB of transfer per month. Amazon charges $200 per month for that much transfer, whereas you can find low-end hosting for $60 and mid-range hosting for $150. Clearly this is not where EC2 excels, and I don’t think the AWS team intended it to excel here. How appealing would it be to run the servers which host every web site on the planet? Not very.

What you *do* get at a *great* price is horsepower. For a mere $74.40/month (31 days × 24 hours at $0.10 per instance-hour) you get the equivalent of a Xeon 1.75GHz with 1.75GB of RAM. That’s not bad!

But the real thrill comes with the understanding that additional servers can talk to each other over the network… for free. There is a private network which you can make use of. This turns into a utility-computing atom bomb. If you can minimize the amount of bandwidth used getting data to and from the machines, while maximizing their CPU and RAM utilization, then you have a winning combination which can take full advantage of the EC2 architecture. And if your setup is already using Amazon’s S3 storage solution… well… gravy.

Imagine running a site like, say, YouTube on EC2. The bandwidth bill would kill you. The simple fact of the matter is that YouTube uses too much bandwidth receiving and serving its users’ files. I would have to imagine that its bandwidth numbers per month are staggering! But let’s break out the things that YouTube has to manage, and where it might best utilize EC2 in its infrastructure.

YouTube gets files from its users, converts those files into FLVs, and then makes those FLVs available via the internet. You therefore have three main actions that are performed: A) HTTP PUT, B) video conversion, and C) HTTP GET. If I were there, and in a position to evaluate where EC2 might prove useful, I would probably recommend the following changes to how things work:

First: all incoming files get uploaded directly to web servers running on EC2 AMIs. There’s no reason a file should be uploaded to a datacenter, then re-uploaded to EC2, and then sent back down to the datacenter; that makes no sense. So users upload to EC2 servers.

Second: the EC2 compute farm is in charge of all video conversion. Video conversion is, typically, a high-memory and high-CPU process (as any video editor will tell you), and when they built their datacenter I can assure you this weighed heavily on their minds. You don’t want to buy too many servers. You pay for them up front, and you keep paying afterward as well: not only do you purchase X number of servers for your compute farm, you have to be able to run them, and that means rack space and power. Let me tell you, those two commodities are not cheap in datacenters. You do not want servers sitting around doing nothing unless you have to! So how many servers they purchase and provision every quarter has a lot to do with their expected usage. If they don’t purchase enough, users wait a long time for their requests to complete. Too many, and they’re throwing away their investors’ money (which investors don’t particularly like). So the ability to turn servers in a compute farm on and off only when they are needed (and better yet, to pay for them only when they’re on) is a godsend. This will save oodles of cash in the long run.
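As a sketch of what that elasticity looks like in practice, here’s a hypothetical scaling loop. It assumes Amazon’s command-line API tools (ec2-run-instances, ec2-terminate-instances) and a made-up queue_length command for measuring the conversion backlog:

#!/bin/bash
# Sketch: grow or shrink the transcoding farm with the backlog.
AMI="ami-12345678"        # placeholder id for the transcoder image
backlog=$(queue_length)   # hypothetical: videos awaiting conversion

if [ "$backlog" -gt 100 ]; then
  ec2-run-instances "$AMI" -n 5    # five more workers, billed by the hour
elif [ "$backlog" -eq 0 ]; then
  # Terminate idle workers; the id list would come from
  # ec2-describe-instances plus your own bookkeeping.
  ec2-terminate-instances $(cat /var/run/idle-workers)
fi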

At this point, as a side note, I would also advise keeping long-term backups of content in the S3 service, as well as removing rarely viewed content and storing it in S3 only. This would reduce the amount of space needed at any one time in the real, physical datacenter. Disks take up lots of power and lots of space; you don’t want to pay for storage you don’t actually need. The tradeoff is that transferring content from S3 back to the DC costs money, so it comes down to the cost of that versus the cost of running the storage hardware (or servers) yourselves. I would caution that you can move from S3 to a SAN, but moving from a SAN to S3 leaves you with a piece of junk which costs more than your house did ;D

Third: the EC2 servers upload the converted video files and thumbnails to the primary (and real) datacenter, and it’s from here that YouTube viewers download the actual content.

That setup illustrates when you *DO* use Amazon’s new EC2 service. You’ve used the strengths of EC2 (abundant horsepower at a very acceptable price) while avoiding its weaknesses (expensive bandwidth, and paying for long-term storage, unless S3 ends up being economical for what you do).

That said… there are plenty of places where you wouldn’t want to use EC2 in a project. Any time you’ll be generating excessive amounts of traffic, you’re losing money compared to a physical hosting solution.

In the end there is a lot of hype, and there’s a lot of room for FUD and uninformed opinions (this blog post, for example, is an uninformed opinion; I’ve never used the service personally), and what people need to keep in mind is that not every problem needs this solution. I would argue that it’s very likely that any organization could find one or (probably) more very good uses for EC2. But hosting your static content is not one of them. God help the first totally-hosted EC2 user who gets majorly slashdotted ;)

I hope you found my uninformed public service announcement very informative. Remember to vote for me in the next election 😉

cheers
Apok

Bash wizardry: Command Line Switches

If you’re like me (and God help you if you are), you write a lot of bash scripts. When something comes up, bash is a VERY handy language to use because it’s a) portable (between almost all *nixes), b) lightweight, and c) flexible (thanks to the plethora of Unix commands which can be piped together). One large reason people prefer Perl (or some other language) is that it’s more flexible, and one of those cases is processing command line switches. Commonly, bash scripts are coded in a way which requires each switch to appear as a specific positional argument to the script. This makes the script brittle: you CANNOT leave out switch $2 if you plan to use switch $3. Allow me to help you get around this rather nasty little inconvenience! (Note: this deals with SWITCHES ONLY, *NOT* switches with arguments!)

check_c_arg() {
  # Usage: check_c_arg SWITCH ARG...
  # Returns 1 if SWITCH appears among the remaining arguments, 0 if not.
  switch="$1"
  shift
  for i in "$@"     # quote "$@" so arguments containing spaces survive
    do
      if [ "$i" = "$switch" ]
        then
          return 1
      fi
  done
  return 0
}

This beautiful little bit of code will allow you to take switches in ANY order. Simply set up a script like this:

#!/bin/bash
host="$1"

check_c_arg() {
  # Usage: check_c_arg SWITCH ARG...
  # Returns 1 if SWITCH appears among the remaining arguments, 0 if not.
  switch="$1"
  shift
  for i in "$@"     # quote "$@" so arguments containing spaces survive
    do
      if [ "$i" = "$switch" ]
        then
          return 1
      fi
  done
  return 0
}

check_c_arg "-v" "$@"
cfg_verbose=$?
check_c_arg "-d" "$@"
cfg_dry_run=$?
check_c_arg "-h" "$@"
cfg_help=$?


if [ $cfg_help -eq 1 ]
  then
    echo -e "Usage: $0 host [-v] [-d] [-h]"
    echo -e "\t-v\tVerbose mode"
    echo -e "\t-d\tDry run (echo the command, do not run it)"
    echo -e "\t-h\tPrint this help message"
    exit 1
fi

if [ $cfg_dry_run -eq 1 ]
  then
    echo "ping -c 4 $host"
  else
    if [ $cfg_verbose -eq 1 ]
      then
        ping -c 4 $host
      else
        ping -c 4 $host 1>/dev/null 2>/dev/null
    fi
fi

In the above all of the following are valid:

  • 127.0.0.1 -v -d
  • 127.0.0.1 -d -v
  • 127.0.0.1 -v
  • 127.0.0.1 -d
  • 127.0.0.1 -h
  • 127.0.0.1 -h -v -d
  • 127.0.0.1 -h -d -v
  • 127.0.0.1 -h -v
  • 127.0.0.1 -h -d
  • 127.0.0.1 -v -h -d
  • 127.0.0.1 -d -h -v
  • 127.0.0.1 -v -h
  • 127.0.0.1 -d -h
  • 127.0.0.1 -v -d -h
  • 127.0.0.1 -d -v -h

I hope this helps inspire people to take the easy (and oftentimes more correct) path when faced with a problem which requires a solution, but not necessarily a terribly complex one.

Cheers!
DK

But… But… But… Why didn’t they pick ME?!

There’s a lot of talk going around about Ubuntu Linux versus Debian Linux versus XYZ Linux, and why Ubuntu has become popular (even trendy!). But it seems to me that most of this talk boils down to “but I think MY distro is better,” whether “my distro” means “I made it” or just “I use it.”

For years now everybody in the Linux community has been saying “Linux can make it on the consumer desktop.” And I always believed it could (though I never believed, and still don’t believe, that it was there yet). But now that someone has made something people want to use, and like using, there seems to be a lot of “but this was my idea,” and “they didn’t do that first, this other distro did,” and even “I can’t figure out why this is so popular.”

Welcome to Rome! Where you’re free to worship whatever you like, but you have to admit that Ubuntu has managed to make it big. At least in this Rome there aren’t any taxes to pay. The simple fact of the matter is that Ubuntu Linux pulled together the right combination of things at the right time and in the right place. They were different enough to get noticed amongst a sea of toy (and corporate) distributions.

As with all great breakthroughs, Ubuntu *HAS* stood on the shoulders of the giants that came before it. But just as in scientific discovery, that fact doesn’t discount the new things that have happened! Because to *truly* bake an apple pie from scratch, one must first create the universe.