Who this document is for: People looking to house large MySQL data-sets on Amazon’s EC2 service, and people looking for the best (that I’ve found) all-in-EC2 solution for fault tolerance and data retention. People looking to get maximum availability.
Who this document is not for: People who are looking for something EASY. This isn’t it. People who have a small data-set which lends itself to just being copied around. And people to whom 100% uptime isn’t an issue. For all of you there are easier ways!
The problem (overview): The EC2 service is a wonderful thing, and it changes the nature of IT, which is no small feat! But there are a couple of problems with the service which make it less than ideal for some situations. To be more clear there is a problem with the ways that people are used to doing things as compared to the ways that things ought to be done with a service (or platform) like EC2. So, as I’ve advocated before, We’re going to look at shifting how *YOU* are thinking about your databases… And as with all change I promise this to sound bizarre and *BE* painful. Hang in there. You can do it!
The problem (in specific): There are two things that an EC2 AMI (which is what the amazon virtual machines are called) are lacking. The first and most direct of the two is that EC@ lacks immutable storage. At this point I would like to point out two things: A) EC2 is still in *BETA* lets not be too critical of the product until it hits prime time, okay guys?, and B) the AWS team is working on an immutable storage system to connect to EC2 (so sayeth the forums mods.) The lack of immutable storage means this: after you turn your machine on… you download and install all available security fixes… and you turn it off to play with later. When you turn it back on your machine again needs all of those security fixes… everything you do with your machine during runtime is LOST when the machine shuts down. You then boot, again, from a clean copy of your AMI image. The second problem is that you are not given a static IP for use with your AMI machine. Though this is the lesser of the two issues it’s more insidious. The two above “issues” lend themselves well to setting up a true cluster… but they don’t lend themselves at all to setting up a database cluster.
While discussing solutions for these problems let me lay the docroot bare here. I will be discussing how to work inside the limitations of the EC2 environment. There are better solutions than those I’m going to be touching on but I’m not a kernel hacker. I’ll be discussing things that you can do through architecting and normal system administration which will help you leverage EC2 in a consistent manner. We’ll also be assuming here that EC2 is a trustworthy service (e.g. i something breaks its your fault… and if its the fault of amazon that no more than 1 of your servers will go down). The method here is a lot like taking your dog to obedience class. The teacher at this class trains the dog owners… not the dog. Once the dog owners understand how to train the dog the “problem” solves itself.
Step #1: You are to drop the term (and idea of) monolithic databases from your brain. Don’t think it. Don’t utter it. I’ve touched on this briefly before (and if I haven’t I will in later posts.) As you design your database make sure that it can be split into as many databases as is needed in the future. And, if at all humanly possible, split horizontally instead of vertically! This not only ensures that you can keep your database instances under control it also, in the long run, carries your good performance a long long LONG way. You can control your size by splitting vertically (e.g. records 1-1,000,000 are in A, 1,000,001-2,000,000 are in B, 2,000,001-3,000,000 are in C. but this limits your speed on a given segment to the performance of the housing instance — don’t do that to yourself (You’ll regret it in the morning!) but if you have all records ending in 0 in A, ending in 1, in B, ending in 2 in C (and so on) you are able to take advantage of the fact that not only are the database footprints only 1/10th the disk size that they would have been monolithically but you also get 10x the performance increase once you have it on 10 different machines (later on.) And the beauty is that this scheme extends itself very well to even larger data-sets. use 00, 01, 02… or 001, 002, 003 for 1/100, 1/1000 (and beyond… you get the idea) These don’t all have to be housed on different servers to start off with. It’s good enough that the databases be setup properly to support this in the future.
The standard mechanism for fault tolerance in MySQL is replication. But there are a couple of things that people seldom realize about replication, and I’ll lay them bare here.
The first thing that people don’t understand is that you cannot keep your binary logs forever. OK thats not entirely true – if you don’t write to the db very often. But if you have a write intensive database you will eventually run out of storage. it’s just no fun to keep 900GB of binary logs handy! It also becomes impractical, at some point, to create a new database instance by re-reading all of the binary logs from a master. Processing all of your 75 billion inserts sequentially when you need a server up… NOW… is not fun at all! Not to mention the fact that if you, at some point ran a query which broke replication… you’ll find that your rebuilding has hung at that point and wont progress any further without manual intervention.
The other thing that people don’t realize is that repairing a broken (or installing a new) database instance means that you have to take an instance down. Imagine the scenario: you have two db servers, a master and a slave. The hard drives on the slave give out. You get replacements and pop them in the slave. Now it’s time to copy the data back over to the slave. Your options? A) run a mysqldump bringing your master to a crawling halt for the 8 hours it takes. or B) turn the master off, and copy the data manually taking much less time but bringing everything to a complete halt. The answer to this is, of course, to have at least one spare db instance which you can shut down safely while still remaining operational.
Step #2: I’m half the instance I used to be! With each AMI you get 160GB of (mutable) disk space, and almost 2GB of ram, and the equivalent of a Xeon 1.75Ghz processor. Now divide that, roughly, in half. You’ve done that little math exercise because your one AMI is going to act as 2 AMI’s. Thats right. I’m recommending running two separate instances of MySQL on the single server.
Before you start shouting at the heretic, hear me out!
+-----------+ +-----------+ | Server A | | Server B | +-----------+ +-----------+ | My | My | | My | My | | sQ | sQ | | sQ | sQ | | l | l | | l | l | | | | | | | | #2<=== #1 <===> #1 ===>#2 | | | | | | | + - - - - - + + - - - - - +
On each of our servers, MySQL #1 and #2 both occupy a max of 70Gb of space. The MySQL #1 instances of all the servers are setup in a master-master topography. And the #2 instance is setup as a slave only of the #1 instance on the same server. so on server A MySQL #2 is a copy (one way) of #1 on server A.
With the above setup *if* server B were to get restarted for some reason you could: A) shut down the MySQL instance #2 on server A. Copy that MySQL #2 over to Both slots on server B. Bring up #1 on server B (there should be no need to reconfigure its replication relationship because #2 pointed at #1 on server A already). Bring up #2 on server B, and reconfigure replication to pull from #1 on ServerB. This whole time #1 on Server A never went down. Your services were never disrupted.
Also with the setup above it is possible (and advised) to regularly shut down #2 and copy it into S3. This gives you one more layer of fault tollerance (and, I might add, the ability to backup without going down.)
- Why can we do this? and why would we do this?
We CAN do this for two reasons: first MySQL supports running multiple database servers on the same machine (thankfully.) second because we’ve set up our database schema in such a way that we can easily limit the space requirements of any given database instance. Allowing us to remain, no matter what, under the 70Gb mark on all of our database servers.
- Why WOULD do this for a couple of reasons, let me address specific questions individually
Why would we reduce our performance by putting two MySQL instances on one AMI? Because you’re a poor startup, and its the best alternative to paying for 4 or more instances to run, only, mysql. You could increase performance by paying for one AMI per database instance and keep the topography the same. I expect that once you CAN do this… you WILL do this. But likely the reason you’re using AMI is to avoid spending much capital up front until you make it big with some real money. So I’ve slanted this hypothesis with that end in mind.
- Why would we do something so complicated?
MySQL replication is complicated. It’s error prone. It’s harder (in the long run) than it looks. We use it, and this entire method of managing MySQL on AMI’s because its what we have available to us at our budget. Are there better overall solutions? Without placing the limitations that I’m constrained to here: yes! But again. We’re workign solely inside the EC2 framework…
- Why would we do something so susceptible to human error?
You’ve obviously never had someone place a hard copy in the wrong file folder. Or type reboot on the wrong machine. Or deleted the wrong file on your hard drive. If you think that operations (on a small scale) is any less error prone you’re fooling yourself! If you’re looking for speed and agility from your OPS team you have to trust them to do the best they can with the resources given. If you’re stuck *having* to use EC2 its likely because of budget and we’re looking at a circular set of circumstances. Make some good money and then hire a big ops team so that that can set in place a swamp of processes. The theory being the slower they have to move the moe they get a chance to notice something is wrong
- What would you recommend to make this solution more robust if you were able to spend a *little* more money?
I would put one or each replication cluster instance on an actually owned machine. Just in case we’re looking at an act-of-god style catastrophe at amazon… You’ll still have your data. This costs A) a server per cluster, and the bandwidth to support replication.
And finally what problems will arise that I’m not yet aware of?
A couple that I haven’t touched, actually.
- First MySQL replication requires that the server-id be a unique number for each instance in a cluster of computers. And each machine is running 2 instances of mysql (meaning two unique server ID’s per AMI.) The reason this is an issue is because every time you start your AMI instance the original my.cnf files will be there again, and without intervention all of your servers would end up having the same server ID, and replication would be so broken it will take you years to piece your data back together!
The easy way to circumvent this issue is to have a specific custom AMI build for each of your servers.
The elegant long-term solution is to devise a way, programatically (possibly using DNS, or even the Amazon SQS service) to obtain two unique server ID’s to use before running MySQL.
- Second: without static IP addresses from the EC2 service your AMI’s will have a new IP every time the server boots.
this can be dealt with either manually or programatically (possibly via a DNS registration, and some scripts resetting MySQL permissions.)
- Third: if, rather like a nursery rhyme which teaches children to deal with death by plague in medieval europe, “ashes ashes they all fall down” what do you do?
well hopefully they never “all fall down” because resetting a cluster from scratch is tedious work. But if they do you better hope that you took one of my two backup options seriously.
either you have a copy of a somewhat recent data-set in S3, or you have an offsite replication slave which can be used for just this circumstance..
or you’re back to square one…