A 90-second introduction to multidimensional arrays in PHP

If a word is a variable, then a sentence is an array. A paragraph is an array of sentences, a chapter is an array of paragraphs, a book is an array of chapters, and a library is an array of books. In PHP, that would look like this:

  $library = array (
    'book1' => array (
      'chapter1' => array (
        'sentence1' => array (
          'word1' => "In",
          'word2' => "the",
          'word3' => "beginning",
        ),
      ),
    ),
  );

Therefore $library['book1']['chapter1']['sentence1']['word2'] is "the". And $library['book1']['chapter1']['sentence1'] is equal to array ( 'word1' => "In", 'word2' => "the", 'word3' => "beginning" ).

And that's an array. Thus closes our discussion of arrays in PHP… huh? What's that? Oh… you need more? Well, sure, there are a zillion uses for arrays, and learning to think in arrays often takes running into a situation where using anything else becomes less than viable. But for the sake of argument let's pretend we're keeping simple track of deposits, withdrawals, and a balance. In this app every transaction invariably has a few pieces of information: a transaction date, a second party, an amount, and a type (deposit or withdrawal).

array (
  'date'   => $$,
  'type'   => $$,
  'party'  => $$,
  'amount' => $$,
)

Our balance sheet is simply an array of those arrays:

$sheet = array (
  '0' => array (
    'date' => 'monday',
    'type' => 'd',
    'party' => 'employer',
    'amount' => 1234.56,
  ),
  '1' => array (
    'date' => 'tuesday',
    'type' => 'w',
    'party' => 'rent',
    'amount' => 500,
  ),
  '2' => array (
    'date' => 'wednesday',
    'type' => 'w',
    'party' => 'computer store',
    'amount' => 712.59,
  ),
);

This, while fictitious, should give a good example of how a multidimensional array works. We can get a balance with a very simple loop using PHP's foreach() control structure.

$balance=0;
foreach ( $sheet as $transaction_id => $details ) {
  switch ( $details['type'] ) {
    case 'w':
      $balance=$balance - $details['amount'];
      break;
    case 'd':
      $balance=$balance + $details['amount'];
      break;
  }
  echo "[{$details['type']}]\t "
        ."{$details['party']}\t "
        ."Amount: {$details['amount']}\t "
        ."Balance: {$balance}\n";
}

That is basically everything you need to know to start working with multidimensional arrays (of COURSE there's more to learn), except for one thing. When you're faced with working with somebody else's data structures you will need to get information about how they are laying out their arrays. The slow, painful way of doing this is examining the code. The quick, happy way is to use either var_dump() or print_r(). I prefer print_r() for most jobs; just remember to wrap the output of print_r() in <pre></pre> tags if you're doing this debugging in a browser… trust me, it'll help a lot.
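For example, with a small sample array (the data here is invented, any array will do):

```php
// A small sample array, standing in for somebody else's data structure
$sheet = array (
  '0' => array (
    'date'   => 'monday',
    'type'   => 'd',
    'amount' => 1234.56,
  ),
);

// In a browser, wrap the output in <pre> tags so print_r's
// indentation survives the trip through HTML
echo '<pre>';
print_r ($sheet);
echo '</pre>';
```

The output shows you every key and every level of nesting, which is exactly the map you need before writing code against an unfamiliar structure.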

Rules of thumb for high availability systems (Infrastructure)


Never be at more than ½ capacity

If you're planning a truly highly available system then you have to be aware that a serious percentage of your hardware can be forcefully cut from your organization's torso at any moment. You are not exempt from this rule on holidays, weekends, or vacations. Losing power equipment, losing networking gear, the help tripping over cables, acts of God. If you aren't prepared to have a random half of your organization's hardware disconnected at any moment then you aren't H.A. yet.


If you don't have 2 spares then you aren't yet ready

Murphy was an optimist. If you've never replaced a dying (or dead) hard drive with a new hard drive which… doesn't work (or RAM, or a CPU), then you haven't been in ops long enough. Sometimes your backup plan needs a backup plan. And you have to have it. There's no excuse for being offline, so you need not one but two (or more) possible replacements for a point of failure.


Disaster Recovery is an ongoing process

The tricky thing about highly available systems is that you have to keep working… while you're recovering. Any time you're planning your HA setup and you work around a point of failure, stop and think a moment about what it will take to replace that failed point. If it requires bringing things down again… that's no good.


Growth planning should always be done in exponents

Never again are you to talk (or think) of doubling growth. You shall from this point forward think in squares, and cubes, and the like. In the age of information you tend to gather data at an alarming rate; don't let it overtake you!


If you depend on a backup, it’s not HA

"What's that? The primary server is offline? Do we have a spare? No, but we have a backup. How long? Oh… 36 hours… What? No, I can't speed it up." Let's face it: if you're restoring your live system from backup you've screwed the pooch. Backup is NOT high availability, but it is good practice, and when it comes down to it 36 hours is marginally better than never.


Self healing requires more thought than you’ve given it

The simple fact of life in the data center is that all services are an interlocking tapestry, and if the threads break down the tassels fall off. Self healing is not only about detection and removal; it's also about rerouting data. If the database server that you normally write to has gone down, you can detect it, but can you instantly rewire the 8 different internal services which feed into the database to write to a different server? And then back again?


DNS is harder than you think, and it matters more than ever

The one piece of infrastructure that people rely on most, and know the least about, is DNS. DNS might as well be a piece of hardware, because if your users can't type in www.blah.com to get to you, there's absolutely zero chance they'll have your IP address handy. Worse yet, DNS is the number one thing that I see administrators screw up all the time. Talking zone files with (sometimes veteran) administrators is like talking Klingon to a 2 year old. It usually doesn't work too well.
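To make the point concrete, here's a minimal, hypothetical zone file for blah.com (the names, IPs, and timer values are invented), which is the sort of thing every admin should be able to read at a glance:

```
$TTL 3600
@    IN  SOA  ns1.blah.com. hostmaster.blah.com. (
         2006010101 ; serial
         7200       ; refresh
         900        ; retry
         1209600    ; expire
         3600 )     ; negative-cache TTL
     IN  NS   ns1.blah.com.
     IN  NS   ns2.blah.com.
www  IN  A    192.0.2.10
```

If the serial, the trailing dots, or the difference between the refresh and expire timers draw a blank stare, that's the communication gap I'm talking about.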


Rules of thumb for high availability systems (Databases)


Replicating data takes longer than you think

In this brave new world of terabytes per week there's a nasty truth: replicating that much data across a large number of nodes is a headache, and it's usually not as fast as you want it to be. Instantaneous replication is nice, but generally speaking you're writing to one server and reading from X number of others. Your read servers, therefore, not only bear the same load as the write server (having to replicate everything that goes into the write server) but have to bear the additional load of supporting the read requests. A frequent mistake that admins make is putting the best hardware into the write server, and using lesser machines for the read servers. But if you're truly processing large amounts of data this creates a dangerous situation: if a read server stops for a while it might take days or weeks to catch up. Bad juju.


Less is more, and then more is more, and then less is more again

In the beginning you had data optimization. Everything pointed to something, and your masterfully crafted database schema duplicated absolutely no piece of information. Then you increased your size and volume to the point that this approach became too cumbersome to sustain your access times. You moved over to a new schema where you could select all the data you need in one statement, but data is duplicated everywhere. And finally this monolithic approach has locked you into multi-million dollar pieces of hardware, so you need to re-normalize your data so that it can be partitioned onto multiple clusters. Expect this, plan for it, and be prepared for the hard truth: this is a truly painful process!


Spend the money here, if nowhere else

If you deal in information, you absolutely have to spend real money here. This is not the place to skimp. If you do… you’ll be sorry.


Rules of thumb for high availability systems (Employees and Departments)


False positives breed contempt

If you routinely get SMS alerts at 3:00am when you're sound asleep, and it always ends up being a false alarm, there will come a time when you just opt to ignore the pager. And that time, not only will wolf have been cried, the flock will truly be under attack. Always, always work to reduce false positives, and set reasonable alerting thresholds. Is something an emergency worth getting up for at 3:00am, or isn't it? Sure, a web server went down, and was removed. But there are 13 others all functioning. You can sleep. But if you lost half of them… something's probably up!


No department is an island

Contrary to popular belief, it takes more than the ops department to design a truly HA system. For example, your admins aren't allowed to just start monkeying with the database schema when they feel like it. Sure, it's more highly available now, but the application can't use it any more. Just as no man is an island, neither is the ops department. You can work with them (good) or you can work against them (bad); choose wisely.


If operations warns that the sky is going to fall, take them seriously

Let's face it: if your auto mechanic says your alternator will die very soon, you replace it. If your inspector says you've got the beginnings of a termite problem, you address it. If your weatherman tells you it might rain today, you grab your umbrella on your way out the door. So when your ops team comes into your office telling you that you have exactly 90 days until your database server becomes a very heavy, very hot, very expensive paperweight, why would you ignore that? Usually when ops says the sky is about to fall it's because they were up in the clouds fixing the slightly off-color shade of silver you were complaining about and saw the cracks forming. Ignore them at your own risk, but don't say they didn't warn you.


If you don’t spend the money on ops, nothing runs.

Without your engine your car doesn't run. Without your heart you die. And without giving the ops department the necessary resources, the application that you've invested so heavily in will not run, because there will be nothing to run it on. Or worse yet: it'll run, but something will break every 3rd day. You cannot skimp here. Well, you can, but you don't get high availability as well as a low price tag. It's a pain in the ass… but when you bought the Saturn you had no right to expect NASCAR results.

The RDBMS Misconception That Less is More

It's commonly held that normalization is a good thing. And it is. But like all good, or more to the point TRUE, things, there are circumstances in which the opposite holds true.

The "proper" way to lay out a database schema is something as ever-changing as the tides. Rather like the US justice system, we find that things which once held true no longer do; or that things which were once absolute do, actually, have extenuating circumstances under which they aren't, exactly, absolute.

The proper way to lay out an RDBMS schema comes down to a very simple ratio: space vs. speed. The less duplication of data in your database, the more efficient (in terms of disk space used) it is. In exchange for that disk space savings you incur the cost of additional disk seeks.

For example, if you're keeping track of your users' information (e.g. who's registered and who hasn't) you might use a table like this:

Users: |  userId | firstName | lastName | eMail | cryptPasswd |

But in all likelihood you're going to have a lot of users with a common first and last name! Normalization to the rescue (or so it seems, at first):

Users: | userId | firstNameId | lastNameId | eMail | cryptPasswd |
FirstNames: | firstNameId | firstName |
LastNames: | lastNameId | lastName |

Now, instead of storing the string "John" a thousand times for the thousand users with the first name of John, you store the string once, and you have an integer field which relates (the R in RDBMS) to a normalized list of names.

But… the cost is that now any time you want to pull a name from the table it requires 3 lookups:

select firstNameId,lastNameId from Users where userId = 1
select firstName from FirstNames where firstNameId=x
select lastName from LastNames where lastNameId=y

Where the same would have been done with the following query before normalization:

select firstName, lastName from Users where userId=1
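In practice you'd usually collapse the three lookups into a single join, but the cost doesn't disappear; the database still has to touch all three tables to answer the question:

```sql
select f.firstName, l.lastName
from Users u
  join FirstNames f on f.firstNameId = u.firstNameId
  join LastNames  l on l.lastNameId  = u.lastNameId
where u.userId = 1;
```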

It gets worse when you're computing values based on information stored in your tables. For example, suppose you are looking for the number of times a user has visited a certain page, so that you can show them that information on the page they are viewing (or perhaps to do some checking on that value each time they visit to prevent, for example, site mirroring). You might already be storing what people are doing on the site in a table called UserActionLog for debugging, tracking, or statistical purposes, and you use the data in that table to run reports on a, say, weekly basis.

You COULD use something like this to gather the information about the user each time they visit a page:

select count(pageId) from UserActionLog where userId=x and pageId=y

But you will probably find that duplicating this data is a much more CPU-effective, though disk-inefficient, way of solving the problem. Storing something like this in a new table would yield a much faster result for something which will be accessed continuously:

PageVisitsByUser: | pageId | userId | totalVisits | lastVisit |
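Keeping that table current is cheap if (pageId, userId) is its primary key. A hypothetical sketch using MySQL's INSERT … ON DUPLICATE KEY UPDATE (the literal ids are invented; in the application they'd be bound parameters):

```sql
insert into PageVisitsByUser (pageId, userId, totalVisits, lastVisit)
values (42, 7, 1, now())
on duplicate key update
  totalVisits = totalVisits + 1,
  lastVisit   = now();
```

One statement per page view, and the expensive count() over UserActionLog never has to run on the hot path.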

Now, is this always going to hold true? Well, no. The places you'll find where it doesn't matter are the places in which you have WAY more resources than your needs require. For example, you only have 100 users, and rarely get hits on pages which require database access. Applications like this don't need optimization, because the advancing state of computing hardware *IS* the optimization that they need.

However, as you process more and more volume you'll find time and time again that a 1/1000th of a second per hit advantage is an 11.5 DAY (1,000,000 second) savings over 1 billion hits… and even with only a million hits that's nearly a 17 minute savings. You can see how the savings stack up when you start adding in powers of 10.
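The arithmetic is easy to check:

```php
// One millisecond (1/1000th of a second) saved per hit
$savedPerHit = 1 / 1000;

// Over a billion hits...
$seconds = $savedPerHit * 1000000000; // 1,000,000 seconds
$days    = $seconds / 86400;          // roughly 11.57 days

// ...and over a mere million hits
$minutes = ($savedPerHit * 1000000) / 60; // roughly 16.7 minutes
```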

That's the real challenge of the Web2.0 movement: finding the amount of data, versus the need to use that data, which hits the sweet spot. What can we do with what we've got that people want? I'd argue that as warfare in the 20th century was defined by gunpowder, Web2.0 is a battle defined by its data schema.

Myth: Linux doesn't need updates out of the box

I've just installed a fresh (from the DVD) Fedora Core 5 install. I checked all packages available to me in the installer (except the languages, because I'm monolingual) and "$ yum update" is now downloading 389 updates (that's almost 1GB).

So while I still think that the *nix OSes are *WAY* better than the MS OSes… the idea that Linux doesn't need as many security updates out of the box as Windows is clearly a myth.

Unless: you installed the Linux release as soon as it came out (i.e. during the initial mirroring process), *OR* you built your OS from scratch. Even then, over the course of your install's lifetime you'll be applying a *LOT* of patches (or upgrades if you wish).

As a side note: a low number of security updates would be, in my mind, a bad thing. You *WANT* your OS people to be conscious of the fact that there are other people smarter than they are :)

DK

MySQL on Amazon EC2 (my thoughts)

Who this document is for: people looking to house large MySQL data-sets on Amazon's EC2 service, people looking for the best (that I've found) all-in-EC2 solution for fault tolerance and data retention, and people looking to get maximum availability.

Who this document is not for: people who are looking for something EASY (this isn't it), people who have a small data-set which lends itself to just being copied around, and people to whom 100% uptime isn't an issue. For all of you there are easier ways!

The problem (overview): The EC2 service is a wonderful thing, and it changes the nature of IT, which is no small feat! But there are a couple of problems with the service which make it less than ideal for some situations. To be more clear: there is a problem with the ways that people are used to doing things, as compared to the ways that things ought to be done with a service (or platform) like EC2. So, as I've advocated before, we're going to look at shifting how *YOU* are thinking about your databases… And as with all change I promise this will sound bizarre and *BE* painful. Hang in there. You can do it!

The problem (in specific): There are two things that an EC2 AMI (which is what the Amazon virtual machines are called) is lacking. The first and most direct of the two is that EC2 lacks persistent storage. At this point I would like to point out two things: A) EC2 is still in *BETA*, let's not be too critical of the product until it hits prime time, okay guys? And B) the AWS team is working on a persistent storage system to connect to EC2 (so sayeth the forum mods.) The lack of persistent storage means this: after you turn your machine on, you download and install all available security fixes, and you turn it off to play with later. When you turn it back on, your machine again needs all of those security fixes: everything you do with your machine during runtime is LOST when the machine shuts down. You then boot, again, from a clean copy of your AMI image. The second problem is that you are not given a static IP for use with your AMI machine. Though this is the lesser of the two issues, it's more insidious. The two above "issues" lend themselves well to setting up a true cluster… but they don't lend themselves at all to setting up a database cluster.

While discussing solutions for these problems, let me lay the docroot bare here: I will be discussing how to work inside the limitations of the EC2 environment. There are better solutions than those I'm going to be touching on, but I'm not a kernel hacker. I'll be discussing things that you can do through architecture and normal system administration which will help you leverage EC2 in a consistent manner. We'll also be assuming here that EC2 is a trustworthy service (e.g. if something breaks it's your fault, and if it's the fault of Amazon, no more than one of your servers will go down). The method here is a lot like taking your dog to obedience class. The teacher at this class trains the dog owners, not the dog. Once the dog owners understand how to train the dog, the "problem" solves itself.

Step #1: You are to drop the term (and idea of) monolithic databases from your brain. Don't think it. Don't utter it. I've touched on this briefly before (and if I haven't, I will in later posts.) As you design your database, make sure that it can be split into as many databases as is needed in the future. And, if at all humanly possible, split on a key instead of on ranges! This not only ensures that you can keep your database instances under control, it also, in the long run, carries your good performance a long long LONG way.

You can control your size by splitting on ranges (e.g. records 1-1,000,000 are in A, 1,000,001-2,000,000 are in B, 2,000,001-3,000,000 are in C), but this limits your speed on a given segment to the performance of the housing instance. Don't do that to yourself (you'll regret it in the morning!) If, instead, you put all records ending in 0 in A, ending in 1 in B, ending in 2 in C (and so on), you can take advantage of the fact that not only are the database footprints only 1/10th the disk size that they would have been monolithically, but you also get 10x the performance increase once you have them on 10 different machines (later on.)

And the beauty is that this scheme extends itself very well to even larger data-sets: use 00, 01, 02… or 001, 002, 003 for 1/100th, 1/1000th (and beyond… you get the idea.) These don't all have to be housed on different servers to start off with. It's good enough that the databases be set up properly to support this in the future.
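A sketch of the last-digit scheme in PHP (the shard names are invented for illustration; in practice each one would be a separate MySQL database, and eventually a separate server):

```php
// Map a record id to one of ten horizontal shards by its last digit.
// Widening to 100 or 1000 shards is just a change of $shardCount.
function shardForRecord ($recordId, $shardCount = 10)
{
  return 'db_shard_' . ($recordId % $shardCount);
}
```

shardForRecord(120) and shardForRecord(98765430) both land in db_shard_0, while shardForRecord(121) lands in db_shard_1; each shard holds roughly a tenth of the records no matter how large the id space grows.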

The standard mechanism for fault tolerance in MySQL is replication. But there are a couple of things that people seldom realize about replication, and I’ll lay them bare here.

The first thing that people don't understand is that you cannot keep your binary logs forever. OK, that's not entirely true (you can, if you don't write to the db very often.) But if you have a write-intensive database you will eventually run out of storage; it's just no fun to keep 900GB of binary logs handy! It also becomes impractical, at some point, to create a new database instance by re-reading all of the binary logs from a master. Processing all of your 75 billion inserts sequentially when you need a server up… NOW… is not fun at all! Not to mention the fact that if you, at some point, ran a query which broke replication, you'll find that your rebuilding has hung at that point and won't progress any further without manual intervention.

The other thing that people don't realize is that repairing a broken (or installing a new) database instance means that you have to take an instance down. Imagine the scenario: you have two db servers, a master and a slave. The hard drives on the slave give out. You get replacements and pop them in the slave. Now it's time to copy the data back over to the slave. Your options? A) Run a mysqldump, bringing your master to a crawling halt for the 8 hours it takes. Or B) turn the master off, and copy the data manually, taking much less time but bringing everything to a complete halt. The answer to this is, of course, to have at least one spare db instance which you can shut down safely while still remaining operational.

Step #2: I'm half the instance I used to be! With each AMI you get 160GB of (mutable) disk space, almost 2GB of RAM, and the equivalent of a Xeon 1.75GHz processor. Now divide that, roughly, in half. You've done that little math exercise because your one AMI is going to act as two AMIs. That's right: I'm recommending running two separate instances of MySQL on the single server.

Before you start shouting at the heretic, hear me out!

+-----------+   +-----------+
| Server A  |   | Server B  |
+-----------+   +-----------+
| My  |  My |   | My  |  My |
| sQ  |  sQ |   | sQ  |  sQ |
| l   |  l  |   | l   |  l  |
|     |     |   |     |     |
| #2<=== #1 <===> #1 ===>#2 |
|     |     |   |     |     |
+ - - - - - +   + - - - - - +

On each of our servers, MySQL #1 and #2 both occupy a max of 70GB of space. The MySQL #1 instances of all the servers are set up in a master-master topology. And the #2 instance is set up as a slave only of the #1 instance on the same server. So on server A, MySQL #2 is a copy (one way) of #1 on server A.

With the above setup, *if* server B were to get restarted for some reason you could: shut down the MySQL instance #2 on server A; copy that MySQL #2 over to both slots on server B; bring up #1 on server B (there should be no need to reconfigure its replication relationship, because #2 pointed at #1 on server A already); then bring up #2 on server B, reconfiguring its replication to pull from #1 on server B. This whole time #1 on server A never went down. Your services were never disrupted.

Also, with the setup above it is possible (and advised) to regularly shut down #2 and copy it into S3. This gives you one more layer of fault tolerance (and, I might add, the ability to back up without going down.)

  • Why CAN we do this? And why WOULD we do this?

    We CAN do this for two reasons: first, MySQL supports running multiple database servers on the same machine (thankfully.) Second, because we've set up our database schema in such a way that we can easily limit the space requirements of any given database instance, allowing us to remain, no matter what, under the 70GB mark on all of our database servers.

  • Why WOULD we do this? For a couple of reasons; let me address specific questions individually.

    Why would we reduce our performance by putting two MySQL instances on one AMI? Because you're a poor startup, and it's the best alternative to paying for 4 or more instances to run only MySQL. You could increase performance by paying for one AMI per database instance and keeping the topology the same. I expect that once you CAN do this… you WILL do this. But likely the reason you're using AMIs is to avoid spending much capital up front until you make it big with some real money. So I've slanted this hypothesis with that end in mind.

  • Why would we do something so complicated?

    MySQL replication is complicated. It's error prone. It's harder (in the long run) than it looks. We use it, and this entire method of managing MySQL on AMIs, because it's what we have available to us at our budget. Are there better overall solutions? Without the limitations that I'm constrained to here: yes! But again: we're working solely inside the EC2 framework…

  • Why would we do something so susceptible to human error?

    You've obviously never had someone place a hard copy in the wrong file folder. Or type reboot on the wrong machine. Or delete the wrong file on your hard drive. If you think that operations (on a small scale) is any less error prone, you're fooling yourself! If you're looking for speed and agility from your OPS team you have to trust them to do the best they can with the resources given. If you're stuck *having* to use EC2 it's likely because of budget, and we're looking at a circular set of circumstances. Make some good money and then hire a big ops team so that they can set in place a swamp of processes. The theory being: the slower they have to move, the more they get a chance to notice something is wrong :)

  • What would you recommend to make this solution more robust if you were able to spend a *little* more money?



    I would put one of each replication cluster's instances on a machine you actually own. Just in case we're looking at an act-of-God style catastrophe at Amazon, you'll still have your data. This costs you a server per cluster, plus the bandwidth to support replication.

And finally what problems will arise that I’m not yet aware of?

A couple that I haven’t touched, actually.

  • First: MySQL replication requires that the server-id be a unique number for each instance in a cluster of computers, and each machine is running 2 instances of MySQL (meaning two unique server IDs per AMI.) The reason this is an issue is that every time you start your AMI instance the original my.cnf files will be there again, and without intervention all of your servers would end up having the same server ID, and replication would be so broken it would take you years to piece your data back together!

    The easy way to circumvent this issue is to have a specific custom AMI build for each of your servers.

    The elegant long-term solution is to devise a way, programmatically (possibly using DNS, or even the Amazon SQS service), to obtain two unique server IDs to use before running MySQL.
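    As a sketch, the two my.cnf files on a single AMI might differ only in port, datadir, socket, and server-id (the paths and ID values here are invented for illustration):

    ```
    # instance #1, e.g. /etc/mysql-1/my.cnf
    [mysqld]
    port      = 3306
    socket    = /var/lib/mysql-1/mysql.sock
    datadir   = /var/lib/mysql-1
    server-id = 101

    # instance #2, e.g. /etc/mysql-2/my.cnf
    [mysqld]
    port      = 3307
    socket    = /var/lib/mysql-2/mysql.sock
    datadir   = /var/lib/mysql-2
    server-id = 102
    ```

    Whatever scheme you pick, the server-id values must be unique across every instance in the whole replication cluster, not just within one AMI.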

  • Second: without static IP addresses from the EC2 service, your AMIs will have a new IP every time the server boots.

    This can be dealt with either manually or programmatically (possibly via a DNS registration, and some scripts resetting MySQL permissions.)
  • Third: if, rather like the nursery rhyme which taught children to deal with death by plague in medieval Europe, "ashes, ashes, they all fall down," what do you do?

    Well, hopefully they never "all fall down," because resetting a cluster from scratch is tedious work. But if they do, you'd better hope that you took one of my two backup options seriously:

    Either you have a copy of a somewhat recent data-set in S3, or you have an offsite replication slave which can be used for just this circumstance…

    Or you're back to square one…

There is a real issue to be discussed…

Amazon's EC2 service is, by all accounts, brilliant. But one of the things that it lacks is any sort of assurance regarding data permanence. What I mean is: each machine that you turn on has 160GB of storage, but if that server instance is ever shut off the data is *lost* (not corrupted, but GONE) and the next time you start that server instance it is back to the base image. You cannot save data on EC2 between reboots.

This is not an issue for everyone. But for people looking to run services on EC2 which require some sort of permanent storage solution (as in databases like MySQL or PostgreSQL) this is a big showstopper. I'll be putting some real thought in the next few days into how to sidestep this pitfall. I'm sure that it can be done, and I even have some ideas on how to do it. But I want to think them through and do a little research before I go blurting them out and making (more of) an ass out of myself :)

So… More on this later.

DK