Random Musing: Blurring the Line Between Storage and Database?

As food for thought…

If you had a table `items`

  • itemId char(40),
  • itemName varchar(128),

Another table `tags`

  • tagId char(40),
  • tagName char(40),

And a third table `owners`

  • ownerId char(40),
  • ownerUsername char(40),
  • ownerPassword varchar(128),

It would theoretically be possible to have an S3 bucket ItemsToTags inside which you put empty objects named (ownerId)-(itemId)-(tagId), and a TagsToItems S3 bucket inside which you put empty objects named (ownerId)-(tagId)-(itemId). It would then be possible to use the Listing Keys Hierarchically using Prefix and Delimiter method of accessing your S3 buckets to quickly determine which items belong to a tag for an owner, and which tags belong to an item for an owner. You would be taking advantage of the fact that "there is no limit to the number of objects that one bucket can hold, and no impact on performance when using many buckets versus just a few buckets. You could reasonably store all of your objects in a single bucket, or organize them across several different buckets." (Both of the above are quotes taken directly from the S3 API docs provided by Amazon themselves.)
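To make that concrete, here is a minimal PHP sketch of the idea. The s3ListKeys() helper is hypothetical (it stands in for whichever S3 library issues the GET-with-prefix listing request), and the ids are made-up sample values; the prefix parameter is the real S3 listing feature this whole scheme hinges on.

<?php
// Hypothetical wrapper around an S3 library: issue "GET /?prefix=..." against
// the given bucket and return the matching key names as an array of strings.
function s3ListKeys($bucket, $prefix) { /* ... */ return array(); }

$ownerId = 'owner-0000000000000000000000000000000001';
$itemId  = 'item-00000000000000000000000000000000042';
$tagId   = 'tag-000000000000000000000000000000000007';

// The "index" is nothing but empty objects whose names encode the relationship.
$itemsToTagsKey = $ownerId . '-' . $itemId . '-' . $tagId;   // goes in ItemsToTags
$tagsToItemsKey = $ownerId . '-' . $tagId . '-' . $itemId;   // goes in TagsToItems

// All tags on an item for this owner: list ItemsToTags with an owner-item prefix.
$tagKeys  = s3ListKeys('ItemsToTags', $ownerId . '-' . $itemId . '-');

// All items under a tag for this owner: list TagsToItems with an owner-tag prefix.
$itemKeys = s3ListKeys('TagsToItems', $ownerId . '-' . $tagId . '-');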

Using this method it would be possible, I think, to use the S3 datastore in a VERY cheap manner and avoid having to deal with the massive cost of maintaining these kinds of indexes in an RDBMS or on your own filesystems… Interesting. And since the data could be *anything*, and since by default you have a many-to-many relationship here, you could theoretically store *anything* and sort it by tags…

Granted, to find a tag related to multiple items you would have to make multiple requests and weed out the differences. But if you're only talking on the order of 2 or 3 tags per piece of data… it might just be feasible.
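As a tiny sketch of that "weed out the differences" step (the listings here are made-up sample data, already stripped down to bare itemIds):

<?php
// Item lists as they might come back from two separate prefix listings,
// one per tag, with the owner-tag prefix already stripped off the keys.
$itemsTaggedPhotos = array('item001', 'item002', 'item003');
$itemsTaggedFamily = array('item002', 'item003', 'item004');

// Items carrying BOTH tags: intersect the two listings.
$itemsWithBothTags = array_values(array_intersect($itemsTaggedPhotos, $itemsTaggedFamily));

print_r($itemsWithBothTags); // item002 and item003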

Now… Throw in an EC2 front end, and an SQS interface… interesting…

Makes me wonder what the cost and speed would be (and whether it would be an acceptable tradeoff for not having to maintain a massive database cluster).

Disclaimer: this is a random musing. I’m not advising that anybody actually do this…

How S3 Fits in Comparison to Other Storage Solutions

So, recently Nick G. asked: "Since you've worked with S3 a good bit, I'd like to get your take on using a service like S3 compared to using a local instance (or cluster) of MogileFS?"

I'd like to interject here and mention that in this case "a good bit" means I've used it in one application for data backup, at an early stage during which there was no good example (much less a released interface) for using S3 with PHP code. So I wrote and distributed my own. I'm sure that it's fallen into disuse and that more active projects are likely to be favored. So that's my "lots of experience". Always take what I have to say with the appropriate amount of salt.

And my answer would be that each type of storage solution listed has both strengths and weaknesses, and determining which set best complements your application's needs will tell you where you should invest. I would also throw another option into the pot: the SAN. While a SAN might not be in the range of your average garage tinkerer, it *is* in the range of medium or large startups with proper funding. I do, however, believe the question was geared more towards an "each of these versus S3" analysis, so that's how I'll approach the question.

But first… let me get this out of the way. S3 loses, by default, if you absolutely cannot live without block device access. Do not pass go, do not collect $200. It's a weakness and you'll have to be willing to accept it for what it is.

S3 Vs. SAN (Storage Area Network)

Your most tried and true contender in the mass storage market is probably the SAN: those refrigerator-sized boxes which sit in warehouse-sized data centers thirstily consuming vast amounts of electricity and pushing bits through slender, delicate, orange fiber-optic cables. The basic sales pitch surrounding any SAN these days is comprised of the same points in varying degrees:

Expandability: No modern SAN would be complete without the promise of expandable storage. On a quick and dirty level, a SAN has a bunch of disks which it slices and dices into pieces and then glues those pieces together into a "logical unit", so many, many hard drives become just one hard drive. However, keep in mind that you have to use a filesystem which supports dynamic expansion, and you almost always have to unmount the volume to accomplish it, to boot.

Backup: At a small cost of "twice what you were planning to pay plus $30,000 for the software" you ought to be able to, with any modern SAN, perform realtime on-the-fly backups. I would throw in negative commentary here, but I think the sales pitch bears its negative connotations in a fairly self-evident manner.

Clustering: You can have multiple machines accessing the same hard drive! Which is great as long as you can set up and use a clustering filesystem. What they fail to tell you is that using a *normal*, non-cluster-aware FS will get you nothing but massive data corruption. So unless you plan on using some cookie-cutter type of system for accessing the storage, and are planning on spending big bucks on having it built for you… the clustering is going to be less than useful. Also, you cannot run multiple MySQL database instances on the same part of a shared disk like that, so get that idea out of your head too (disclaimer: I don't know whether allowing MySQL access to the raw partition fares any better in this case, but I somehow doubt it).

High availability/integrity: So long as you buy a bunch of extra hard drives for the machine, you can expect to handle failures of individual disks gracefully. That is, if the term "gracefully" includes running 25% slower for a couple of hours while bits get shifted around… and then again when the broken drive is replaced… But, no, you won't lose your data.

Speed: Yea… SANs are fricken fast… no doubt… SANs usually function on a dedicated fiber-optic network (the aforementioned delicate orange cables), so a) they don't saturate your IP network, and b) they aren't limited to its speed.

So how does S3 stack up against the SAN? Well, let's see… Expandability: S3 has a SAN beat hands down, with not only implied expandability but also implied constriction: with S3 you pay for what you use.

Backup: Amazon guarantees data retention, no need to pay extra. Clustering: again, covered; provided that you have built your application to play nice in some way, there is no problem here.

High Availability and Integrity: Here there is more of a tradeoff, since a SAN is a guaranteed write which is then immediately available, and S3 is a write once, eventually stored. One of the hurdles with S3 is that it may take a while (an unknown period of time) for a file stored in S3 to become available globally, making it less than ideal to, say, host HTML generated by your CMS. That's not to say that it's impossible, but there may be an indeterminate period when you have a page linked to and only half your viewers can access it. (You would think you could get around this by storing the data first and then the index last, but there is no guarantee that the order in which items are sent is the order in which they will become available.)
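If you wanted to try the "data first, index last" trick anyway, the workaround might look like the sketch below. The s3Put() and s3ObjectVisible() functions are hypothetical stand-ins for your library's PUT and HEAD calls, and note that this only bounds the wait from wherever you happen to be checking; it still cannot guarantee global visibility.

<?php
// Hypothetical wrappers around whatever S3 library is in use.
function s3Put($bucket, $key, $data) { /* PUT the object */ }
function s3ObjectVisible($bucket, $key) { /* HEAD the object */ return true; }

// Store the page body first...
s3Put('my-site-content', 'articles/42.html', '<html>...article body...</html>');

// ...then wait (up to a limit) for it to become readable before publishing
// the index page that links to it.
for ($attempt = 0; $attempt < 30 && !s3ObjectVisible('my-site-content', 'articles/42.html'); $attempt++) {
    sleep(2);
}
s3Put('my-site-content', 'index.html', '<html>...now links to article 42...</html>');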

And finally, speed: here the SAN wins out. You pay for bandwidth to connect to Amazon's S3 service, and you can't, and wouldn't want to, pay the bills for a sustained multi-gigabit-per-second connection to S3 (ouch).

Therefore: if you can handle a) a small time-to-availability, b) non-block access, and c) a speed limited by your public internet connection, then S3 is probably a better choice. But for the total package… if you have the resources… the SAN is irreplaceable.

S3 Vs. NAS (Network Attached Storage)

The NAS is like the SAN's little brother. They tend to function much the same as a SAN, but they are usually a) put on the IP network, which can cause saturation and limits speed, b) not as robust in the H.A. and data integrity department, c) capped lower in their ultimate expandability, and d) a whole hell of a lot cheaper than a SAN.

So the NAS has carved out a well-deserved niche in small businesses and some home offices because it provides a bunch of local storage at a much more reasonable price. We therefore cannot evaluate its pros and cons on the same points as we did the SAN. NAS devices are often used to store large files locally in a shared manner: many clients mount the shared volume and are able to work collaboratively on the files stored there. And for this reason S3 is not even thinking about encroaching on the NAS space. First off, working on a 100MB CAD file over a home DSL line is not feasible in the way that it is on a NAS; it would be an awful thing to wait for 100MB to save at 12Kb/sec. Period. Also, the idea of using multi-user accounting software to have two accountants in the records at the same time is basically impossible…

If you're thinking about the NAS in a data-center type environment, I'm going to consider it lumped in with either the homegrown cluster solution (small NAS) or the SAN (large NAS).

So if you need a NAS… stick with a NAS. HOWEVER, consider S3 as a very convenient, effective, and affordable alternative to something like tape-based backup solutions for this data.

S3 Vs. HomeGrown Cluster Storage

The home-grown clustering solution is an interesting one to tackle. NFS servers, or distributed filesystems (with or without local caching), or Samba servers, or NetWare servers, all with or without some sort of redundancy built in, and all with varying levels of support attached. And that's your biggest challenge in this space: finding support.

You will have to build your application to take into account the eccentricities of the particular storage medium (varying levels of POSIX support, for example), but knowing what those quirks *are* will save you time, frustration, and gobs of money later on. Because if you're using some random duct-taped solution that's been all Mac'd out, it will probably do the trick. But what happens if the guy who designed it (and thus knows how all the pieces fit together) leaves the company or gets hit by a bus? Well… you're probably out of luck. With S3 you have a very large pool of people all rallying around one solution with one (OK, or two) access methods, and it simply is what it is.

There are really no surprises with S3, which is the first reason that it beats out the custom tricked-out storage solution. The second reason is that there is no assembly required, except maybe figuring out which access library to use. No assembly means no administration. No administration means better code. Better code means getting sold like a hot video sharing company. Well… one can dream.

S3 Vs. Local Storage

Aside from the obvious block access, and the up-to-SCSI speeds that local storage provides, it loses to S3 in almost every way.

It's not expandable very far. It's not very fail-safe. It's not distributed. It requires some form of backup. It requires power, cooling, room, and physical administration. My advice: if you can skip the hard drive, you SHOULD.

S3 Vs. MogileFS

MogileFS is an interesting newcomer in this particular exercise of thought. It's a kind of hybrid between the grow-your-own cluster and the local storage bit. It offers an intriguing combination of pros and cons, and is probably the most apples-to-apples comparison that can be made with S3 at all. Which makes me wish I'd had more of a chance to use it.

But the basic premise is that you have a distributed system which is easily expandable and handles data redundancy. My understanding is that you classify certain portions of the storage with a certain number of redundant copies required to be considered safe, and the data is stored on that many different nodes. When you make a request for a file you are returned a link, in such a way as is meant to distribute the read load among the various servers housing the data. You also have a built-in fail-safe for a node going down, and shouldn't be handed a link to a file on a downed node.

So what does all that mean? Well, if you went about trying to build yourself a non-authenticated, in-house version of Amazon's S3 service, you would probably end up with something remarkably similar to MogileFS. I wouldn't even be surprised to find out that S3 is modeled after Mogile. What's more, Mogile has a proven track record when it comes to serving files for web-based applications.

So how do they actually compare? I would say that for a company deciding whether to use Mogile versus S3, it comes down to a couple of key factors: a) the source and destination of traffic, b) the type of files being distributed, and c) the up-front investment.

As far as your traffic goes: if you're planning on using Mogile primarily internally and data will rarely leave the LAN, then you will not be paying the bandwidth costs associated with S3. That makes for a pretty simple decision. If you are distributing the files to a global audience, however, you might find that using S3 to pay for bandwidth along with handling local availability, delivery speed, and high availability is a win. However, I'd be fairly inclined to guarantee (as I've covered before) that raw bandwidth purchased from your ISP is a lot cheaper than from Amazon AWS, so as long as you already have all the necessary equipment in place for redundancy, delivery, etc., Mogile's advantages bring it within striking distance of S3.

If you are distributing primarily small files (images, etc.) then Mogile is not going to present you with any challenges. If, however, you are serving 100MB video files or 650MB CD images, Mogile might actually work against you. When I tried to use Mogile for this kind of application there was a limit on the size of an individual file that it was willing to transfer between hosts; in this respect Mogile broke its own replication. DISCLAIMER: I only spent a week or so total with Mogile (broken up into an hour here and an hour there), so this might have been a) known, or b) easily worked around, but my quick googling at the time yielded little help. The idea of having to split large files was a deal breaker at the time, and other things were pressing for my attention.

And the real thing that Mogile requires which S3 does not is a hardware and manpower investment. Since you're going to have to adapt your application in a similar manner to house data in either S3 or MogileFS, S3 wins out on sheer ease of setup… All you have to do is sign up for an AWS account, pop in a credit card number, and you're on your way. That same hour. You also don't run out of space with S3 like you can with Mogile; granted, Mogile can be easily expanded, but you have to put more hardware into it. S3 is simply already as large as you need it to be.

Summary

In the end, what these choices always come down to is some combination of the classic triangle: time vs. money vs. manpower. Which storage is right for you depends on how much of each you are willing to commit. Something always has to give. The main advantage of S3 is that you're borrowing on the fact that Amazon has already committed a lot of time and hardware resources which you can leverage if the shoe fits.

More than likely, what you'll find is that the "fit" of something like S3 will be a seasonal thing. When you start out developing your application and you don't have resources to throw at it, using S3 for your storage will make a lot of sense because you can avoid the whole issue of capacity planning and purchasing hardware with storage in mind. Then you will probably move into a quasi-funded mode where it starts to get, or outright is, too expensive to use S3 versus hiring an admin and throwing a couple of servers in a data center. And then you might just come back full circle to a point when you're drowning in physical administration, and spending a little extra for ease of use and peace of mind comes back into style.

So which is right for you? Probably all of the above, just at different times, for different uses, and for different reasons. The key to your success will likely lie in your ability to plan for when and where each type of storage is right. And to already have a path in mind for when it’s time to migrate.

Where should Amazon AWS go next?

We have SQS, we have S3, and we have EC2, so what's next from the Amazon AWS team?

There is really only one piece of the puzzle missing… and it's a piece that has a lot of people griping. I have a strong hunch that Amazon is working on the problem, because I have a strong hunch that it is (or was) one of their own major hurdles. And that problem is the database service.

How do you provide an easy to use interface to relational lookup-able storage? How do you make it universal? How do you make it secure? How do you make it FAST?

The first three questions are all answerable in roughly the same way: make it a service, and let the service handle the interface, security, and universality. They've successfully applied the web service to messaging, storage, and CPU power; there's no reason that this couldn't be the final piece of the jigsaw puzzle. The last question carries the greatest problem, though. Allowing people to store data and run queries without the inevitable tanking of the server process would be a challenge, to say the least (artificial intelligence is no match for human stupidity, after all).

But that's beside the point. If you break data down into two components, anchors and tags (that is, something either is data or is data about the data), and provide a schema that works without collision problems and, more importantly, works both ways (finding tags related to an anchor, AND finding anchors related to a tag), you cover probably 90% of people's needs in one fell swoop.

I've been thinking a lot about how to do this lately, as I've been drowning in a sea of data myself which is easy to manage in one direction but difficult in the other while keeping the size of the whole thing down.

Not only would that give Amazon the ability to have its finger in basically every new technological cookie jar, BUT it would also provide huge, massive, gigantic, enormous amounts of data on what people really think about things. It would be an exceptional win for Amazon, I think, and could indeed be leveraged to a huge advantage in the marketplace. Because, as Netflix has shown us recently, reliably finding things which relate to other things is *big* business.

Is compute as a service for me?

Note to Nick: I haven't forgotten your request and I'll have something on that soon, but when I started in I found that I had something else to say about compute-on-demand (or compute-as-a-service, terms which I use somewhat interchangeably). So here it is. For all those people just jumping into a project, or considering restructuring a project around these new trends, I hope this helps. I would definitely not consider this (or anything else I write) a GUIDE per se, but food for thought.

We live in an interesting world now, because every new tech project has to ask itself at least one very introspective question: "is computing as a service the right thing for us?" And MAN is that ever a loaded question. At first blush the answer seems like a no-brainer: "of course it's for us! we don't want to pay for what we don't use!" Which is, at the basest level, true. But the devil is always in the details…

So which pair of glasses do you have to approach this problem with? What are the consequences of choosing wrong? How do we do it? Slow down. First you need to put some thought into these two questions: "what do we do?" and "how do we do it?" Because that is the foundation which determines which road holds the path to success and which to failure.

Are you a media sharing service which houses a billion images and gets thousands more every second? Are you a news aggregator which houses millions of feeds and hundreds of millions of posts? Are you a stock tracking company which copes with continuous feeds of data for portions of the day? Are you a sports reporting company which has five to twenty posts per day but hundreds of thousands of reads? Are you a modest blogger? Do you just like to tinker with projects?

As you can see, those are all very different environments with unique needs, stresses, and usage spreads. And writing a blog entry which addresses whether each possible type of business should or shouldn't use on-demand computing would be impractical, not to mention impossible. But for the web industry there are a couple of basic types of environments: "Sparse Write, Dense Read" and "Dense Write, Sparse Read", with subtypes of "Data Dense" and "Data Sparse".

Environment: Sparse Write, Dense Read

For a lot of web applications you're really not dealing with a lot of data. If you're running a content management system, or you're a directory, you have a finite amount of data which, in comparison with the number of times it's read, is written to fairly infrequently. (In this case "infrequently written" means that a database's query cache is a useful optimization for you.) It's also very likely that you will be able to take a snapshot of your data in this type of environment in a fairly convenient manner. Compute as a service is probably right up your alley, and here's why.

You are likely to have very normalized times during which your reads (or your writes) spike, meaning that you can actively plan for, set up, and use on-demand resources to their fullest potential. Remember that an on-demand resource is not an instant problem solver. In the case of something like Amazon EC2 it can take 5, 10, or 15 minutes for the server you've requested to even become active. After the server is up there has to be some process which gets all of the relevant data on it up to date. What this means is that you might be looking at half an hour before your 5 extra servers are ready to handle the 7:00am to 9:00am traffic spike that everyone getting to the office in the morning generates. With your service, that's fine though. Just plan to turn the extra power on an hour early and turn it off half an hour after you expect the spike to be over. Wash, rinse, repeat.
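A sketch of that "turn it on an hour early" routine, run from cron every fifteen minutes; startExtraServers() and stopExtraServers() are hypothetical wrappers around however you actually launch and terminate your EC2 instances and push data onto them:

<?php
// Hypothetical wrappers around your EC2 launch/terminate tooling.
function startExtraServers($count) { /* launch $count instances and sync data to them */ }
function stopExtraServers()        { /* terminate the extra instances */ }

// Planned spike: 07:00 to 09:00. Start the extra capacity at 06:00, drop it at 09:30.
$now = (int) date('Hi'); // 06:15 becomes 615, 09:30 becomes 930

if ($now >= 600 && $now < 615) {
    startExtraServers(5);   // an hour of lead time to boot and catch up on data
} elseif ($now >= 930 && $now < 945) {
    stopExtraServers();     // half an hour after the spike should be over
}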

Environment: Dense Write, Sparse Read

See, this is the more complicated of the two environments. Everyone and their mother knows how to build a database-driven application which gets a few writes and a lot of reads, because that's what your common RDBMS is built for. Think of it as being idiot-proofed out of the box 🙂 But when you have a backwards (as in usage, not as in technology) environment, all of a sudden you have a lot of "conventional wisdom" which isn't so wise anymore (what do you mean a faster write server than read servers causes replication problems? what do you mean my uber-normalization is the problem?).

It's in this type of environment that we really have to look at the subsets of data, because the proof really lies in the pudding, so to speak.

Sub Environment: Data Sparse

You work with a relatively small window of data in realtime. You may or may not get a request for all of the data you're keeping continuously up to date, but you have to keep it that way or it's your butt on the line, right? Well, you're probably in luck. I think it's fairly likely that your data size is relatively small; for example, you're keeping a window with a period of 24 hours of data updated. Likely there is a *LOT* of history kept, but that's kept elsewhere. Once you're done with the data you shove it right out the back end into another process and it gets handled there (that back end is likely a sparse write, sparse read environment which is extremely data dense, and not a candidate for on-demand computing (well, maybe, but that's another blog post)).

For this environment compute as a service is probably going to be a godsend… if you can overcome one small, teentsy-weentsy, ever so small yet still important detail: the development team. Now, not all companies are going to have difficult development teams, but some do, and you simply cannot build an environment ripe for compute as a service without their cooperation, so be prepared whatever the case! You will likely be able to leverage hotcopy, or an LVM-style live-action backup, for insta-backups to your long-term storage solution (or on-demand setup pool). You will likely be able to leverage the extra compute capacity for your peak load times. And everything will likely turn out OK. So long as you can get some of the crucial application details hammered out.

Sub Environment: Data Dense

I pity you. Compute as a service is probably not what you need. Cases may vary and, again, the devil is in the details. But you have a huge challenge ahead of you: building an environment where a server can be programmatically brought online and then caught up to date with the current compute pool in a time frame which makes even doing it a winning situation. This is something I'm going to put a lot of thought into… note to self… But unless you have some bright ideas here (and if you do, please send them my way) you have basically one chance: data partitioning. Get yourself a VERY good DBA, and really REALLY plan out your data. If you put enough thought into it in the beginning, you have a chance to keep the individual pieces of data down to a small enough (and distributed enough) level which just might lend itself to compute as a service in a very LARGE way (but we're really talking about going WAY beyond the 10 or 20 allowed Amazon EC2 server instances here).
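For a flavor of what "really plan out your data" can mean, the crudest building block of partitioning is a deterministic mapping from a key to a shard, so the same record always lives in the same small, movable slice. A minimal sketch (the hostnames are made up):

<?php
// A fixed list of database hosts, each holding one horizontal slice of the data.
$shards = array('db01.internal', 'db02.internal', 'db03.internal', 'db04.internal');

// Deterministically map a userId to one shard. crc32() keeps the mapping stable,
// so a given user's rows always land on (and are read from) the same host.
function shardForUser($userId, $shards) {
    $index = abs(crc32($userId)) % count($shards);
    return $shards[$index];
}

echo shardForUser('user-12345', $shards) . "\n"; // e.g. db03.internal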

Uh, OK, enough about these different environments… what do I need to do to USE on-demand computing?

Well, that's a difficult question to answer in a generally useful way, so without getting too specific:

You, first and foremost, need to have compute as a service thought about in every bit of your planning and executing stages. At every point in the set of long chains which make up your application you have to ask yourself “what happens if this goes away?” and plan for it.

A very close second is to think pipes and hoses rather than chisel and stone. Each part of your environment should be as self-contained as possible. When one hose springs a leak the solution is simple: replace the hose (and just bypass it in the meantime). But when you lose a section of your monolithic structure, things are a bit more complicated than that.

Finally, you need to understand that you will have to work at taking full advantage of compute as a service. Remember that you are going to have to put TIME and ENERGY into using this kind of a service. Nothing comes free, and even in operations every action has an equal and opposite reaction. If you want to avoid spending the time, energy, and money maintaining a hardware infrastructure, you will have to put the same into avoiding one. But the benefits of doing so are real and tangible. Because when you've spent all of your time building an application which is fault tolerant, rather than building an infrastructure which will fail, you invariably provide your user base a more robust and reliable service.

Toying with the idea of a podcast

I can't say that I like hearing my own voice… but maybe with some appropriate filters I can sound like the bad guy in a poorly made kidnapping film, and that would suit me. I really get going once I start physically talking (just ask my wife), so it might be a better medium for me than text delivery.

On the other hand, I don't have a commute at present. And I don't have an iPod. And when I'm in front of my PC I'm usually concentrating very intensely… so I never listen to anyone else's podcasts. Which would make my producing one a possibly sweet (but more likely fairly bitter) irony.

But, what the hell, it’s worth a shot right? What are your thoughts?

Taking Requests

It's the end of a long, hard day and I'm thinking about writing a blog entry. But… I usually only write when something is happening somewhere which gets me sparked… because I'm at a loss when trying to think about what would or wouldn't be interesting to other people. So I figure I'll take requests. I have maybe a reader base of 10 one-time readers and 2 full-time readers (unless I don't count, then it's probably just 1… :D), but surely you guys must have some topics that you'd like some input on? So think of this as a text-based radio station, and it's the all-request lunch hour.

Or, if no one responds, I’ll have to just wait till the muse strikes…

A 90-second introduction to multidimensional arrays in PHP

If a word is a variable, then a sentence is an array. A paragraph is an array of sentences, a chapter is an array of paragraphs, a book is an array of chapters, and a library is an array of books. This would look, in PHP, like this:

  $library = array (
    'book1' => array (
      'chapter1' => array (
        'sentence1' => array (
          'word1' => "In",
          'word2' => "the",
          'word3' => "beginning",
        ),
      ),
    ),
  );

Therefore $library['book1']['chapter1']['sentence1']['word2'] is "the". And $library['book1']['chapter1']['sentence1'] is equal to array ( 'word1' => "In", 'word2' => "the", 'word3' => "beginning" ).

And that's an array. Thus closes our discussion on arrays in PHP… huh? What's that? Oh… you need more? Well, sure, there are a zillion uses for arrays, and learning to think in arrays often takes running into a situation where using anything else becomes less than viable. But for the sake of argument let's pretend we're keeping simple track of deposits, withdrawals, and a balance. In this app every transaction invariably has a couple of pieces of information: transaction date, second party, amount, and a type (deposit or withdrawal).

array (
  'date'   => $$,
  'type'   => $$,
  'party'  => $$,
  'amount' => $$,
)

Our balance sheet is simply an array of those arrays

$sheet = array (
  '0' => array (
    'date' => 'monday',
    'type' => 'd',
    'party' => 'employer',
    'amount' => 1234.56,
  ),
  '1' => array (
    'date' => 'tuesday',
    'type' => 'w',
    'party' => 'rent',
    'amount' => 500,
  ),
  '2' => array (
    'date' => 'wednesday',
    'type' => 'w',
    'party' => 'computer store',
    'amount' => 712.59,
  ),
);

This, while fictitious, should give a good example of how a multidimensional array works. We can get a balance with a very simple loop using PHP's foreach() control structure.

$balance = 0;
foreach ( $sheet as $transaction_id => $details ) {
  switch ( $details['type'] ) {
    case 'w':
      $balance = $balance - $details['amount'];
      break;
    case 'd':
      $balance = $balance + $details['amount'];
      break;
  }
  echo "[{$details['type']}]\t "
      . "{$details['party']}\t "
      . "Amount: {$details['amount']}\t "
      . "Balance: {$balance}\n";
}

That is basically everything you need to know to start working with multidimensional arrays (of COURSE there's more to learn), except for one thing. When you're faced with working with somebody else's data structures, you will need to get information about how they are laying out their arrays. The slow, painful way of doing this is examining the code. The quick, happy way is to use either var_dump() or print_r(). I prefer print_r() for most jobs; just remember to wrap the output of print_r() in <pre></pre> tags if you're doing this debugging in a browser… trust me, it'll help a lot.
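For example, dumping a balance sheet like the one above in a browser:

<?php
$sheet = array(
  array('date' => 'monday', 'type' => 'd', 'party' => 'employer', 'amount' => 1234.56),
);

// Wrap print_r() output in <pre> tags so the indentation survives in a browser.
echo '<pre>';
print_r($sheet);
echo '</pre>';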

Rules of thumb for high availability systems (Infrastructure)


Never be at more than ½ capacity

If you're planning a truly highly available system then you have to be aware that a serious percentage of your hardware can be forcefully cut from your organization's torso at any moment. You are also not exempt from this rule on holidays, weekends, or vacations. Losing power equipment, losing networking gear, the help tripping over cables, acts of God. If you aren't prepared to have a random half of your organization's hardware disconnected at any moment, then you aren't H.A. yet.


If you don't have 2 spares then you aren't yet ready

Murphy was an optimist. If you've never replaced a dying (or dead) hard drive with a new hard drive which… doesn't work (or RAM, or a CPU), then you haven't been in ops long enough. Sometimes your backup plan needs a backup plan. And you have to have it. There's no excuse for being offline, so you need not just one but two (or more) possible replacements for a point of failure.


Disaster Recovery is an ongoing process

The tricky thing about highly available systems is that you have to keep working… while you're recovering. Any time you're planning your HA setup and you work around a point of failure, stop and think a moment about what it will take to replace that failed point. If it requires bringing things down again… that's no good.


Growth planning should always be done in exponents

Never again are you to talk (or think) of doubling growth. You shall from this point forward think in squares, and cubes, and the like. In the age of information you tend to gather data at an alarming rate; don't let it overtake you!


If you depend on a backup, it’s not HA

"What's that? The primary server is offline? Do we have a spare? No, but we have a backup. How long? Oh… 36 hours… What? No, I can't speed it up." Let's face it, if you're restoring your live system from backup you've screwed the pooch. Backup is NOT high availability, but it is good practice, and when it comes down to it 36 hours is marginally better than never.


Self healing requires more thought than you’ve given it

The simple fact of life in the data center is that all services are an interlocking tapestry, and if the threads break down the tassels fall off. Self-healing is not only about detection and removal; it's also about rerouting data. If the database server that you normally write to has gone down, you can detect it, but can you instantly rewire the 8 different internal services which feed into that database to write to a different server? And then back again?
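One small piece of that rerouting puzzle, sketched: have every service ask one shared function which host is currently the write master instead of hard-coding a hostname. The hostnames and the bare TCP check below are placeholders for real service discovery and health checking.

<?php
// Candidate write masters, in order of preference (hypothetical hostnames).
$writeCandidates = array('db-master.internal', 'db-standby.internal');

// Return the first candidate that answers on the MySQL port. Services call this
// (or read the equivalent shared config) instead of hard-coding a hostname, so a
// failover means updating one list rather than rewiring eight applications.
function currentWriteHost($candidates) {
    foreach ($candidates as $host) {
        $socket = @fsockopen($host, 3306, $errno, $errstr, 1); // 1-second timeout
        if ($socket) {
            fclose($socket);
            return $host;
        }
    }
    return false; // nobody answered: time to wake somebody up
}

$writeHost = currentWriteHost($writeCandidates);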


DNS is harder than you think, and it matters more than ever

The one piece of infrastructure that people rely on most, and know the least about, is DNS. DNS might as well be a piece of hardware, because if your users can't type in www.blah.com to get to you, there's absolutely zero chance they'll have your IP address handy. Worse yet, DNS is the number one thing that I see administrators screw up all the time. Talking zone files with (sometimes veteran) administrators is like talking Klingon to a 2-year-old. It usually doesn't work too well.


Rules of thumb for high availability systems (Databases)


Replicating data takes longer than you think

In this brave new world of terabytes per week there's a nasty truth: replicating that much data across a large number of nodes is a headache, and it's usually not as fast as you want it to be. Instantaneous replication is nice, but generally speaking you're writing to one server and reading from X number of others. Your read servers, therefore, not only bear the same load as the write server (having to replicate everything that goes into the write server) but have to bear the additional load of supporting the read requests. A frequent mistake that admins make is putting the best hardware into the write server and using lesser machines for the read servers. But if you're truly processing large amounts of data, this creates a dangerous situation where, if a read server stops for a while, it might take days or weeks to catch up. Bad juju.


Less is more, and then more is more, and then less is more again

In the beginning you had data optimization. Everything pointed to something, and your masterfully crafted database schema duplicated absolutely no piece of information. And then you increased your size and volume to the point that this approach became too cumbersome to sustain your access times. You moved over to a new schema where you could select all the data you need in one statement, but data is duplicated everywhere. And finally this monolithic approach has locked you into multi-million dollar pieces of hardware, so you need to re-normalize your data so that it can be partitioned onto multiple clusters. Expect this, plan for it, and be prepared for the hard truth: this is a truly painful process!


Spend the money here, if nowhere else

If you deal in information, you absolutely have to spend real money here. This is not the place to skimp. If you do… you’ll be sorry.


Rules of thumb for high availability systems (Employees and Departments)


False positives breed contempt

If you routinely get SMS alerts for no reason at 3:00am when you're sound asleep, and it always ends up being a false alarm, there will come a time when you just opt to ignore the pager. And this time not only will wolf have been cried, the flock is truly under attack. Always, always work to reduce false positives, and set reasonable alerting thresholds. Is something an emergency worth getting up for at 3:00am, or isn't it? Sure, a web server went down and was removed, but there are 13 others all functioning; you can sleep. But if you lost half of them… something's probably up!
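A sketch of the kind of threshold that lets you sleep; the counts and the page() function are placeholders for whatever your monitoring actually knows and does:

<?php
// Placeholder for whatever actually sends the 3:00am SMS.
function page($message) { /* send the alert */ }

$totalWebServers = 14;
$downWebServers  = 1;   // what monitoring currently sees

// One dead box out of fourteen is a morning problem, not a 3:00am problem.
// Losing more than half the pool is worth waking someone up for.
if ($downWebServers > $totalWebServers / 2) {
    page("More than half the web pool is down ($downWebServers of $totalWebServers)");
}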


No department is an island

Contrary to popular belief, it takes more than the ops department to design a truly HA system. For example, your admins aren't allowed to just start monkeying with the database schema when they feel like it. Sure, it's more highly available now, but the application can't use it anymore. Just as no man is an island, neither is the ops department. You can work with them (good) or you can work against them (bad), but choose wisely.


If operations warns that the sky is going to fall, take them seriously

Let's face it. If your auto mechanic says your alternator will die very soon, you replace it. If your inspector says you've got the beginnings of a termite problem, you address it. If your weatherman tells you it might rain today, you grab your umbrella on your way out the door. And when your ops team comes into your office telling you that you have exactly 90 days until your database server becomes a very heavy, very hot, very expensive paperweight, why would you ignore that? Usually when ops says the sky is about to fall it's because they were up in the clouds fixing the slightly off-color shade of silver you were complaining about, and they saw the cracks forming. Ignore them at your own risk, but don't say they didn't warn you.


If you don’t spend the money on ops, nothing runs.

Without your engine your car doesn't run. Without your heart you die. And without giving the ops department the necessary resources, the application that you've invested so heavily in will not run, because there will be nothing to run it on. Or worse yet: it'll run, but something will break every third day. You cannot skimp here. Well, you can, but you don't get high availability along with a low price tag. It's a pain in the ass… but when you bought the Saturn you had no right to expect NASCAR results.

The RDBMS Misconception That Less is More

It's commonly held that normalization is a good thing. And it is. But like all good, or more to the point TRUE, things, there are circumstances in which the opposite holds true.

The "proper" way to lay out a database schema is something as ever-changing as the tides. Rather like the US justice system, we find that things which once held true no longer do, or that things which were once absolute do, actually, have extenuating circumstances under which they aren't, exactly, absolute.

The proper way to lay out an RDBMS system is to look at a very simple ratio: space vs. speed. The less duplication of data in your database, the more efficient it is in terms of disk space used. In exchange for that disk space savings you incur the cost of additional disk seeks.

For example, if you're keeping track of your users' information (e.g. who's registered and who hasn't), you might use a table like this:

Users: |  userId | firstName | lastName | eMail | cryptPasswd |

But in all likelihood you're going to have a lot of users with a common first and last name! Normalization to the rescue (or so it seems, at first):

Users: | userId | firstNameId | lastNameId | eMail | cryptPasswd |
FirstNames: | firstNameId | firstName |
LastNames: | lastNameId | lastName |

Now, instead of storing the string "John" a thousand times for the thousand users with the first name of John, you store the string once, and you have an integer field which relates (the R in RDBMS) to a normalized list of names.

But… the cost is that now any time you want to pull a name from the table it requires 3 lookups:

select firstNameId,lastNameId from Users where userId = 1
select firstName from FirstNames where firstNameId=x
select lastName from LastNames where lastNameId=y

Whereas the same would have been done with the following single query before normalization:

select firstName, lastName from Users where userId=1

It gets worse when you're computing values based on information stored in your tables. For example, suppose you are looking up the number of times a user has visited a certain page, so that you can show them that information on the page they are viewing (or perhaps to do some checking on that value on each visit to prevent, for example, site mirroring). You might already be storing what people are doing on the site in a table called UserActionLog for debugging, tracking, or statistical purposes, and you use the data in that table to run reports on, say, a weekly basis.

You COULD use something like this to gather the information about the user each time they visit a page:

select count(pageId) from UserActionLog where userId=x and pageId=y

But you will probably find that duplicating this data is a much more CPU-effective, though disk-inefficient, way of solving the problem. Storing something like this in a new table would yield a much faster result for something which will be accessed continuously:

PageVisitsByUser: | pageId | userId | totalVisits | lastVisit |
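A sketch of how that duplicated table might be kept current on every page view, assuming MySQL with a unique key on (pageId, userId) so the row is inserted on the first visit and incremented on every visit after (connection details are placeholders, and mysqli is just one way to run the statement):

<?php
// Placeholder connection details.
$db = new mysqli('localhost', 'webuser', 'secret', 'myapp');

$pageId = 7;
$userId = 42;

// One statement, no read-before-write: insert the first visit, otherwise bump the counter.
$sql = "INSERT INTO PageVisitsByUser (pageId, userId, totalVisits, lastVisit)
        VALUES (?, ?, 1, NOW())
        ON DUPLICATE KEY UPDATE totalVisits = totalVisits + 1, lastVisit = NOW()";

$stmt = $db->prepare($sql);
$stmt->bind_param('ii', $pageId, $userId);
$stmt->execute();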

Now, is this always going to hold true? Well, no. The places you'll find where it doesn't matter are the places in which you have WAY more resources than your needs require. For example, you only have 100 users, and you rarely get hits on pages which require database access. Applications like this don't need optimization, because the advancing state of computing hardware *IS* the optimization that they need.

However, as you process more and more volume you'll find time and time again that a 1/1000th of a second per hit advantage is an 11.5 DAY (1,000,000 second) savings over 1 billion hits… even with only a million hits a day that's roughly a 16-minute-per-day savings. You can see how the savings stack up when you start adding in powers of 10.

That's the real challenge of the Web2.0 movement: finding the balance between the amount of data and the need to use that data which hits the sweet spot. What can we do with what we've got that people want? I'd argue that just as warfare in the 20th century was defined by gunpowder, Web2.0 is a battle defined by its data schema.

Myth: Linux doesn't need updates out of the box

I've just installed a fresh (from the DVD) Fedora Core 5 install. I checked all packages available to me in the installer (except the languages, because I'm monolingual) and "$ yum update" is now downloading 389 updates (that's almost 1GB).

So while I still think that the *nix OSes are *WAY* better than the MS OSes… the idea that Linux doesn't need as many security updates out of the box as Windows is clearly a myth.

Unless: you installed the Linux release as soon as it came out (i.e. during the initial mirroring process), *OR* you built your OS from scratch. Even then, over the course of your install's lifetime you'll be applying a *LOT* of patches (or upgrades, if you wish).

As a side note: a low number of security updates would be, in my mind, a bad thing. You *WANT* your OS people to be conscious of the fact that there are other people smarter than they are 🙂

DK