You’re only ever done debugging for now.

I’m the kinda guy who owns up to my mistakes. I also strive to be the kinda guy who learns from them. So I figured I’d pass this on as some good advice from a guy who’s “screwed that pooch.”

I was working on a project that sent me e-mail messages with possible problem alerts. All was going well, and at some point I turned off those alerts. I don’t remember when. And I don’t remember why. Which means I was probably “cleaning up” the code. It was, after all, running well (I guess). But along comes a bug introduced with new functionality (ironically, from somewhere WAAAAAAY up the process chain from my project). And WHAM: errors up the wazoo. But no e-mails. Oops. Needless to say, the cleanup process was long and tedious… especially for something that was avoidable.

I’ve since put the alerting code back into the application, and have my happy little helpers in place fixing the last of the resulting issues.

The lesson to be taken from this is that you’re only ever done debugging for now. Because tomorrow that code, that’s working perfectly now, won’t be working perfectly anymore. And the sources of entropy are, indeed, endless.

Whoa, talk about neglecting your weblog! Bad Form!

I know, I know, I’ve been silent for quite some time. Well, let me assure you that I’m quite all right! Are you less worried about me now? Oh good. (Yes, I’m a cynical bastage sometimes.)

So life has, as it tends to do, come at me pretty fast. I’ve left my previous employer, Ookles, and I wish them all the best in accomplishing everything that they’ve been working towards. And I’ve joined up with the very smart, very cool guys at Automattic. I have to tell you, I’m excited to be working with these guys; they’re truly a great group.

I guess that means I’m… kind of… like… obligated to keep up on my blog now, eh? I’m also kind of, like, exhausted. Jumping feet first into large projects has a tendency to do that to a guy, though. And truth be told, I wouldn’t have it any other way…

😀

Cheers

DK

CryoPID

Now this is cool: CryoPID, a process freezer for Linux.

“CryoPID allows you to capture the state of a running process in Linux and save it to a file. This file can then be used to resume the process later on, either after a reboot or even on another machine.

CryoPID was spawned out of a discussion on the Software suspend mailing list about the complexities of suspending and resuming individual processes.

CryoPID consists of a program called freeze that captures the state of a running process and writes it into a file. The file is self-executing and self-extracting, so to resume a process, you simply run that file. See the table below for more details on what is supported.”

I find myself wondering: Could this be a new way of distributing interpreted language desktop apps as binary files without releasing the source?

The (theoretical) web services database

I’ve been kind of floating around this topic for a while… well, databases in general… And I see a lot of people who have rather high standards (which is not a bad thing). I imagine the complication of offering a service like this comes from the fact that database people have very stringent standards.

Things like ACID transactions, foreign keys, and table/row/column/field read/write locking always come up in these types of conversations. I suppose that’s because it’s been the standard for so long… It’s just how people *think* about databases… Which means that it’s what databases should be, right? Right?

Well, not long ago the people at Amazon rethought process communication, rethought storage, and then rethought servers. Perhaps it’s about time they rethought the database as well. I have a hunch (as others have noted here before) that they already are!

I really think that a lot, and I mean a LOT, could be done with a very simple model.

  1. Tables are their own island (no foreign keys)
  2. Simple auto-incrementing PKs
  3. Every column indexed
  4. Only simple operators supported ( =, >, <, !=, is null, is not null )

Heresy! Ack! Foo! Bar! NO! THAT’S NOT A REAL DATABASE. Well, no, not as you mean by “real database,” but it certainly is a database. And I expect it would be good enough for 85% of people’s wants, needs, and desires.

We’ve learned that delays in storage give us permanence. We’ve learned that the pipeline is a good (and global) thing, and we’ve learned that impermanence gives us expandability. Necessity being the mother of invention, I expect that something like this will be out soon, and I expect that people will learn to be perfectly happy with it. It’s all about flexibility and agility here, people!
It’ll come, people will complain, it’ll work, and as time goes on, I think it’ll get better and better.
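Just to make the model above concrete, here’s a toy in-memory sketch of it in PHP. This is purely illustrative — “IslandTable” is a made-up name, not any real service’s API — but it shows how little machinery the four rules actually require:

```php
<?php
// A toy, in-memory sketch of the model above. Each table is its own
// island: no foreign keys, an auto-incrementing PK, and only the
// simple operators =, >, <, !=, is null, is not null.
class IslandTable {
    private $rows = array();
    private $nextId = 1;

    // Insert a row; the PK ('id') is assigned automatically.
    public function insert(array $row) {
        $row['id'] = $this->nextId;
        $this->rows[$this->nextId] = $row;
        return $this->nextId++;
    }

    // Select rows matching ALL conditions. A condition is
    // array(column, operator) or array(column, operator, value).
    public function select(array $conditions) {
        $matches = array();
        foreach ( $this->rows as $row ) {
            $ok = true;
            foreach ( $conditions as $c ) {
                $val = isset($row[$c[0]]) ? $row[$c[0]] : null;
                switch ( $c[1] ) {
                    case '=':           $ok = ($val == $c[2]); break;
                    case '!=':          $ok = ($val != $c[2]); break;
                    case '>':           $ok = ($val >  $c[2]); break;
                    case '<':           $ok = ($val <  $c[2]); break;
                    case 'is null':     $ok = is_null($val);   break;
                    case 'is not null': $ok = !is_null($val);  break;
                    default:            $ok = false; // unsupported operator
                }
                if ( !$ok ) break;
            }
            if ( $ok ) $matches[] = $row;
        }
        return $matches;
    }
}
```

No joins, no locking semantics to reason about, and every “query” is a flat scan over one island — which is exactly the kind of thing that shards and scales without drama.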

(PHP code) Gracefully handling the failure of TCP resources

function check_tcp_active($host, $port) {
    $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    if ( $socket === FALSE ) {
        return(FALSE);
    }
    // Half a second is 500000 usec, not 500 -- half a millisecond is
    // too short for a real network. Set both directions so a dead
    // host fails fast instead of hanging the request.
    $timeout = array("sec" => 0, "usec" => 500000);
    socket_set_option($socket, SOL_SOCKET, SO_RCVTIMEO, $timeout);
    socket_set_option($socket, SOL_SOCKET, SO_SNDTIMEO, $timeout);
    $result = @socket_connect($socket, $host, $port);
    socket_close($socket); // close on failure too, or we leak the socket
    return( $result ? TRUE : FALSE );
}

function find_active_server($array) {
    // Format: $array['127.0.0.1'] = 3306
    if ( is_array($array) ) {
        foreach ( $array as $host => $port ) {
            // Plain function call -- these aren't class methods
            if ( check_tcp_active($host, $port) ) {
                return( array('host' => $host, 'port' => $port) );
            }
        }
    }
    return(FALSE);
}

$mysqlServers = array(
    '127.0.0.1'    => 3306,
    '192.168.0.10' => 3306,
    '192.168.0.11' => 3306,
    '192.168.0.12' => 3306,
    '192.168.0.13' => 3306,
    '192.168.0.14' => 3306,
);

$goodMysqlHost = find_active_server($mysqlServers);

With only a very small amount of work, a pseudo-random load distribution would be possible too. Hope this helps someone 🙂
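For that pseudo-random distribution, one minimal sketch is to shuffle the host list before probing it, so the first responsive server varies from request to request instead of every client piling onto the first entry. (The function name here is mine; feed its output to find_active_server() above.)

```php
<?php
// Sketch: randomize the probe order so connections spread across all
// active servers. array_keys() + shuffle() is used because PHP's
// shuffle() would otherwise discard the host => port keys.
function shuffle_servers($servers) {
    $hosts = array_keys($servers);
    shuffle($hosts); // randomize the probe order
    $shuffled = array();
    foreach ( $hosts as $host ) {
        $shuffled[$host] = $servers[$host];
    }
    return $shuffled;
}
```

Then `find_active_server(shuffle_servers($mysqlServers))` gives you failover *and* rough load spreading in one shot.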

Is compute as a service for me?

Note to Nick: I haven’t forgotten your request, and I’ll have something on that soon. But when I started in, I found that I had something else to say about compute-on-demand (or compute-as-a-service, terms which I use somewhat interchangeably). So here it is. For all those people just jumping into a project, or considering restructuring a project around these new trends, I hope this helps. I definitely wouldn’t consider this (or anything else I write) a GUIDE per se, but food for thought.

We live in an interesting world now, because every new tech project has to ask itself at least one very introspective question: “is computing as a service the right thing for us?” And MAN, is that ever a loaded question. At first blush the answer seems like a no-brainer: “of course it’s for us! we don’t want to pay for what we don’t use!” Which is, at the basest level, true. But the devil is always in the details…

So which pair of glasses should you approach this problem with? What are the consequences of choosing wrong? How do we do it? Slow down. First you need to put some thought into two questions: “what do we do?” and “how do we do it?” Because that’s the foundation that determines which road leads to success and which to failure.

Are you a media sharing service which houses a billion images and gets thousands more every second? Are you a news aggregator which houses millions of feeds and hundreds of millions of posts? Are you a stock tracking company which copes with continuous feeds of data for portions of the day? Are you a sports reporting company with five to twenty posts per day but hundreds of thousands of reads? Are you a modest blogger? Do you just like to tinker with projects?

As you can see, all of those are very complex environments with unique needs, stresses, and usage spreads. And writing a blog entry which addresses whether each possible type of business should or shouldn’t use on demand computing would be impractical, not to mention impossible. But for the web industry there are a couple of basic types of environments: “Sparse Write, Dense Read” and “Dense Write, Sparse Read,” with subtypes of “Data Dense” and “Data Sparse.”

Environment: Sparse Write, Dense Read

For a lot of web applications you’re really not dealing with a lot of data. If you’re running a content management system, or you’re a directory, you have a finite amount of data which, in comparison with the number of times it’s read, is written to fairly infrequently. (In this case “infrequently written” means that a database’s query cache is a useful optimization for you.) It’s also very likely that you will be able to take a snapshot of your data in this type of environment in a fairly convenient manner. Compute as a service is probably right up your alley, and here’s why.

You are likely to have very predictable times during which your reads (or your writes) spike, meaning that you can actively plan for, set up, and use on demand resources to their fullest potential. Remember that an on demand resource is not an instant problem solver. In the case of something like Amazon EC2, it can take 5, 10, or 15 minutes for the server you’ve requested to even become active. After the server is up, there has to be some process which gets all of the relevant data on it up to date. What this means is that you might be looking at half an hour before your 5 extra servers are ready to handle the 7:00am to 9:00am traffic spike that everyone getting to the office in the morning generates. With your service, that’s fine though. Just plan to turn the extra power on an hour early, and turn it off half an hour after you expect the spike to be over. Wash, rinse, repeat.
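That kind of planning can be boiled down to something almost embarrassingly simple. This is a sketch under stated assumptions (a known 7:00am–9:00am spike; the function name is mine, and the actual provisioning call to EC2 or whatever is left abstract):

```php
<?php
// Sketch: pick a target pool size by hour of day for a known morning
// spike. Because a new server takes up to half an hour to become
// useful, you anticipate (spin up at 6:00) rather than react, and you
// keep the extra capacity until the spike is safely over.
function desired_server_count($hour, $base = 2, $extra = 5) {
    if ( $hour >= 6 && $hour < 10 ) { // 6:00am through the 9 o'clock hour
        return $base + $extra;
    }
    return $base;
}
```

A cron job comparing `desired_server_count(date('G'))` against the current pool size, starting or stopping instances to match, is really all the “elasticity” this environment needs.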

Environment: Dense Write, Sparse Read

See, this is the more complicated of the two environments. Everyone and their mother knows how to build a database-driven application which gets a few writes and a lot of reads, because that’s what your common RDBMS is built for. Think of it as being idiot-proofed out of the box 🙂 But when you have a backwards (as in usage, not as in technology) environment, all of a sudden a lot of “conventional wisdom” isn’t so wise anymore (what do you mean a faster write server than read servers causes replication problems? what do you mean my uber-normalization is the problem?).

It’s in this type of environment that we really have to look at the subsets of data, because the proof is really in the pudding, so to speak.

Sub Environment: Data Sparse

You work with a relatively small window of data in realtime. You may or may not get a request for all of the data you’re keeping continuously up to date, but you have to keep it that way or it’s your butt on the line, right? Well, you’re probably in luck. I think it’s fairly likely that your data size is a relatively small one; for example, you’re keeping a window with a period of 24 hours of data updated. Likely there is a *LOT* of history kept, but that’s kept elsewhere. Once you’re done with the data, you shove it right out the backend into another process and it gets handled there (that backend is likely a sparse write, sparse read environment which is extremely data dense — not for on demand computing (well, maybe, but that’s another blog post)).

For this environment, compute as a service is probably going to be a godsend… if you can overcome one small, teensy-weensy, ever so small yet still important detail: the development team. Now, not all companies are going to have difficult development teams, but some do, and you simply cannot build an environment ripe for compute as a service without their cooperation, so be prepared whatever the case! You will likely be able to leverage hotcopy, or an LVM-style live-action backup, for insta-backups to your long term storage solution (or on-demand setup pool). You will likely be able to leverage the extra compute capacity for your peak load times. And everything will likely turn out OK. So long as you can get some of the crucial application details hammered out.

Sub Environment: Data Dense

I pity you. Compute as a service is probably not what you need. Cases may vary and, again, the devil is in the details. But you have a huge challenge ahead of you: building an environment where a server can be programmatically brought online, and then caught up to date with the current compute pool, in a time frame that makes doing it a winning situation at all. This is something I’m going to put a lot of thought into… note to self… But unless you have some bright ideas here (and if you do, please send them my way), you have basically one chance: data partitioning. Get yourself a VERY good DBA, and really, REALLY plan out your data. If you put enough thought into it in the beginning, you have a chance to keep the individual pieces of data small enough (and distributed enough) that they just might lend themselves to compute as a service in a very LARGE way (but then we’re really talking about going WAY beyond the 10 or 20 allowed Amazon EC2 server instances).

Uh, OK, enough about these different environments… what do I need to do to USE on demand computing?

Well, that’s a difficult question to answer in a generally useful way. So, without getting too specific:

You, first and foremost, need to keep compute as a service in mind through every bit of your planning and execution stages. At every point in the set of long chains which make up your application, you have to ask yourself “what happens if this goes away?” and plan for it.

A very close second: think pipes and hoses rather than chisel and stone. Each part of your environment should be as self-contained as possible. When one hose springs a leak, the solution is simple: replace the hose (and just bypass it in the meantime). But when you lose a section of your monolithic structure, things are a bit more complicated than that.

Finally, you need to understand that you will have to work at taking full advantage of compute as a service. Remember that you are going to have to put TIME and ENERGY into using this kind of a service. Nothing comes free, and even in operations everything has an equal and opposite reaction. If you want to avoid spending the time, energy, and money maintaining a hardware infrastructure, you will have to put the same into avoiding one. But the benefits of doing so are real and tangible. Because when you’ve spent all of your time building an application which is fault tolerant, rather than an infrastructure which will fail, you invariably provide your user base a more robust and reliable service.