The (theoretical) web services database

I’ve been kind of floating around this topic for a while… Well, databases in general… And I see a lot of people who have rather high standards (which is not a bad thing).  I imagine the complication of offering a service like this comes from the fact that database people have very stringent standards.

Things like ACID transactions, foreign keys, and table/row/column/field read/write locking always come up in these types of conversations.  I suppose that this is so because it’s been the standard for so long… It’s just how people *think* about databases… Which means that it’s what databases should be, right? Right?

Well, not long ago the people at Amazon rethought process communication, and rethought storage, and then rethought servers.  Perhaps it’s about time they rethought the database as well.  I have a hunch (as others have noted here before) that they already are!

I really think that a lot, and I mean a LOT, could be done with a very simple model.

  1. Tables are their own island (no foreign keys)
  2. Simple auto-incrementing PKs
  3. Every column indexed
  4. Only simple operators supported (=, >, <, !=, is null, is not null)

Heresy! Ack! Foo! Bar! NO! THAT’S NOT A REAL DATABASE.  Well, no, not what you mean by “real database,” but it certainly is a database.  And I expect it would be good enough for 85% of people’s wants, needs, and desires.
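To make the four rules a bit more concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the model under my own assumptions, not how any real service works: the `SimpleTable` class, its method names, and the hash-based indexes are all made up for this example.

```python
import itertools
from collections import defaultdict


class SimpleTable:
    """Hypothetical standalone table: no foreign keys, auto-increment PK,
    every column indexed, only the six simple operators."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}                      # pk -> row dict
        self._next_id = itertools.count(1)  # simple auto-incrementing PK
        # One index per column: value -> set of PKs holding that value.
        self.indexes = {c: defaultdict(set) for c in self.columns}

    def insert(self, **values):
        pk = next(self._next_id)
        row = {c: values.get(c) for c in self.columns}  # missing column = null
        self.rows[pk] = row
        for column, value in row.items():
            self.indexes[column][value].add(pk)
        return pk

    def select(self, column, op, value=None):
        """Return the sorted PKs of rows where `column op value` holds."""
        index = self.indexes[column]
        if op == "=":
            return sorted(index.get(value, set()))
        if op == "is null":
            return sorted(index.get(None, set()))
        # The remaining operators walk the distinct values in the index.
        # Nulls never match a comparison, as in SQL.
        tests = {
            "!=": lambda v: v is not None and v != value,
            ">": lambda v: v is not None and v > value,
            "<": lambda v: v is not None and v < value,
            "is not null": lambda v: v is not None,
        }
        matches = set()
        for indexed_value, pks in index.items():
            if tests[op](indexed_value):
                matches |= pks
        return sorted(matches)
```

Something like `people = SimpleTable(["name", "age"])`, a few `people.insert(name="ann", age=30)` calls, and `people.select("age", ">", 30)` covers a surprising amount of everyday use. A real implementation would want sorted indexes for the range operators; the dictionaries here just keep the sketch short.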

We’ve learned that delays in storage give us permanence.  We’ve learned that the pipeline is a good (and global) thing, and we’ve learned that impermanence gives us expandability.  Necessity being the mother of invention, I expect that something like this will be out soon, and I expect that people will learn to be perfectly happy with it.  It’s all about flexibility and agility here, people!
It’ll come, people will complain, it’ll work, and as time goes on, I think it’ll get better and better.

Distributed MySQL Via Web Services?

Imagine for a moment, if you will, making your MySQL queries via a REST API. Weird, huh? I’ll admit it’s a crazy idea, but then a lot of my ideas are crazy. Still. Work with me here.

Query –> || REST API ||

  • The query is a select
    1. The REST API synchronously determines both which servers are up and which is the fastest to respond.
    2. The API connects to the server with your user name and password (specified in the request header)
    3. The query is run on that server, and the response is passed back through to you.
    4. Connection closed
  • The query is an Insert/Update/Delete
    1. The REST API synchronously determines both which servers are up and which is the fastest to respond.
    2. The API verifies your credentials against that server, and gives you a Query ID
      1. You can then re-query the API with the Query ID to determine if the query has been fully replicated.
  • The API writes the query into replication directories, a la slurpd
  • The query is then passed along to all of the real MySQL servers
  • Plenty of details to iron out here, but it’s certainly feasible… And definitely interesting…
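
The flow above can be sketched roughly like this. It is a toy, in-memory stand-in in Python rather than a real REST service: the `QueryRouter` class, its method names, and the idea of passing servers in as plain callables are all assumptions made for illustration, and credential checking is left out entirely.

```python
import uuid


class QueryRouter:
    """Hypothetical front end: reads go to the fastest live server,
    writes get a Query ID and are replicated to every server."""

    def __init__(self, servers):
        # name -> callable(sql); each callable stands in for a MySQL connection.
        self.servers = servers
        # name -> last measured response time in seconds; None or missing = down.
        self.latency = {}
        self.replication_log = []  # ordered (query_id, sql) pairs, slurpd style
        self.applied = {}          # query_id -> names of servers that ran it

    def fastest(self):
        up = {n: t for n, t in self.latency.items() if t is not None}
        if not up:
            raise RuntimeError("no servers are up")
        return min(up, key=up.get)

    def handle(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            # Run on the fastest server and pass the response straight back.
            return {"rows": self.servers[self.fastest()](sql)}
        if verb in ("INSERT", "UPDATE", "DELETE"):
            query_id = uuid.uuid4().hex
            self.replication_log.append((query_id, sql))
            self.applied[query_id] = set()
            return {"query_id": query_id}
        raise ValueError("unsupported query: %r" % sql)

    def replicate(self):
        """Push every logged write to each live server that has not run it yet."""
        for query_id, sql in self.replication_log:
            for name, run in self.servers.items():
                is_up = self.latency.get(name) is not None
                if is_up and name not in self.applied[query_id]:
                    run(sql)
                    self.applied[query_id].add(name)

    def is_replicated(self, query_id):
        """The 're-query with the Query ID' step: has every server run it?"""
        return self.applied[query_id] == set(self.servers)
```

The write path returns immediately with an ID, and a caller polls `is_replicated` until it turns true. That is the trade being made here: writes are eventually consistent across servers instead of being confirmed everywhere before the call returns.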

    One Resource to Rule Them All!

    One resource to rule them all,

    One resource to find them,

    One resource to bring them all,

    And in the darkness bind them,

    In the land of server where the shadows lie.

    It’s been a bumpy road to people’s understanding of the EC2 service. And a large part of the problem is a point-of-view gap between the masses and Amazon. It’s a lot like an American visiting India wondering why he can’t order a steak (disclaimer: I don’t actually know whether you can order a steak in India, but the point essentially remains). They have a different point of view with regard to the cow.

    So too does Amazon have a different point of view on resources. Your average web guy sees a server as a single resource: “that server is very powerful, it could do a LOT” or “that’s an old server, not a lot can be done with it.” That’s because for so long we were able to get X number of servers, those servers would be assigned roles, and that’s what they were. A better server could crawl more pages, or store a larger database, or serve more page views. And of course this meant that the server was specific to the application. But this model gets more and more difficult to maintain as the project gets larger and larger. Anyone who’s gone from 15 to 85 servers knows this. And it boils down to one single point: Permanence does not scale.

    So the Amazon guys decided to look at things differently. Your basic components of a server are MHz, RAM, bandwidth, and disk space. And they look at a server as a pool of those specific resources. You don’t have 15 good servers, you have 180,000 MHz, and 120 GB of RAM, and 13,500 GB of disk space.
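
The arithmetic behind that pool is just a sum over machines. Here is a small Python sketch of it; the `Resources` type and `pool` function are hypothetical names, and the per-server figures (12,000 MHz, 8 GB of RAM, 900 GB of disk) are my own assumption of 15 identical servers, chosen so they add up to the totals above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    mhz: int
    ram_gb: int
    disk_gb: int


def pool(servers):
    """Collapse a list of servers into one pool of raw resources."""
    return Resources(
        mhz=sum(s.mhz for s in servers),
        ram_gb=sum(s.ram_gb for s in servers),
        disk_gb=sum(s.disk_gb for s in servers),
    )


# Assumed: 15 identical servers, each 12,000 MHz, 8 GB RAM, 900 GB disk.
fleet = [Resources(mhz=12_000, ram_gb=8, disk_gb=900)] * 15
total = pool(fleet)
```

Once the fleet is a single `Resources` value, which physical box a job lands on stops mattering, and that is the whole point of the shift in view.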

    And since permanence doesn’t scale… permanence is built OUT. This is a difficult concept to grasp for most people (myself included!), and building an application which doesn’t rely on permanence is difficult. It’s a learning process, but a necessary one. Once people learn to put permanence in the right places, once we all figure out the tricks of the trade, I am of the opinion that the web as a whole will become a much more stable place.

    There certainly will be some growing pains though. For example, right now a huge pain is the dependence on popular database products (MySQL, PostgreSQL) which are wonderful, don’t get me wrong, but they are, currently, limited to the realm of the server, instead of the realm of the cloud.
    So let’s all put our heads together and start thinking of ways in which we can make use of the cloud as a cloud. We can do this!

    ORDB gone…. Bummer….

    ORDB seems to have closed its doors. That’s huge, and sad. Farewell, ORDB. I wish I had more info about this one. If anyone has anything to add (or clarify on the subject) I would appreciate it being left in the comments. By the time I saw this their page had gone down, but I did find a reproduction on The Spam Diaries: ORDB Blocklist Gone.

    2006-12-18 11:34
    We regret to inform you that ORDB.org, at the ripe age of five and a half, is shutting down. It’s been a case of a long goodbye as very little work has gone into maintaining ORDB for a while. Our volunteer staff has been pre-occupied with other aspects of their lives. In addition, the general consensus within the team is that open relay RBLs are no longer the most effective way of preventing spam from entering your network as spammers have changed tactics in recent years, as have the anti-spam community.

    We encourage system owners to remove ORDB checks from their mailers immediately and start investigating alternative methods of spam filtering. We recommend a combination involving greylisting and content-based analysis (such as the dspam project, bmf or Spam Assassin).

    DNS and the mailing lists will vanish today, December 18, 2006.

    This website will vanish by December 31, 2006.

    Apparently this is old hat, and I’ve just found out, finally, due to the drying up of my DNS servers’ caches… Still. So long and thanks for all the fish!

    I couldn’t have put it better myself

    Yesterday’s Penny Arcade hit it on the nose. OS X IS more convenient. It IS worth changing your opinions and telling everybody that all that Mac trash talking was about OS 9, but OS X radically altered the very fabric of the universe and that you now have to take it all back: the Mac is now a truly reformed beast using its nefarious powers only for the purpose of good and justice (and the occasional profit margin)! Think Ghost Rider meets BSD :).

    All geeky references aside, I was once a Mac hater. Now I’m a Mac lover.  And you know what? I’m okay with that!

    MacFUSE: no, it’s not an Apple venture into the hardware aisle!

    But its details are here: The MacFUSE Google Code project page
    And it looks like a good thing indeed!  Being able to use the Linux FUSE file systems on OS X will make for a wild ride, and really open a lot of doors.

    Here are just a few Fuse based file systems:

    Needless to say, the number of things that have been (and can be) done via FUSE makes its adoption on the Mac a very exciting idea.

    Storage3

    News: Aug 24, 2007: Michael T. provided a patch to fix some date issues he was having with Amazon AWS. I have not verified this yet, but seeing as I’m not precisely sure when I will be able to verify it, I figured I would put his code up here for you to download if you’re experiencing authentication issues like he was! Get it here: Patched Storage3 Class

    Current Version: 1.0.1

    CHANGELOG

    • 1.0.0
      • Initial Public Release
    • 1.0.1
      • Added function fileExists($s3bucket, $s3file)
      • Added support for listing bucket files for buckets with over 1,000 files
      • Added contributed function setACL($s3bucket, $s3file, $shorthand=public-read)
      • Added support for setting an object’s ACL during the upload process
      • Added grabbing of response headers (which contain a LOT of useful information)
      • s3test.php includes an example of using headers to verify the integrity of a file stored in S3 via MD5 hash
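
For the curious, the header-based integrity check mentioned in the last changelog item boils down to something like the sketch below. This is a Python illustration of the idea only, not the PHP code in s3test.php, and `etag_matches` is a made-up name. It assumes the object went up in one plain PUT, in which case S3 returns the object's MD5 as a quoted hex string in the `ETag` response header.

```python
import hashlib


def etag_matches(data, headers):
    """Compare the MD5 of local bytes against the ETag response header.

    Assumes a plain single-request upload, where the ETag is the
    quoted hex MD5 of the object body.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    etag = lowered.get("etag", "").strip().strip('"').lower()
    if not etag:
        return False
    return hashlib.md5(data).hexdigest() == etag
```

If this returns false for a file you just uploaded, the copy in S3 is not byte-for-byte what you sent, and re-uploading is the safe move.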

    This is a revised version of the file posted here: http://blog.apokalyptik.com/storage3.phps and includes:

    • A local, modified version of the PEAR HTTP/Request package
    • A local copy of all other required PEAR packages
    • Documentation (under development… docs aren’t my strong suit)
    • An example application (s3test.php)

    Cheers!
    –Apokalyptik