This is a very interesting idea indeed

So-and-so over at base4.net wrote about going “beyond stateless” by using method-less objects, and I found it interesting. But the thing that intrigued me after reading the article was something completely different: the idea of using the client to store its own data.

I’ve often thought to myself that some form of public key encryption should be used for web authentication… removing the hassle of the user name and password altogether… But why not take it a step further and use it for encrypting the data? You could then have the client store the data for you and transmit it back over the wire when necessary.

I’m not talking about anything like Flickr not saving images on their servers here; I’m talking about things like contact information, notification settings, online social relationships, and preferences. Obviously not all data would be storable in this format, but the biggies could be name, social, email, and credit card numbers (preferably with different keys, so that you were able to delegate access on a per-detail basis: None, Name, Contact Info, Payment Processing, etc.).

All it would take is a very lightweight, fast client store (a la OpenLDAP, which reads faster than it writes) and reversible encryption.
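As a sketch of the per-detail-key idea (toy code, my own names; the XOR "cipher" here is for illustration only, a real implementation would use a vetted crypto library): each category of personal data gets its own key, the service hands the ciphertext back to the client to store, and access is delegated per detail by sharing only that detail's key.

```python
import hashlib
import secrets

def _keystream(key: bytes, length: int) -> bytes:
    """Derive a keystream from the key via SHA-256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy reversible encryption: XOR plaintext against the keystream."""
    return bytes(a ^ b for a, b in zip(plaintext, _keystream(key, len(plaintext))))

decrypt = encrypt  # XOR is its own inverse

# One key per detail category, so access can be delegated per detail.
keys = {"name": secrets.token_bytes(32), "payment": secrets.token_bytes(32)}

# The service encrypts each detail and hands the blobs to the client to store;
# nothing but ciphertext ever needs to live on (or pass through) the server.
blobs = {
    "name": encrypt(keys["name"], b"Jane Doe"),
    "payment": encrypt(keys["payment"], b"4111-1111-1111-1111"),
}

# A payment processor given only keys["payment"] can read just that one detail.
assert decrypt(keys["payment"], blobs["payment"]) == b"4111-1111-1111-1111"
```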

Now this would be a disclaimer: “We value your privacy, and therefore do not keep any of your personal information, preferences, history, or other records on our servers. That data is stored on your computer in 2048-bit encrypted form. Therefore, if a hacker were to penetrate our servers, they would find absolutely no information which could be used against you.”

The (theoretical) web services database

I’ve been kind of floating around this topic for a while… well, databases in general… and I see a lot of people who have rather high standards (which is not a bad thing). I imagine the complication of offering a service like this comes from the fact that database people have very stringent standards.

Things like ACID transactions, foreign keys, and table/row/column/field read/write locking always come up in these types of conversations. I suppose that this is so because it’s been the standard for so long… It’s just how people *think* about databases… which means that it’s what databases should be, right? Right?

Well, not long ago the people at Amazon rethought process communication, rethought storage, and then rethought servers. Perhaps it’s about time they rethought the database as well. I have a hunch (as others have noted here before) that they already are!

I really think that a lot, and I mean a LOT, could be done with a very simple model.

  1. Tables are their own islands (no foreign keys)
  2. Simple auto-incrementing primary keys
  3. Every column indexed
  4. Only simple operators supported ( =, >, <, !=, is null, is not null )
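A sketch of what that model might look like in practice (hypothetical Python, all names mine): one self-contained table, an auto-incrementing primary key, an index on every column, and nothing but the simple comparison operators.

```python
class SimpleTable:
    """One self-contained table: no foreign keys, auto-increment PK,
    an index on every column, and only simple comparison operators."""

    OPS = {
        "=":  lambda a, b: a == b,
        "!=": lambda a, b: a != b,
        ">":  lambda a, b: a is not None and a > b,
        "<":  lambda a, b: a is not None and a < b,
        "is null":     lambda a, b: a is None,
        "is not null": lambda a, b: a is not None,
    }

    def __init__(self, columns):
        self.columns = columns
        self.rows = {}                           # pk -> row dict
        self.indexes = {c: {} for c in columns}  # column -> value -> set of pks
        self.next_pk = 1

    def insert(self, **values):
        pk = self.next_pk
        self.next_pk += 1
        row = {c: values.get(c) for c in self.columns}
        self.rows[pk] = row
        for c in self.columns:                   # every column is indexed
            self.indexes[c].setdefault(row[c], set()).add(pk)
        return pk

    def select(self, column, op, value=None):
        test = self.OPS[op]
        if op == "=":  # equality is answered straight from the index
            return sorted(self.indexes[column].get(value, set()))
        return sorted(pk for pk, row in self.rows.items() if test(row[column], value))

users = SimpleTable(["name", "age"])
users.insert(name="alice", age=30)
users.insert(name="bob")                    # age is NULL
print(users.select("age", "is null"))       # -> [2]
print(users.select("name", "=", "alice"))   # -> [1]
```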

Heresy! Ack! Foo! Bar! NO! THAT’S NOT A REAL DATABASE. Well, no, not as you mean by “real database,” but it certainly is a database. And I expect it would be good enough for 85% of people’s wants, needs, and desires.

We’ve learned that delays in storage give us permanence. We’ve learned that the pipeline is a good (and global) thing, and we’ve learned that impermanence gives us expandability. Necessity being the mother of invention, I expect that something like this will be out soon, and I expect that people will learn to be perfectly happy with it. It’s all about flexibility and agility here, people!
It’ll come, people will complain, it’ll work, and as time goes on, I think it’ll get better and better.

Distributed MySQL Via Web Services?

Imagine for a moment, if you will, making your MySQL queries via a REST API. Weird, huh? I’ll admit it’s a crazy idea, but then a lot of my ideas are crazy. Still. Work with me here.

Query --> || REST API ||

  • The query is a SELECT:
    1. The REST API synchronously determines both which servers are up and which is the fastest to respond.
    2. The API connects to that server with your user name and password (specified in the request header).
    3. The query is run on that server, and the response is passed back through to you.
    4. The connection is closed.
  • The query is an INSERT/UPDATE/DELETE:
    1. The REST API synchronously determines both which servers are up and which is the fastest to respond.
    2. The API verifies your credentials against that server, and gives you a Query ID.
    3. The API writes the query into replication directories, a la slurpd.
    4. The query is then passed along to all of the real MySQL servers.
    5. You can then re-query the API with the Query ID to determine whether the query has been fully replicated.

Plenty of details to iron out here, but it’s certainly feasible… And definitely interesting…
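To make the flow concrete, here's a toy in-process model of that dispatch logic (hypothetical Python, all names mine; the real thing would of course sit behind HTTP): SELECTs run immediately against one live server, writes hand back a Query ID and fan out to every server, and you poll the ID for replication status.

```python
import itertools

class RestMysqlFacade:
    """Toy model of the proposed REST front end: selects go to one live
    server, writes fan out to every server under a pollable Query ID."""

    def __init__(self, servers):
        self.servers = servers   # name -> dict standing in for a MySQL box
        self.pending = {}        # query_id -> servers not yet caught up
        self.ids = itertools.count(1)

    def query(self, sql, params):
        if sql.strip().lower().startswith("select"):
            server = self._fastest_live_server()
            return {"rows": self._run(server, sql, params)}
        # Insert/Update/Delete: assign a Query ID, replicate to all servers.
        qid = next(self.ids)
        self.pending[qid] = set(self.servers)
        for server in list(self.servers):
            self._run(server, sql, params)   # in reality async, a la slurpd
            self.pending[qid].discard(server)
        return {"query_id": qid}

    def replicated(self, qid):
        """Poll: has the write reached every server yet?"""
        return not self.pending.get(qid)

    def _fastest_live_server(self):
        return next(iter(self.servers))      # stand-in for the health check

    def _run(self, server, sql, params):
        store = self.servers[server]
        if sql.strip().lower().startswith("select"):
            return store.get(params["key"])
        store[params["key"]] = params["value"]

api = RestMysqlFacade({"db1": {}, "db2": {}})
result = api.query("INSERT ...", {"key": "k", "value": 42})
print(api.replicated(result["query_id"]))   # -> True (replication is sync in the toy)
print(api.query("SELECT ...", {"key": "k"}))  # -> {'rows': 42}
```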

One Resource to Rule Them All!

One resource to rule them all,

One resource to find them,

One resource to bring them all,

And in the darkness bind them,

In the land of servers where the shadows lie.

It’s been a bumpy road to people’s understanding of the EC2 service. A large part of the problem is a point-of-view gap between the masses and Amazon. It’s a lot like an American visiting India wondering why he can’t order a steak (disclaimer: I don’t actually know whether you can order a steak in India, but the point essentially remains). They have a different point of view in regards to the cow.

So too does Amazon have a different point of view on resources. Your average web guy sees a server as a single resource: “that server is very powerful, it could do a LOT” or “that’s an old server, not a lot can be done with it.” For so long we were able to get X number of servers, those servers would be assigned roles, and that’s what they were. A better server could crawl more pages, or store a larger database, or serve more page views. And of course this meant that the server was specific to the application. But this model gets more and more difficult to maintain as the project gets larger and larger. Anyone who’s gone from 15 to 85 servers knows this. And it boils down to one single point: permanence does not scale.

So the Amazon guys decided to look at things differently. The basic components of a server are MHz, RAM, bandwidth, and disk space, and they look at a server as a pool of those specific resources. You don’t have 15 good servers; you have 180,000 MHz, 120 GB of RAM, and 13,500 GB of disk space.

And since permanence doesn’t scale… permanence is built OUT. This is a difficult concept for most people to grasp, and building an application which doesn’t rely on permanence is difficult (I struggle with it myself!). It’s a learning process, but a necessary one. Once people learn to put permanence in the right places, once we all figure out the tricks of the trade, I am of the opinion that the web as a whole will become a much more stable place.

There certainly will be some growing pains, though. For example, right now a huge pain is the dependence on popular database products (MySQL, PostgreSQL), which are wonderful, don’t get me wrong, but they are currently limited to the realm of the server instead of the realm of the cloud.
So let’s all put our heads together and start thinking of ways in which we can make use of the cloud as a cloud. We can do this!

Down with HTML E-Mail!

Begin rant

I’m with Jeremy on this one… Let’s face it, e-mail is broken. We have long since outgrown it, and we have been living with its pains for a long time now. It’s everyone’s favorite internet whipping boy: “I hate spam,” “I hate stupid forwards,” “I hate huge attachments.” We spend all our time bitching about e-mail, but then when something happens it’s “the sky is falling, the sky is falling, give me back my good sweet innocent e-mail the way it was before you broke it! It was JUST FINE THE WAY IT WAS, WHY DID YOU HAVE TO CHANGE IT?!”

Go whine to somebody else, seriously. E-mail is the black plague of the internet, an infectious disease, a self-sustaining spiral down the drain of absurdity. I, for one, will be happy when all of the people who depend on it, who enable it, and who empower it finally go retire on some island somewhere and the kids take over and it’s all about text messaging, not e-mail.

Speaking of kids taking over: “SUCKS TO YOUR EMAIL!”

End rant

Amazon EC2 Cookbook: Startup Flexibility

Disclaimer: these code segments have not been tested verbatim. I assume anyone who is successfully bundling EC2 images (that run) will know enough about copying shell scripts off blogs to test for typos, etc.!

I’ve been searching for a way of using EC2 in a production environment that keeps things as simple as possible but also eliminates the need for unnecessary (and extremely tedious) image building, both during and after the development process (development of the AMI, not the service). This is what I’ve come up with.

Step 1: Our repository

Create a Subversion repository which is web-accessible and password-protected (of course), laid out like so:

• ami/trunk/init.sh
• ami/trunk/files/
• ami/tags/
• ami/tags/bootstrap.sh

ami/tags/bootstrap.sh would read:

    #!/bin/bash

    BootLocation="ami/trunk"
    BootHost="svnhost.com"
    BootUser="username"
    BootPass="password"
    BootProtocol="http"

    ## Prepare the bootstrap directory
    echo -en "\tPreparing... "
    if [ -d /mnt/ami ]
    then
        rm -rf /mnt/ami
    fi
    mkdir -p /mnt/ami/
    RC=$?; if [ $RC -ne 0 ]; then exit $RC; else echo "OK"; fi

    ## Populate the bootstrap directory
    echo -en "\tPopulating... "
    svn export --force \
        --username $BootUser \
        --password $BootPass \
        $BootProtocol://$BootHost/$BootLocation/ \
        /mnt/ami/ 1>/dev/null 2>/dev/null
    RC=$?; if [ $RC -ne 0 ]; then exit $RC; else echo "OK"; fi
    chmod a+x /mnt/ami/init.sh

    ## Hand off
    echo -e "\tHanding off to init script..."
    /mnt/ami/init.sh
    exit $?

ami/trunk/init.sh would read something like:

    #!/bin/bash
    ## Filesystem Additions/Changes
    echo -en "\t\tSynchronizing System Files... "
    cd /mnt/ami/files/ || exit 1
    for i in $(find . -type d)
    do
        mkdir -p "/$i"
    done
    echo -en "d"
    for i in $(find . -type f)
    do
        cp -f "$i" "/$i"
    done
    echo -en "f"
    echo " OK"
    ## Any Commands Go Here
    ## All Done!
    exit 0

Step 2: Configure your AMI

• Create /etc/init.d/servicename
• chkconfig --add servicename
• chkconfig --levels 345 servicename on
• /etc/init.d/servicename should look something like:

    #!/bin/sh
    #
    # chkconfig: - 85 15
    # description: EC2 Bootstrapping Process
    #
    RETVAL=0
    case "$1" in
      start)
        /usr/bin/wget \
            -o /dev/null -O /mnt/bootstrap.sh \
            http://user:pass@svnhost/ami/tags/bootstrap.sh
        /bin/bash /mnt/bootstrap.sh
        RETVAL=$?
        ;;
      stop)
        exit 0
        ;;
      restart)
        $0 start
        RETVAL=$?
        ;;
      *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
    esac
    exit $RETVAL

And now when the AMI boots itself up, we hit 85 during runlevel 3 bootup (well after network initialization), servicename starts, and the bootstrapping begins. We’re then able, with our shell scripts, to make a great many changes to the system after the fact. These changes might be bugfixes, or they might be setup processes to reconstitute a database and download the latest code from a source control repository located elsewhere… They might be registration via a DNS API… anything at all.
The point is that some flexibility is needed, and this is one way to build it in!

Now, to be fair to MySQL

You could use the MySQL binary logs, stored on an infinidisk, to accomplish much the same thing. However, the fact that the PostgreSQL WALs are copied automatically by the database server, with no nasty hacks needed, makes PostgreSQL a much cleaner first choice IMHO. I have, of course, not tested this… yet…

EC2 S3 PGSQL WAL PITR Infinidisk: The backend stack that just might change web services forever!

I have written mostly about MySQL here in the past. The reason for this is simple: MySQL is what I know. I have always been a die-hard “everything in its place and a place for everything” fanatic. I’ll bash Microsoft with the best of them, but I still recognize their place in the market.

And now it’s time for me to examine the idea of PostgreSQL, and this blog entry about Amazon web services is the reason. I don’t claim to agree with everything said there… as a matter of fact I tend to disagree with a lot of it… but I saw “PS: Postgresql seems to win hands down over MySQL in this respect; WAL is trivial to implement with Postgresql)” and thought to myself: “Hmm, what’s that?” I found the answer in the PostgreSQL documentation on Write Ahead Logging (WAL), and it all made sense! The specific end goal here is Continuous Archiving and Point-In-Time Recovery (PITR). This plus the S3 infinidisk certainly makes for an interesting concept, and one that I am eager to try out! I imagine that the community version of infinidisk would suffice, since we’re not depending on random access… that ought to make for some chewy goodness!
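For what it’s worth, the WAL-shipping side is just a couple of lines in postgresql.conf. Something like the following sketch, where the /mnt/s3 mount point is my assumption for where the S3-backed infinidisk lives:

```conf
# postgresql.conf -- ship each completed WAL segment to the S3-backed disk.
# %p is the path of the WAL segment, %f is its file name; the test guards
# against overwriting a segment that was already archived.
archive_command = 'test ! -f /mnt/s3/wal/%f && cp %p /mnt/s3/wal/%f'
```

Recovery is then a matter of restoring a file-level base backup (taken between pg_start_backup() and pg_stop_backup()) and pointing restore_command at that same WAL directory.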

I’m always excited when we see something new for Amazon web services:

http://www.openfount.com/blog/s3infidisk-for-ec2

This certainly looks very interesting! I can’t help but wonder if the memory caching in the enterprise version is enough to run small MySQL instances on. At the very least, being able to mysqldump regularly to a file directly on S3 would be useful, as opposed to dumping to a file, splitting it into chunks, and copying the chunks off to S3.

Perhaps I’ll contact them next week and see if they’ll let me take it for a test drive?!