Amazon EC2 Cookbook: Startup Flexibility

Disclaimer: these code segments have not been really “tested” verbatim. I assume anyone who is successfully bundling EC2 images (that run) will know enough about copying shell scripts off blogs to test for typos, etc!

I’ve been searching for a way to use EC2 in a production environment: something that keeps things as simple as possible while eliminating unnecessary (and extremely tedious) image rebuilding both during and after the development process (development of the AMI, not the service). This is what I’ve come up with.

Step 1: our repository
Create a subversion repository which is web-accessible and password-protected (of course), laid out like so (a sketch of setting this up follows the layout):

  • ami/trunk/init.sh
  • ami/trunk/files/
  • ami/tags/bootstrap.sh
  • ami/tags/
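
If you’re starting from scratch, creating a repository with this layout might look something like the following. This is a minimal sketch: the server-side path, auth file, and hostname are my assumptions, and the Apache mod_dav_svn configuration is left to your distribution’s documentation.

## On the subversion server -- paths and hostname are assumptions
svnadmin create /var/svn/ami
htpasswd -c /etc/svn-auth-users username
## (serve /var/svn/ami via mod_dav_svn with AuthUserFile /etc/svn-auth-users)

## Then lay out the skeleton in one commit:
svn mkdir --username username --password password -m "initial layout" \
    http://svnhost.com/ami/trunk \
    http://svnhost.com/ami/trunk/files \
    http://svnhost.com/ami/tags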

ami/tags/bootstrap.sh would read:

#!/bin/bash

BootLocation="ami/trunk"
BootHost="svnhost.com"
BootUser="username"
BootPass="password"
BootProtocol="http"

## Prepare the bootstrap directory
echo -en "\tPreparing... "
if [ -d /mnt/ami ]
then
    rm -rf /mnt/ami
fi
mkdir -p /mnt/ami/
RC=$?  # capture the exit code before the [ ] test clobbers $?
if [ $RC -ne 0 ]; then exit $RC; else echo "OK"; fi

## Populate the bootstrap directory
echo -en "\tPopulating... "
svn export --force \
    --username "$BootUser" \
    --password "$BootPass" \
    "$BootProtocol://$BootHost/$BootLocation/" \
    /mnt/ami/ 1>/dev/null 2>/dev/null
RC=$?
if [ $RC -ne 0 ]; then exit $RC; else echo "OK"; fi
chmod a+x /mnt/ami/init.sh

## Hand off
echo -e "\tHanding off to init script..."
/mnt/ami/init.sh
exit $?

ami/trunk/init.sh would read something like:

#!/bin/bash

## Filesystem Additions/Changes
echo -en "\t\tSynchronizing System Files... "
cd /mnt/ami/files/ || exit 1

## First recreate the directory tree under /
for i in $(find . -type d)
do
    mkdir -p "/$i"
done
echo -en "d"

## ...then drop the files into place
## (note: paths containing whitespace would break these loops)
for i in $(find . -type f)
do
    cp -f "$i" "/$i"
done
echo -en "f"
echo " OK"

## Any Commands Go Here

## All Done!
exit 0
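
The “## Any Commands Go Here” section is where per-boot setup work belongs. As an illustration only (the repository URL, paths, and service names below are hypothetical, not from this post), it might contain something like:

## Hypothetical examples of post-sync commands:
svn export --force http://user:pass@svnhost/app/trunk /var/www/app  # fetch the latest application code
/etc/init.d/httpd restart                                           # pick up the new code/config
## ...or restore a database dump, register with a DNS API, etc.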

Step 2: configure your AMI

  • create /etc/init.d/servicename
  • chkconfig --add servicename
  • chkconfig --level 345 servicename on
  • /etc/init.d/servicename should look something like:

#! /bin/sh
#
# chkconfig: - 85 15
# description: EC2 Bootstrapping Process
#
RETVAL=0
case "$1" in
  start)
    /usr/bin/wget \
      -o /dev/null -O /mnt/bootstrap.sh \
      http://user:pass@svnhost/ami/tags/bootstrap.sh
    /bin/bash /mnt/bootstrap.sh
    RETVAL=$?
    ;;
  stop)
    exit 0
    ;;
  restart)
    $0 start
    RETVAL=$?
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 1
    ;;
esac
exit $RETVAL

And now when the AMI boots itself up we hit priority 85 during runlevel 3 bootup (well after network initialization), servicename starts, and the bootstrapping begins. We’re then able, with our shell scripts, to make a great many changes to the system after the fact. These changes might be bugfixes, or they might be setup processes that reconstitute a database and download the latest code from a source control repository located elsewhere… They might be registration via a DNS API… anything at all.
The point is that some flexibility is needed, and this is one way to build it in!
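
To spell out the workflow this buys you (hostnames and paths assumed as above): fix or extend the boot behavior by committing to trunk, and every instance picks the change up on its next boot, or immediately if you re-run the service by hand:

## On your workstation, in a working copy of ami/trunk:
svn ci -m "tweak init.sh" init.sh

## On a running instance, to apply it without rebooting:
/etc/init.d/servicename start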

Now, to be fair to MySQL

you could use the MySQL binary logs, stored on an infinidisk, to accomplish much the same thing; however, the fact that the pgsql WALs are copied automatically by the database server, with no nasty hacks needed, makes PostgreSQL a much cleaner first choice IMHO. That said, I’ve of course not tested this… yet…

EC2 S3 PGSQL WAL PITR Infinidisk: The backend stack that just might change web services forever!

I have written mostly about MySQL here in the past. The reason for this is simple: MySQL is what I know. I have always been a die-hard “everything in its place and a place for everything” fanatic. I’ll bash Microsoft with the best of them, but I still recognize their place in the market. And now it’s time for me to examine the idea of PostgreSQL, and this blog entry about Amazon web services is the reason. I don’t claim to exactly agree with everything said there… as a matter of fact I tend to disagree with a lot of it… but I saw “PS: Postgresql seems to win hands down over MySQL in this respect; WAL is trivial to implement with Postgresql)” and thought to myself: “hmm, what’s that?” I found the answer in the PostgreSQL documentation on Write Ahead Logging (WAL) and it all made sense!

The specific end goal here is Continuous Archiving and Point-In-Time Recovery (PITR). This plus the S3 infinidisk certainly makes for an interesting concept, one that I am eager to try out! I imagine that the community version of infinidisk would suffice, since we’re not depending on random access… that ought to make for some chewy goodness!
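
For the curious, the heart of continuous archiving is a single postgresql.conf setting. A minimal sketch, assuming the infinidisk is mounted at /mnt/s3 (the mount point and directory layout are my assumptions):

## postgresql.conf -- ship each completed WAL segment off to the S3-backed disk
archive_command = 'test ! -f /mnt/s3/wal/%f && cp %p /mnt/s3/wal/%f'

## Take a base backup bracketed by pg_start_backup('label') / pg_stop_backup(),
## then recovery is a recovery.conf containing:
##   restore_command = 'cp /mnt/s3/wal/%f %p'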

Storage3

News: Aug 24, 2007: Michael T. provided a patch to fix some date issues he was having with Amazon AWS. I have not verified this yet, but since I’m not sure when I’ll be able to, I figured I would put his code up here for you to download if you’re experiencing authentication issues like he was! Get it here: Patched Storage3 Class

Current Version: 1.0.1

CHANGELOG

• 1.0.0
  • Initial Public Release
• 1.0.1
  • Added function fileExists($s3bucket, $s3file)
  • Added support for listing bucket files for buckets with over 1,000 files
  • Added contributed function setACL($s3bucket, $s3file, $shorthand='public-read')
  • Added support for setting an object’s ACL during the upload process
  • Added grabbing of response headers (which contain a LOT of useful information)
  • s3test.php includes an example of using headers to verify the integrity of a file stored in S3 via its md5 hash (the general idea is sketched below)
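
Outside of PHP, the idea behind that integrity check is simple enough to show in shell. A hedged sketch (the bucket and key are hypothetical, and this assumes a public-read object; private objects would need signed requests):

## Compare a local md5 against the ETag S3 reports for the object
LOCAL=$(md5sum myfile | awk '{print $1}')
REMOTE=$(curl -sI http://s3.amazonaws.com/mybucket/myfile | awk -F'"' '/ETag/ {print $2}')
if [ "$LOCAL" = "$REMOTE" ]; then echo "intact"; else echo "mismatch"; fi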

This is a revised version of the file posted here: http://blog.apokalyptik.com/storage3.phps and includes:

• A local, modified version of the pear HTTP/Request package
• A local copy of all other required pear packages
• Documentation (under development… docs aren’t my strong suit)
• An example application (s3test.php)

Cheers!
–Apokalyptik

I’m always excited when we see something new for Amazon web services

http://www.openfount.com/blog/s3infidisk-for-ec2

This certainly looks very interesting! I can’t help but wonder whether the memory caching in the enterprise version is enough to run small MySQL instances on. At the very least, being able to mysqldump regularly to a file directly on S3 would be useful, as opposed to dumping to a file, splitting it into chunks, and copying the chunks off to S3.
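
In practice that would collapse to a one-liner. A sketch, assuming the infinidisk is mounted at /mnt/s3 and credentials are handled elsewhere (both assumptions on my part):

## Dump straight onto the S3-backed filesystem -- no split/copy dance
mysqldump --all-databases > /mnt/s3/backups/mysql-$(date +%Y%m%d).sql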

Perhaps I’ll contact them next week and see if they’ll let me take it for a test drive?!

Since now seems to be the time for making predictions…

Let me make a few of my own “The internet during 2007” predictions.

• 2007 will be the year of the format wars. Differing schemas of XML will battle it out for the spot as the predominant method of sharing data in 2007. Because one size will never fit all, we’ll probably end up with 2 or 3 schema layouts, varying in complexity and power. Almost nobody will end up using 2 of the 3 😀
• 2007 will usher in the client-side scripting wars. Will we still be using a J for AJAX after 2007? Probably, but I bet there will be some headway in finding a more modern, less quirky-by-vendor web scripting language. Something will do for client-side scripting what PHP and Ruby have done for server-side scripting.
• We’ll see real action in the database-as-a-service mindshare. I’d expect Amazon and Google to weigh in on this action. Microsoft will likely sit out, though that would be a very stupid idea. If Microsoft provided a RESTable and SOAPable database service at a decent cost, they’d soon find themselves up to their ears in just the kind of data that an internet presence should covet! Specifically, though, we’ll see work in 2 areas: easy databasing (think simple one-to-one and one-to-many relationships, like tags, terms, definitions, etc.) and relational databases (think many-to-many, foreign keys, transactions).
• 2007 will see more wasted bits and bytes than gas: with all the uncompressed data interchange formats, and spam, flying around, we’ll be wasting vast amounts of resources on the parts of consumable data that aren’t really consumable. We probably won’t see an answer to this in 2007, or if we do it won’t be realized for some time to come.
• Someone will attempt to retrofit e-mail. They won’t succeed, even though everybody is pulling for them to.
• New phones will be developed and released which have better web 2.0 support. Because after 2006 ends it’s not web2.0 anymore… it’s the web! (at least I’m hoping this is true)
• We’ll see a large number of “old dog” programmers moving from the “hot and hip” web space to the mobile space. There’s such a generation gap between modern browsers and mobile browsers that the progression will be pretty natural for those who don’t feel like learning “new tricks”.
• People will continue to play with new ideas for making the internet social. That won’t likely fall off, but what I suspect we’ll see are ways of making the internet more manifest: either in facilitating physical meetings or actions, or in making a web presence manifest as a physical presence.
• More real-world data will be mapped into databases available for processing next year than ever before, from surveys to spatial analysis to trendy places to hang out. We’ll be moving closer and closer to making virtual space analogous to physical space. You won’t have to walk to the corner store’s web site (where would the romance in THAT be?), but you can bet we’ll be coming closer and closer to your next-door neighbor’s kid’s lemonade stand having a web presence.
• Last but not least, privacy will slip farther and farther towards unattainable. With so many vectors, so many reasons, so many locations in which one telling piece of information is being stored online, being invisible will be nearly impossible, and staying that way doubly so. But as we slip into a mode where identity is completely fabricatable… what, then, does the theft of that identity mean?

So I’m trying to solve a rather difficult problem…

I have X number of data objects with somewhere between roughly 45,000 and 50,000 possible values associated with each object. I know… don’t ask (I couldn’t tell you anyway). Now, doing this in MySQL is… well… possible… but absurd. I’m thinking of trying out the approach I’ve mused about here. It could possibly be a really great way to manage finding commonalities across tens of thousands of objects with a total of hundreds of millions of values. Or it could be a massive time sink.

It would also put some of the things that Amazon has said about their S3 service to the test 🙂 I doubt anyone’s really stored a hundred million objects in an S3 bucket and been concerned enough with seek time to be critical about it 🙂
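
To make the S3 idea concrete (and this is just my loose sketch of it, not a tested design): encode each (slot, value, object) triple as an S3 key, so a lookup becomes a prefix listing rather than a 50,000-column query:

## Hypothetical key scheme: v<slot>/<value>/<objectId>, one empty object each.
## "Which objects share object 123's value for slot 42000?" is then a listing:
curl -s "http://s3.amazonaws.com/mybucket?prefix=v42000/somevalue/&max-keys=1000"
## (pagination via the IsTruncated flag and the marker parameter;
##  private buckets would need signed requests)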

Or am I missing some magic bullet here? Is there a (free) DBMS I’m not thinking of which handles 50,000 columns in a table with a fast comparative lookup? (select pk from table where v42000 = (select v42000 from table where pk = referencePk))… I’d love for someone to pop in and say “HEY, STUPID, DB XXX CAN DO THAT!” 🙂 assuming DB XXX isn’t expensive 🙂

Hmm… Off to ponder…