Just because you build it doesn't mean they will come

Here’s a small bit of advice for all you would-be “cloud storage providers.” Just because you have a buttload of disks doesn’t mean people will be falling over themselves to use your software. If I have to spend *any* of my time worrying about your load, storage, or other internal algorithms (or unnecessary limitations, for that matter), then YOU . HAVE . FAILED.

If I have to take the time to shard my data into 4096 different containers because you couldn’t be bothered to think “hey, what if a service with a lot of users who create a lot of stuff decides to use us as a store?” then you’re obviously not in it to win it (so to speak).

Give us ABSTRACTED storage. Non-abstracted storage we can do on our own, thank you.


Posted on : Mar 19 2009
Posted under API, Business, Random Thoughts, Software Development, Web Stuff

Just what you need to know to write a CouchDB reduce function

Let's say you have the CouchDB classes (located here) all compiled together and included into your test.php script. Let's also say that you have created a database called “testing” with the built-in web UI. Finally, let's say that your test.php has the following code in it, which adds a record to the db every time it is run. (I know that the data in the document serves no useful purpose… but really I just want to figure out this map/reduce thing so that I can make awesome views… so this suffices sufficiently.)

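require_once dirname( dirname( __FILE__ ) ) . '/includes/couchdb.php';
// the "testing" database on the local CouchDB server (default port 5984)
$couchdb = new CouchDB('testing', 'localhost', 5984);
// microtime() returns a "msec sec" string, unique enough for a test key
$key = microtime();
// PUT a new document whose _id is the md5 of the key
$result = $couchdb->send(
    '/'.md5($key),
    'PUT',
    json_encode(
        array(
            '_id'  => md5($key),
            'time' => $key,
            'md5'  => md5($key),
            'sha1' => sha1($key),
            'crc'  => crc32($key)
        )
    )
);
print_r($result->getBody(true));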

After running the code a bunch of times you would end up with a bunch of documents which look more or less like this:

[screenshot: picture-1, a sample of the resulting documents]

Now let's say you want to write a view that tells you the first character of each _id and how many documents share that first character. This is analogous to the following query in MySQL:

SELECT LEFT(md5, 1) AS `lchar`, count(md5) FROM `md5table` GROUP BY `lchar`

Your map function is easy: you don't have any selection criteria, so it simply emits every document.

function(doc){ emit(doc._id,doc); }

The reduce function is where the actual programming comes in… and there don't seem to be many well-explained examples of exactly how to write one (I just brute-forced it by trial and error).

function(key, values, rereduce) {
    var output = {};
    if ( rereduce ) {
        // key is null, and values are the values returned by previous calls
        //
        // see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
        //
        // essentially we are taking the previously reduced view and the
        // reduced view for new records, and reducing those two things
        // together.  Summarizing two summaries, essentially
        for ( var i = 0; i < values.length; i++ ) {
            // each value is an output object built by an earlier call;
            // we simply combine them
            var vals = values[i];
            for ( var ch in vals ) {
                // debugging
                // log(ch);
                //
                // store in or increment our new output object
                if ( output[ch] === undefined )
                    output[ch] = vals[ch];
                else
                    output[ch] = output[ch] + vals[ch];
            }
        }
    } else {
        // key is an array of [key, docid] pairs, which we don't care about,
        // and values are the values emitted by the map
        //
        // see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
        //
        // we are taking each document and reducing it down
        // to a summary object (output) for all of the rows passed
        for ( var i = 0; i < values.length; i++ ) {
            // values is an array of documents; retrieve one
            var doc = values[i];
            // get what we want from it, the first char of the md5
            var ch = doc._id.substr(0, 1);
            // debugging
            // log( ch + " :: " + doc._id );
            //
            // store or increment the output object
            if ( output[ch] !== undefined )
                output[ch] = output[ch] + 1;
            else
                output[ch] = 1;
        }
    }
    // done
    return output;
}
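To see why the function needs both branches, here is a small harness that runs outside CouchDB and roughly imitates the shape of the calls the view server makes: it reduces the mapped rows in chunks, then calls the function again with `rereduce = true` on the partial summaries. The `runView` helper, the chunk size, and the sample ids are all made up for illustration; the real batching is decided by CouchDB itself, so treat this as a sketch for experimenting, not a model of CouchDB internals. The reduce body is the same logic as above (inner `key` renamed to `ch`).

```javascript
// Same logic as the reduce function above.
function reduce(key, values, rereduce) {
  var output = {};
  if (rereduce) {
    // values are summary objects from earlier reduce calls: add them up
    for (var i = 0; i < values.length; i++) {
      var vals = values[i];
      for (var ch in vals) {
        output[ch] = (output[ch] === undefined) ? vals[ch] : output[ch] + vals[ch];
      }
    }
  } else {
    // values are the documents emitted by the map: count first characters
    for (var j = 0; j < values.length; j++) {
      var first = values[j]._id.substr(0, 1);
      output[first] = (output[first] === undefined) ? 1 : output[first] + 1;
    }
  }
  return output;
}

// Hypothetical harness: reduce fixed-size chunks of map output, then
// rereduce the partial summaries into one result.
function runView(docs, chunkSize) {
  // map: function(doc){ emit(doc._id, doc); }
  var rows = docs.map(function (doc) { return { key: doc._id, value: doc }; });
  var partials = [];
  for (var i = 0; i < rows.length; i += chunkSize) {
    var chunk = rows.slice(i, i + chunkSize);
    var keys = chunk.map(function (r) { return [r.key, r.value._id]; });
    var values = chunk.map(function (r) { return r.value; });
    partials.push(reduce(keys, values, false));
  }
  return reduce(null, partials, true);
}

var docs = ["a1", "a2", "b7", "0f", "a9", "0c", "e3"].map(function (id) {
  return { _id: id };
});
console.log(JSON.stringify(runView(docs, 3))); // {"0":2,"a":3,"b":1,"e":1}
```

Changing the chunk size should never change the final counts; if it does, the rereduce branch is wrong. That property is exactly what CouchDB relies on when it combines stored partial results.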

To run it from PHP, use a temporary view (if you used this view all the time you would want to make it permanent… but this is about how to lay out a reduce function, nothing more), with request code that looks like this:

$view = array(
    'map' => 'function(doc){ emit(doc._id,doc); }',
    'reduce' => '
        function(key, values, rereduce) { 
            var output = {};
            if ( rereduce ) { 
                // key is null, and values are values returned by previous calls
                for ( var i in values ) {
                    var vals = values[i];
                    for ( var key in vals ) {
                        // log(key);
                        if ( output[key] == undefined )
                            output[key] = vals[key];
                        else
                            output[key] = output[key] + vals[key];
                    }
                }
            } else {
                // key is an array, which we don't care about, and values are the values returned by the map
                for ( var i in values ) {
                    var doc = values[i];
                    var key = doc._id.substr(0, 1);
                    // log( key + " :: " + doc._id );
                    if ( output[key] !== undefined )
                        output[key] = output[key] + 1;
                    else
                        output[key] = 1;
                }
            }
            return output;
        }
    '
    );
$result = $couchdb->send('/_temp_view', 'POST', json_encode($view) );
print_r($result->getBody(true));

would give you output that looks like this:

stdClass Object
(
    [rows] => Array
        (
            [0] => stdClass Object
                (
                    [key] => 
                    [value] => stdClass Object
                        (
                            [0] => 15
                            [1] => 17
                            [2] => 16
                            [3] => 13
                            [4] => 27
                            [5] => 18
                            [6] => 26
                            [7] => 15
                            [8] => 18
                            [9] => 21
                            [a] => 12
                            [b] => 23
                            [c] => 20
                            [d] => 27
                            [e] => 28
                            [f] => 26
                        )
 
                )
 
        )
 
)

I hope this helps somebody out.


Posted on : Feb 18 2009
Posted under API, Business, CLI, MySQL, PHP, Random Thoughts, Software Development, Web Stuff

random php… a multi-channel chat room class using memcached for persistence

why? i dunno… just because… just a toy…

no sql, no flat file, no write permissions required anywhere, no fuss

class mc_chat {

        public $chan = null;   // channel name, used as a key prefix
        public $mc = null;     // a connected Memcache object
        public $ret = 5;       // how many recent messages stay visible

        function __construct($memcached, $channel, $retention=5) {
                $this->mc = $memcached;
                $this->chan = $channel;
                $this->ret = $retention;
        }

        // return the visible messages with id >= $from, keyed by id
        function messages( $from=0 ) {
                $max = (int)$this->mc->get("$this->chan:max:posted");
                $min = (int)$this->mc->get("$this->chan:min:posted");
                $messages = array();
                for ( $i=$min; $i<=$max; $i++ ) {
                        if ( $i < $from )
                                continue;
                        $m = $this->get($i);
                        // skip ids that were never written or have been evicted
                        if ( $m['user'] && $m['message'] )
                                $messages[$i] = $m;
                }
                return $messages;
        }

        function get($id) {
                return array(
                        'user' => (string)$this->mc->get("$this->chan:msg:$id:user"),
                        'message' => (string)$this->mc->get("$this->chan:msg:$id"),
                );
        }

        function add($user, $message) {
                // increment() returns false if the counter doesn't exist yet
                $id = (int)$this->mc->increment("$this->chan:max:posted");
                if ( !$id ) {
                        $id=1;
                        $this->mc->set("$this->chan:max:posted", 1);
                }
                $this->mc->set("$this->chan:msg:$id:user", (string)$user);
                $this->mc->set("$this->chan:msg:$id", (string)$message);
                // once $ret messages exist, slide the lower bound of the window up;
                // old messages are never deleted, just skipped (memcached evicts them)
                if ( $id >= $this->ret ) {
                        if ( !$this->mc->increment("$this->chan:min:posted") )
                                $this->mc->set("$this->chan:min:posted", 1);
                }

        }

}
 
$mc = new Memcache;
$mc->connect('localhost', 11211);
$keep_messages = 10;
$chatter_id = 1;
$chat = new mc_chat($mc, 'chat-room-id', $keep_messages);
$chat->add($chatter_id, date("r").": $chatter_id : foo");
$chat->messages(37); // only messages with id >= 37
$chat->messages(); // all the latest messages
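The class never deletes anything; it just slides a window of ids. Here's a rough JavaScript sketch of the same counter arithmetic, with a plain `Map` standing in for memcached. `FakeCache` and its `get`/`set`/`increment` methods are hypothetical stand-ins modeled on how the PHP code uses `Memcache` (a missing key reads as false, and incrementing a missing key fails). It only shows which ids `messages()` will return; it makes no attempt to model a real memcached's concurrency or eviction.

```javascript
// Hypothetical stand-in for memcached.
class FakeCache {
  constructor() { this.data = new Map(); }
  get(k) { return this.data.has(k) ? this.data.get(k) : false; }
  set(k, v) { this.data.set(k, v); return true; }
  increment(k) {
    if (!this.data.has(k)) return false; // like Memcache::increment on a missing key
    const v = Number(this.data.get(k)) + 1;
    this.data.set(k, v);
    return v;
  }
}

// Same window arithmetic as mc_chat: max:posted is the newest id,
// min:posted is the oldest id still considered visible.
class McChat {
  constructor(cache, channel, retention = 5) {
    this.mc = cache; this.chan = channel; this.ret = retention;
  }
  add(user, message) {
    let id = Number(this.mc.increment(`${this.chan}:max:posted`));
    if (!id) { id = 1; this.mc.set(`${this.chan}:max:posted`, 1); }
    this.mc.set(`${this.chan}:msg:${id}:user`, String(user));
    this.mc.set(`${this.chan}:msg:${id}`, String(message));
    if (id >= this.ret && !this.mc.increment(`${this.chan}:min:posted`)) {
      this.mc.set(`${this.chan}:min:posted`, 1);
    }
  }
  messages(from = 0) {
    const max = Number(this.mc.get(`${this.chan}:max:posted`));
    const min = Number(this.mc.get(`${this.chan}:min:posted`));
    const out = new Map();
    for (let i = Math.max(min, from); i <= max; i++) {
      const user = this.mc.get(`${this.chan}:msg:${i}:user`);
      const message = this.mc.get(`${this.chan}:msg:${i}`);
      if (user && message) out.set(i, { user, message });
    }
    return out;
  }
}

const chat = new McChat(new FakeCache(), "chat-room-id", 10);
for (let n = 1; n <= 12; n++) chat.add(1, `message ${n}`);
console.log(JSON.stringify([...chat.messages().keys()]));   // [3,4,5,6,7,8,9,10,11,12]
console.log(JSON.stringify([...chat.messages(11).keys()])); // [11,12]
```

With a retention of 10, the minimum is first set when id 10 is posted and then advances by one per message, so after 12 posts the visible window is ids 3 through 12.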

Posted on : Feb 16 2009
Posted under PHP, Random Thoughts

Debian Lenny, Avahi, AFP… Linux Fileserver for OSX Clients

If you’re like me, you have an OS X computer or 3 at home and a Debian file server. If you’re like me, you hate Samba/NFS on principle and want your Debian server to show up in Finder. If you’re like me, you aren’t using Debian 3, which is what most of the walkthroughs seem to expect… This is how I did it… with Debian Lenny.

What we’re using, and why:

  • Avahi handles Zeroconf, which makes the server show up in Finder (most how-tos use Howl, which is no longer in apt)
  • netatalk provides afpd
  • afpd is the AFP file server

From: http://blog.damontimm.com/how-to-install-netatalk-afp-on-ubuntu-with-encrypted-authentication/

  • apt-get update
  • mkdir -p ~/src/netatalk
  • cd ~/src/netatalk
  • apt-get install cracklib2-dev libssl-dev
  • apt-get source netatalk
  • apt-get build-dep netatalk
  • cd netatalk-2.0.3

From: http://www.sharedknowhow.com/2008/05/installing-netatalk-under-centos-5-with-leopard-support/

  • vim bin/cnid/cnid_index.c ## replace “ret = db->stat(db, &sp, 0);” with “ret = db->stat(db, NULL, &sp, 0);” line 277
  • vim etc/cnid_dbd/dbif.c ## replace “ret = db->stat(db, &sp, 0);” with “ret = db->stat(db, NULL, &sp, 0);” line 517

Mine

  • ./configure --prefix=/usr/local/netatalk
  • make
  • make install
  • vim /etc/rc.local ## add “/usr/local/netatalk/sbin/afpd” above the final “exit 0” line
  • /usr/local/netatalk/sbin/afpd

From: http://www.disgruntled-dutch.com/2007/general/how-to-get-your-linux-based-afp-server-to-show-up-correctly-in-leopards-new-finder

  • apt-get install avahi-daemon
  • vim /etc/nsswitch.conf ## make the hosts line read “hosts: files dns mdns4”
  • cd /etc/avahi/services
  • wget http://www.disgruntled-dutch.com/media/afpd.service
  • /etc/init.d/avahi-daemon restart

In case that file drops off the face of the net, these are its contents:

<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
<name replace-wildcards="yes">%h</name>
<service>
<type>_afpovertcp._tcp</type>
<port>548</port>
</service>
</service-group>

At this point your server should show up under the network in Finder… and you should be able to connect with any system username/password combination.


Posted on : Feb 12 2009
Posted under Apple, CLI, Linux, Personal, Random Thoughts, Software Development

randomly decided to install OpenSolaris inside VirtualBox

just to try it out. it’s not done installing… but… an 8-char max for the user login is pretty lame… what is this… 1990?


Posted on : Feb 05 2009
Posted under Random Thoughts

Google stops development on 6 services


I can already see the Stallmanites sounding their battle cries. In my opinion, never using anything you didn’t write yourself is an asinine concept… and this is coming from someone who can write web services himself. The truth is that using services “in the cloud,” “on the web,” or anywhere else is just like using local software in one very important sense. (I particularly like one comment I heard on this, which went something like: “Would you be able to validate the source code to the ls binary on your own?”)

If your data is not in two completely separate locations, then it’s not safe.

My wife, who needs some extra storage space (she’s starting to get into photography), got two external 120 GB hard drives (which were on clearance). Before I let her use them, I sat her down and gave her some important advice: “If you store your data on one of these drives… it is NOT backed up… it’s just stored on that disk. And if something happens to that disk there is NOTHING that I can do to save your photos. Period. I bought you two because every so often you need to copy what’s important to the second disk. That way, if one disk dies, you don’t lose your stuff.”

I’m not sure why people feel that this doesn’t apply to them if they’re using apps in the cloud. A service shutting down is basically equivalent to losing a hard disk. Be prepared. Back your data up if it’s that important!

As an aside: all WordPress installations allow you to export your data, even WordPress.com. I suggest people take advantage of that on occasion!


Posted on : Jan 18 2009
Posted under Random Thoughts

A Pure Memcached Queue

This is pretty clever… Might have to code this up in PHP… memcachequeue-a-pure-memcached-queue
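Before coding it up in PHP, here's the general idea as I understand it, sketched in JavaScript: two counters in the cache, a tail that each producer bumps with an atomic increment to claim a slot for a new item, and a head that consumers bump to claim the oldest unread slot. This is a sketch of that common two-counter approach, not necessarily the design in the linked post; the `CounterQueue` class and its key names are hypothetical, and a `Map` stands in for memcached. Against a real memcached, the emptiness check in `pop` is not atomic with the increment, so several consumers could claim a slot that hasn't been written yet, and evicted items are simply lost, so it's no substitute for a durable queue.

```javascript
// Hypothetical sketch: a FIFO queue built only from get/set/increment.
// A Map stands in for memcached; key names are invented for illustration.
class CounterQueue {
  constructor(name) {
    this.name = name;
    this.store = new Map([[`${name}:head`, 0], [`${name}:tail`, 0]]);
  }
  // Stand-in for memcached's atomic increment.
  incr(key) {
    const v = this.store.get(key) + 1;
    this.store.set(key, v);
    return v;
  }
  push(item) {
    const slot = this.incr(`${this.name}:tail`); // claim the next free slot
    this.store.set(`${this.name}:item:${slot}`, item);
  }
  pop() {
    // Empty when the head has caught up with the tail.
    if (this.store.get(`${this.name}:head`) >= this.store.get(`${this.name}:tail`)) {
      return undefined;
    }
    const slot = this.incr(`${this.name}:head`); // claim the oldest unread slot
    const key = `${this.name}:item:${slot}`;
    const item = this.store.get(key);
    this.store.delete(key);
    return item;
  }
}

const q = new CounterQueue("jobs");
q.push("a"); q.push("b"); q.push("c");
console.log(q.pop(), q.pop());         // a b
q.push("d");
console.log(q.pop(), q.pop(), q.pop()); // c d undefined
```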


Posted on : Jan 16 2009
Posted under Random Thoughts

php image functions failing on uploaded images

If you’re dealing with user-uploaded images in any non-passthrough way (such as resizing, converting, etc.) you may be familiar with this particular error message for corrupt images: “Corrupt JPEG data: xxx extraneous bytes before marker 0xd9”. Regardless of who, how, or why this happens, the error is usually non-fatal as far as the visual image data itself is concerned. And you just want PHP to make a GD resource from the image already… right? Well, I can’t make it do that, but if you have ImageMagick installed you can simply execute the following:

/path/to/mogrify -strip /path/to/image

This strips the metadata out of the image, rewriting the file in place, and ImageMagick reports the corrupt data only as a WARNING (which is all this message is intended to be… thanks, PHP/GD).

I have a feeling this will help a number of frustrated developers AND users who are left wondering why their image won’t work…


Posted on : Jan 15 2009
Posted under Random Thoughts

Palm’s new phone

It looks nice… but will Palm be able to pull out of their nose-dive? To my mind, if Palm were a scene from a movie, it’d be the end of the subway scene from the first Matrix movie. Palm is Neo on the edge of the platform, trying not to get sucked into the train (except that Palm didn’t just finish kicking anyone’s ass). The question, I guess, is whether they can keep from face-planting into the side of the train as it rushes past.

That was probably a really stupid analogy… I really want to have faith in Palm… I had more than one Treo… I just can’t bring myself to be anything but cautious…


Posted on : Jan 08 2009
Posted under Random Thoughts

A tale of two gamers

Gamer #1: the youngster. The youngster grew up playing games… The NES isn’t so much nostalgic as archaic. They don’t have much money; they’re working their way through high school or college, or are fresh out. But what they lack in cash flow they make up for in time. These are the purists. If this person spends $60 on a game, they want it to be CHALLENGING. They want as much time and value out of a game as they can possibly get. These are the WoW players who think spending 30 hours a week is a good investment of their time.

Gamer #2: the spouse. The spouse used to be a youngster, but they’ve since acquired this marvelous and strange new thing: a life. Now they have a career (not a job), a wife, and possibly a kid or two. When this person forks out $60 for a game, he wants a good respite from real life without consuming his real life. It’s challenging for this person to put 5 hours a week into a game, and 30 is completely unrealistic (even if they’d love to be able to). This gamer wants to be completely engrossed and entertained for a while, but also needs to be able to put the game down and pick it up again in 2 weeks without losing much.

I get reminded of this from time to time in forum discussions and the like, especially when these two factions argue over things like gold farmers, cheats, and glitches. I happen to fall into the Gamer #2 category, and I can tell you that when I get 3 hours to sit down and play a game I do NOT want to spend that hard-earned time grinding. __I__JUST__DON’T__. The real PITA about that is that I would LOVE to play games like WoW.

I hope that game developers start taking my demographic seriously (unlike kids, who in their defense can’t, I’m willing to pay for my games…). Give me a WoW server with the exp and gold tweaked so that I can get past the crappy ‘kill 50 fluffy bunnies’ grinding quests and get to some fun gaming before I turn 60.

The youngsters will, of course, argue things like “taking away from the game,” “why even bother playing,” and “it’s not that hard and doesn’t take that long, you just suck, n00b.”

To which I say: bite me. Until game makers start understanding that some gamers want a time sink and some gamers can’t afford one, I’ll be the low-level guy who just paid for 60 hours of some Chinese gold farmer’s time… because one of these days I’d like to get to do something interesting. I’ll be the guy who uses the game glitch to avoid spending 15 hours forging swords or chopping wood.

I would like to not have to resort to these measures… and if the game manufacturers would just throw us a bone, I bet we wouldn’t…


Posted on : Dec 31 2008
Posted under Random Thoughts