Just because you build it, doesnt mean they will come

Here’s a small bit of advice for all you would-be “cloud storage providers.” Just because you have a buttload of disks doesn’t mean people will be falling over themselves to use your software. If I have to spend *any* of my time worrying about your load, storage, or other internal algorithms (or unnecessary limitations for that matter) then YOU . HAVE . FAILED.

If I have to take the time to shard my data into 4096 different containers because you couldn’t be bothered to think “hey what if a service with a lot of users that create a lot of stuff decides to use us as a store?” Then you’re obviously not in it to win it (so to speak.)

Give us ABSTRACTED storage. Non abstracted storage we can do on our own thank you.


Posted on : Mar 19 2009
Posted under API, Business, Random Thoughts, Software Development, Web Stuff |

Just what you need to know to write a CouchDB reduce function

Lets say you have the CouchDB classes (located here) all compiled together and included into your test.php script. Lets also say that you have created a database with the built-in web ui called “testing”. Finally let us say that your test.php has the following code in it, which would add a record to the db every time it is run. (i know that the data in the document serves no useful purpose… but really I just want to figure out this map/reduce thing so that I can make awesome views… so this suffices sufficiently.)

require_once dirname( dirname( __FILE__ ) ) . '/includes/couchdb.php';
$couchdb = new CouchDB('testing', 'localhost', 5984);
$key = microtime();
$result = $couchdb->send(
    '/'.md5($key),
    'put',
    json_encode(
        array(
            "_id" => md5($key),
            "time" => $key,
            'md5' => md5($key),
            'sha1' => sha1($key),
            'crc' => crc32($key)
        )
    )
);
print_r($result->getBody(true));

After running the code a bunch of times you would end up with a bunch of documents which look more or less like this:

picture-1(click for full size)

Now lets say you want to write a view that told you what the first characters of the _id were and how many documents share that first letter. This is analogous to the following in MySQL

SELECT LEFT(md5, 1) AS `lchar`, count(md5) FROM `md5table` GROUP BY `lchar`

Your map function is easy, because you dont have any selection criteria, so we process all rows

function(doc){ emit(doc._id,doc); }

The reduce function is where the actual programming comes in… And it seems there aren’t many well explained examples of exactly how to do this (I just brute forced it by trial and error)

function(key, values, rereduce) { 
    var output = {};
    if ( rereduce ) { 
        // key is null, and values are values returned by previous calls
	//
	// see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
	//
	// essentially we are taking the previously reduced view, and the 
	// reduced view for new records, and we are reducing those two things
	// together.  Summarizing two summaries, essentially
        for ( var i in values ) {
	    // here we have multiple prebuilt output objects and we're simply combining them
  	    // just like below we have an array with a numeric id and an output object
	    // 
	    // retrieve a summary
            var vals = values[i];
            for ( var key in vals ) {
		// debugging
                // log(key);
		// 
		// store in or increment our new output object 
                if ( output[key] == undefined )
                    output[key] = vals[key];
                else
                    output[key] = output[key] + vals[key];
            }
        }
    } else {
        // key is an array, which we dont care about, and values are the 
	// values returned by the map
	//
	// see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
	//
	// we are taking each document and processing that, reducing it down
	// to a summary object (output) for each of the rows passed
        for ( var i in values ) {
	    // we have an array, values, with numeric ids and a document objects
	    //
	    // retrieve a document
            var doc = values[i];
	    // get what we want from it, the first char of the md5
            var key = doc._id.substr(0, 1);
	    // debugging
            // log( key + " :: " + doc._id );
	    //
	    // store or increment the output object
            if ( output[key] !== undefined )
                output[key] = output[key] + 1;
            else
                output[key] = 1;
        }
    }
    // done
    return output;
}

and in code, using a temporary view, ( if you used this view all the time you would want to make it permanent… but this is about how to lay out a reduce function, nothing more ) so request code that looks like this

$view = array(
    'map' => 'function(doc){ emit(doc._id,doc); }',
    'reduce' => '
        function(key, values, rereduce) { 
            var output = {};
            if ( rereduce ) { 
                // key is null, and values are values returned by previous calls
                for ( var i in values ) {
                    var vals = values[i];
                    for ( var key in vals ) {
                        // log(key);
                        if ( output[key] == undefined )
                            output[key] = vals[key];
                        else
                            output[key] = output[key] + vals[key];
                    }
                }
            } else {
                // key is an array, which we dont care about, and values are the values returneb by the map
                for ( var i in values ) {
                    var doc = values[i];
                    var key = doc._id.substr(0, 1);
                    // log( key + " :: " + doc._id );
                    if ( output[key] !== undefined )
                        output[key] = output[key] + 1;
                    else
                        output[key] = 1;
                }
            }
            return output;
        }
    '
    );
$result = $couchdb->send('/_temp_view', 'POST', json_encode($view) );
print_r($result->getBody(true));

would give you output that looks like this:

stdClass Object
(
    [rows] => Array
        (
            [0] => stdClass Object
                (
                    [key] => 
                    [value] => stdClass Object
                        (
                            [0] => 15
                            [1] => 17
                            [2] => 16
                            [3] => 13
                            [4] => 27
                            [5] => 18
                            [6] => 26
                            [7] => 15
                            [8] => 18
                            [9] => 21
                            [a] => 12
                            [b] => 23
                            [c] => 20
                            [d] => 27
                            [e] => 28
                            [f] => 26
                        )
 
                )
 
        )
 
)

I hope this helps somebody out.


Posted on : Feb 18 2009
Posted under API, Business, CLI, MySQL, PHP, Random Thoughts, Software Development, Web Stuff |

a dumbed down version of wpdb for sqlite

I’ve been working, gradually, on a project using an sqlite3 database (for its convenience) and found myself missing the clean elegance of wpdb… so I implemented it. It was actually really easy to do, and I figured I would throw it up here for anyone else wishing to use it. The functionality that I build this around is obtainable here: http://php-sqlite3.sourceforge.net/pmwiki/pmwiki.php (don’t freak… its in apt…)

With this I can focus on the sql, which is different enough, and not fumble over function names and such… $db = new sqlite_wpdb($dbfile, 3); var_dump($db->get_results(”SELECT * FROM `mytable` LIMIT 5″));

the code is below… and hopefully not too mangled…

< ?php
 
class sqlite_wpdb {
 
        var $version = null;
        var $db = null;
        var $result = null;
        var $error = null;
 
        function sqwpdb($file, $version=3) { 
                return $this->__construct($file, $version); 
        }
 
        function __construct($file, $version=3) {
                $function = "sqlite{$version}_open";
                if ( !function_exists($function) )
                        return false;
                if ( !file_exists($file) )
                        return false;
                if ( !$this->db = @$function($file) )
                        return false;
                $this->version = $version;
                $this->fquery = "sqlite{$this->version}_query";
                $this->ferror = "sqlite{$this->version}_error";
                $this->farray = "sqlite{$this->version}_fetch_array";
                return $this;
        }
 
        function escape($string) {
                return str_replace("'", "''", $string);
        }
 
        function query($query) {
                if ( $this->result = call_user_func($this->fquery, $this->db, $query) )
                        return $this->result;
                $this->error = call_user_func($this->ferror, $this->db);
                return false;
        }
 
        function array_to_object($array) {
                if ( ! is_array($array) )
                        return $array;
 
                $object = new stdClass();
                foreach ( $array as $idx => $val ) {
                        $object->$idx = $val;
                }
                return $object;
        }
 
        function get_results($query) {
                if ( !$this->query($query) )
                        return false;
                $rval = array();
                while ( $row = $this->array_to_object(call_user_func($this->farray, $this->result)) ) {
                        $rval[] = $row;
                }
                return $rval;
        }
 
        function get_row($query) {
                if ( ! $results = $this->get_results($query) )
                        return false;
                return array_shift($results);
        }
 
        function get_var($query) {
                return $this->get_val($query);
        }
 
        function get_val($query) {
                if ( !$row = $this->get_row($query) )
                        return false;
                $row = get_object_vars($row);
                if ( !count($row) )
                        return false;
                return array_shift($row);
        }
 
        function get_col($query) {
                if ( !$results = $this->get_results($query) )
                        return false;
                $column = array();
                foreach ( $results as $row ) {
                        $row = get_object_vars($row);
                        if ( !count($row) )
                                continue;
                        $column[] = array_shift($row);
                }
                return $column;
        }
 
}
 
?>

Posted on : Nov 18 2008
Posted under API, Business, CLI, Linux, MySQL, PHP, Personal, Random Thoughts, Software Development, Web Stuff |

Writing your own shell in php

I’ve always wanted to write my own simple shell in php.  Call me a glutin for punishment, but it seems like something that a lot of people could use to be able to do… If your web app had a command line interface for various things… like looking up stats, or users, or suspending naughty accounts, or whatever…. wouldnt that be cool and useful?  Talk about geek porn.  Anyways this this morning I got around to tinkering with the idea, and here is what i came up with… It’s rough, and empty, but its REALLY easy to extend and plug into any php application.

apokalyptik:~/phpshell$ ./shell.php

/home/apokalyptik/phpshell > hello

hi there

/home/apokalyptik/phpshell > hello world

hi there world

/home/apokalyptik/phpshell > cd ..

/home/apokalyptik/ > cd phpshell

/home/apokalyptik/phpshell > ls

shell.php

/home/apokalyptik//phpshell > exit

apokalyptik:~/phpshell$ ./shell.php

See the source here: shell.phps


Posted on : Aug 03 2008
Posted under API, Business, CLI, Linux, MySQL, PHP, Personal, Random Thoughts, Software Development, Web Stuff |

I keep marking this as unread in google reader…

I keep marking this as unread in google reader so that Its there when I need it… which probably means I should just blog it… automating firefox via telnet


Posted on : Apr 22 2008
Posted under API, CLI, Software Development, Web Stuff |

ruby-Mapquest Release v0.003

Primarily a bugfix release.  Catch it here:


Posted on : Feb 17 2007
Posted under API, Business, Geospacial, Ruby (on or off) Rails, Software Development, Web Stuff |

ruby-Mapquest Release v0.002

Welcome: ruby-Mapquest v0.002.  Wherein I’ve added support for routing (directions.)  Let me tell you that getting the info together for this was *NOT* a pretty picture…


Posted on : Feb 13 2007
Posted under API, Business, CLI, Geospacial, Ruby (on or off) Rails, Web Stuff |

ruby-Mapquest v0.001

After releasing my preliminary code for del.icio.us I’m releasing my preliminary code for mapquest’s OpenAPI service: ruby-Mapquest.


Posted on : Feb 11 2007
Posted under API, Business, Geospacial, Ruby (on or off) Rails, Software Development, Web Stuff |

ruby-Delicious v0.001

Since I’ve worked out the kinks mentioned in my last blog entry (was a problem with re-escaping already escaped data, by the way (never debug while sick and sleep deprived!)) I’ve scraped things together into a class which is a client for the api itself.  It’s relatively sparse right now, but good enough for using in an application.  Which is what the client is geared towards, by the way.  Specifically (and privately) tagging arbitrary data.  It *can* publicly tag URLs, but thats more or less a side effect if what delicious… is… and not a direct intention while writing the API.  You can visit the quickly thrown together ruby-Delicious page here (link also added up top)


Posted on : Feb 10 2007
Posted under API, Business, Personal, Ruby (on or off) Rails, Software Development, Web Stuff |