Quality Time With Your JPEGs

When working with user provided images in PHP you run into a problem. Lets say that you want to generate thumbnails of uploaded JPEGs for users. This is a fairly common use case where you would employ PHP and GD (the most prevalent php image extension.) But when you generate the new, smaller, image what quality setting do you use? If your quality setting is too low then the image is distorted unacceptably. Likewise if your quality setting is too high then you produce a dimensionally smaller image with a larger size in bytes than the original. So what do you do when you want to satisfy all cases? Well the obvious answer is that you should use the same JPEG quality setting that the image had when it was uploading. Now… Using PHP and GD tell me how you accomplish this.

Go ahead

I’ll wait

You can’t, can you? If you’re really sneaky you might be thinking about just pulling the data out of the binary stream, and if you’re a linux nut you’re probably sitting there muttering “just use identify (a la imagemagick)”. Of course if you try that under heavy traffic you’ll soon discover that it kills your servers. Just for reference I’ll share that code with you (not everyone is serving 2g/sec in dynamic image traffic after all). We already have the binary data in ram in the $rval variable, in case you were curious.

// define how we will deal with stdin, stdout, and stderr 
$descriptorspec = array( 0 => array( "pipe", "r" ), 1 => array( "pipe", "w" ), 2 => array( "pipe", "a" ) );
// run identify, -verbose required, - means read file from stdin
$process = proc_open( IDENTIFY . ' -verbose -', $descriptorspec, $pipes );
if ( false !== $process ) {
	// pipe the image data through it to its stdin
	fwrite( $pipes[0], $rval );
	// close stdin to allow identify to process
	fclose( $pipes[0] );
	// read the results of the program execution
	$results = stream_get_contents($pipes[1]);
	// clean up open file handles
	fclose( $pipes[1] );
	fclose( $pipes[2] );
	// Unix return value of 0 means success
	if ( 0 == proc_close($process) ) {
		// pull out the image quality from identify output
		if ( preg_match( '#Quality: ([0-9]+)[^0-9]#', $results, $m ) )
			$origin_quality = intval($m[1]);
		// detect when something goes wrong above
		if ( $origin_quality == 0 )
			$origin_quality = 100;
	}
}

So, since this consumes too much of our available resources (especially ram and cpu usage since identify fully decompresses and reads the image into a raw state for processing… hundreds of MB of ram, which you can limit but then it becomes unbearably slow…) that’s out. What now? If you’re really extra sneaky you’re thinking that you should be able to read the setting out of the binary data… there should be a header after all, right? Well… Yes there is a header but “quality” is not a “setting” its more a measure of how compressed the image is… which isn’t exactly recorded either… at least… not as an integer value. The compression matrix used to preform JPEGs lossy compression *IS* stored in a header.. and it turns out this is what the ImageMagick code uses to give us that quality setting. So I set out to reproduce this in PHP.

I know… I’m a masochist.

Thanks to some serious google-fu (and possibly a note in a php online doc manual relating to IPC, I don’t remember which of the two led me to the package first) I found that there’s already some code out there which deals with the nasty bits of reading raw JPEG headers (though not doing what I want with the header that I want) in the PHP JPEG Metadata Toolkit And the instructions for evaluating the header we can then pull is in the ImageMagick source code (coders/jpeg.c)

When we put the two together and modify it a bit to suite our needs (i.e. reading from the in-memory buffer, pulling just the right header from the jpeg file, etc) we get this code… and finally the ability to call $quality = get_jpeg_quality( $rval ); In my test (yea just one or two… very scientific like), this over 100 times faster than using the proc_open and executable method, uses less ram (a lot lot lot lot lot less ram) and generally just doesn’t suck as much.

< ?php

function get_jpeg_header_data( &$buff, $want=null ) { 
	$data = buffer_read( $buff, 2, true ); // Read the first two characters
	// Check that the first two characters are 0xFF 0xDA  (SOI - Start of image)
	if ( $data != "\xFF\xD8" ) {
		// No SOI (FF D8) at start of file - This probably isn't a JPEG file - close file and return;
		return false;
	}
	$data = buffer_read( $buff, 2 ); // Read the third character
	// Check that the third character is 0xFF (Start of first segment header)
	if ( $data{0} != "\xFF" ) {
		// NO FF found - close file and return - JPEG is probably corrupted
		return false;
	}
	// Cycle through the file until, one of: 
	//   1) an EOI (End of image) marker is hit,
	//   2) we have hit the compressed image data (no more headers are allowed after data)
	//   3) or end of file is hit
	$hit_compressed_image_data = FALSE;
	while ( ( $data{1} != "\xD9" ) && ( !$hit_compressed_image_data) && ( $data != '' ) ) { 
		// Found a segment to look at.
		// Check that the segment marker is not a Restart marker - restart markers don't have size or data after them
		if (  ( ord($data{1}) < 0xD0 ) || ( ord($data{1}) > 0xD7 ) ) {
			// Segment isn't a Restart marker
			$sizestr = buffer_read( $buff, 2 ); // Read the next two bytes (size)
			$decodedsize = unpack ("nsize", $sizestr); // convert the size bytes to an integer
			// Read the segment data with length indicated by the previously read size
			$segdata = buffer_read( $buff, $decodedsize['size'] - 2 );
			// Store the segment information in the output array
			if ( !$want || $want == ord($data{1}) ) {
				$headerdata[] = (object)array(  
					"SegType" => ord($data{1}),
					"SegName" => $GLOBALS[ "JPEG_Segment_Names" ][ ord($data{1}) ],
					"SegDesc" => $GLOBALS[ "JPEG_Segment_Descriptions" ][ ord($data{1}) ],
					"SegData" => $segdata 
				);
			}
		}
		// If this is a SOS (Start Of Scan) segment, then there is no more header data - the compressed image data follows
		if ( $data{1} == "\xDA" ) {
			$hit_compressed_image_data = true;
		} else {
			// Not an SOS - Read the next two bytes - should be the segment marker for the next segment
			$data = buffer_read( $buff, 2 );
			// Check that the first byte of the two is 0xFF as it should be for a marker
			if ( $data{0} != "\xFF" ) {
				// NO FF found - close file and return - JPEG is probably corrupted
				return false;
			}
		}
	}
	return $headerdata;
}

function buffer_read( &$buff, $len, $new=false ) {
	static $pointer = 0;
	if ( $new )
		$pointer = 0;
	if ( $pointer + $len > strlen( $buff ) ) {
		$len = strlen( $buff ) - $pointer;
		if ( $len < 1 )
			return null;
	}
	$data = substr( $buff, $pointer, $len );
	$pointer += $len;
	return $data;
}

// The names of the JPEG segment markers, indexed by their marker number
$GLOBALS[ "JPEG_Segment_Names" ] = array(
	0xC0 =>  "SOF0",  0xC1 =>  "SOF1",  0xC2 =>  "SOF2",  0xC3 =>  "SOF4",
	0xC5 =>  "SOF5",  0xC6 =>  "SOF6",  0xC7 =>  "SOF7",  0xC8 =>  "JPG",
	0xC9 =>  "SOF9",  0xCA =>  "SOF10", 0xCB =>  "SOF11", 0xCD =>  "SOF13",
	0xCE =>  "SOF14", 0xCF =>  "SOF15",
	0xC4 =>  "DHT",   0xCC =>  "DAC",
	0xD0 =>  "RST0",  0xD1 =>  "RST1",  0xD2 =>  "RST2",  0xD3 =>  "RST3",
	0xD4 =>  "RST4",  0xD5 =>  "RST5",  0xD6 =>  "RST6",  0xD7 =>  "RST7",
	0xD8 =>  "SOI",   0xD9 =>  "EOI",   0xDA =>  "SOS",   0xDB =>  "DQT",
	0xDC =>  "DNL",   0xDD =>  "DRI",   0xDE =>  "DHP",   0xDF =>  "EXP",
	0xE0 =>  "APP0",  0xE1 =>  "APP1",  0xE2 =>  "APP2",  0xE3 =>  "APP3",
	0xE4 =>  "APP4",  0xE5 =>  "APP5",  0xE6 =>  "APP6",  0xE7 =>  "APP7",
	0xE8 =>  "APP8",  0xE9 =>  "APP9",  0xEA =>  "APP10", 0xEB =>  "APP11",
	0xEC =>  "APP12", 0xED =>  "APP13", 0xEE =>  "APP14", 0xEF =>  "APP15",
	0xF0 =>  "JPG0",  0xF1 =>  "JPG1",  0xF2 =>  "JPG2",  0xF3 =>  "JPG3",
	0xF4 =>  "JPG4",  0xF5 =>  "JPG5",  0xF6 =>  "JPG6",  0xF7 =>  "JPG7",
	0xF8 =>  "JPG8",  0xF9 =>  "JPG9",  0xFA =>  "JPG10", 0xFB =>  "JPG11",
	0xFC =>  "JPG12", 0xFD =>  "JPG13",
	0xFE =>  "COM",   0x01 =>  "TEM",   0x02 =>  "RES",
);


// The descriptions of the JPEG segment markers, indexed by their marker number
$GLOBALS[ "JPEG_Segment_Descriptions" ] = array(
	/* JIF Marker byte pairs in JPEG Interchange Format sequence */
	0xC0 => "Start Of Frame (SOF) Huffman  - Baseline DCT",
	0xC1 =>  "Start Of Frame (SOF) Huffman  - Extended sequential DCT",
	0xC2 =>  "Start Of Frame Huffman  - Progressive DCT (SOF2)",
	0xC3 =>  "Start Of Frame Huffman  - Spatial (sequential) lossless (SOF3)",
	0xC5 =>  "Start Of Frame Huffman  - Differential sequential DCT (SOF5)",
	0xC6 =>  "Start Of Frame Huffman  - Differential progressive DCT (SOF6)",
	0xC7 =>  "Start Of Frame Huffman  - Differential spatial (SOF7)",
	0xC8 =>  "Start Of Frame Arithmetic - Reserved for JPEG extensions (JPG)",
	0xC9 =>  "Start Of Frame Arithmetic - Extended sequential DCT (SOF9)",
	0xCA =>  "Start Of Frame Arithmetic - Progressive DCT (SOF10)",
	0xCB =>  "Start Of Frame Arithmetic - Spatial (sequential) lossless (SOF11)",
	0xCD =>  "Start Of Frame Arithmetic - Differential sequential DCT (SOF13)",
	0xCE =>  "Start Of Frame Arithmetic - Differential progressive DCT (SOF14)",
	0xCF =>  "Start Of Frame Arithmetic - Differential spatial (SOF15)",
	0xC4 =>  "Define Huffman Table(s) (DHT)",
	0xCC =>  "Define Arithmetic coding conditioning(s) (DAC)",
	0xD0 =>  "Restart with modulo 8 count 0 (RST0)",
	0xD1 =>  "Restart with modulo 8 count 1 (RST1)",
	0xD2 =>  "Restart with modulo 8 count 2 (RST2)",
	0xD3 =>  "Restart with modulo 8 count 3 (RST3)",
	0xD4 =>  "Restart with modulo 8 count 4 (RST4)",
	0xD5 =>  "Restart with modulo 8 count 5 (RST5)",
	0xD6 =>  "Restart with modulo 8 count 6 (RST6)",
	0xD7 =>  "Restart with modulo 8 count 7 (RST7)",
	0xD8 =>  "Start of Image (SOI)",
	0xD9 =>  "End of Image (EOI)",
	0xDA =>  "Start of Scan (SOS)",
	0xDB =>  "Define quantization Table(s) (DQT)",
	0xDC =>  "Define Number of Lines (DNL)",
	0xDD =>  "Define Restart Interval (DRI)",
	0xDE =>  "Define Hierarchical progression (DHP)",
	0xDF =>  "Expand Reference Component(s) (EXP)",
	0xE0 =>  "Application Field 0 (APP0) - usually JFIF or JFXX",
	0xE1 =>  "Application Field 1 (APP1) - usually EXIF or XMP/RDF",
	0xE2 =>  "Application Field 2 (APP2) - usually Flashpix",
	0xE3 =>  "Application Field 3 (APP3)",
	0xE4 =>  "Application Field 4 (APP4)",
	0xE5 =>  "Application Field 5 (APP5)",
	0xE6 =>  "Application Field 6 (APP6)",
	0xE7 =>  "Application Field 7 (APP7)",
	0xE8 =>  "Application Field 8 (APP8)",
	0xE9 =>  "Application Field 9 (APP9)",
	0xEA =>  "Application Field 10 (APP10)",
	0xEB =>  "Application Field 11 (APP11)",
	0xEC =>  "Application Field 12 (APP12) - usually [picture info]",
	0xED =>  "Application Field 13 (APP13) - usually photoshop IRB / IPTC",
	0xEE =>  "Application Field 14 (APP14)",
	0xEF =>  "Application Field 15 (APP15)",
	0xF0 =>  "Reserved for JPEG extensions (JPG0)",
	0xF1 =>  "Reserved for JPEG extensions (JPG1)",
	0xF2 =>  "Reserved for JPEG extensions (JPG2)",
	0xF3 =>  "Reserved for JPEG extensions (JPG3)",
	0xF4 =>  "Reserved for JPEG extensions (JPG4)",
	0xF5 =>  "Reserved for JPEG extensions (JPG5)",
	0xF6 =>  "Reserved for JPEG extensions (JPG6)",
	0xF7 =>  "Reserved for JPEG extensions (JPG7)",
	0xF8 =>  "Reserved for JPEG extensions (JPG8)",
	0xF9 =>  "Reserved for JPEG extensions (JPG9)",
	0xFA =>  "Reserved for JPEG extensions (JPG10)",
	0xFB =>  "Reserved for JPEG extensions (JPG11)",
	0xFC =>  "Reserved for JPEG extensions (JPG12)",
	0xFD =>  "Reserved for JPEG extensions (JPG13)",
	0xFE =>  "Comment (COM)",
	0x01 =>  "For temp private use arith code (TEM)",
	0x02 =>  "Reserved (RES)",
);

/**
 * Most of this function is taken directly from the source code for imagemagick
 * and how it handles identify -verbose $filename to present you with a Quality
 * number.  This number should be considered approximate.  It's essentially
 * based upon the numbers used to perform compression on the original image
 * data... 
 *
 * See: http://www.obrador.com/essentialjpeg/headerinfo.htm
 * See: http://www.impulseadventure.com/photo/jpeg-quantization.html
 * See: http://www.impulseadventure.com/photo/jpeg-huffman-coding.html
 */
function get_jpeg_quality( &$buff ) {
	$tables = array(
		'multi' => array(
			'hash' => array(
				 1020, 1015,  932,  848,  780,  735,  702,  679,  660,  645,
				  632,  623,  613,  607,  600,  594,  589,  585,  581,  571,
				  555,  542,  529,  514,  494,  474,  457,  439,  424,  410,
				  397,  386,  373,  364,  351,  341,  334,  324,  317,  309,
				  299,  294,  287,  279,  274,  267,  262,  257,  251,  247,
				  243,  237,  232,  227,  222,  217,  213,  207,  202,  198,
				  192,  188,  183,  177,  173,  168,  163,  157,  153,  148,
				  143,  139,  132,  128,  125,  119,  115,  108,  104,   99,
				   94,   90,   84,   79,   74,   70,   64,   59,   55,   49,
				   45,   40,   34,   30,   25,   20,   15,   11,    6,    4,
					0	
			), // hash
			'sums' => array (
				 32640, 32635, 32266, 31495, 30665, 29804, 29146, 28599, 28104,
				 27670, 27225, 26725, 26210, 25716, 25240, 24789, 24373, 23946,
				 23572, 22846, 21801, 20842, 19949, 19121, 18386, 17651, 16998,
				 16349, 15800, 15247, 14783, 14321, 13859, 13535, 13081, 12702,
				 12423, 12056, 11779, 11513, 11135, 10955, 10676, 10392, 10208,
				  9928,  9747,  9564,  9369,  9193,  9017,  8822,  8639,  8458,
				  8270,  8084,  7896,  7710,  7527,  7347,  7156,  6977,  6788,
				  6607,  6422,  6236,  6054,  5867,  5684,  5495,  5305,  5128,
				  4945,  4751,  4638,  4442,  4248,  4065,  3888,  3698,  3509,
				  3326,  3139,  2957,  2775,  2586,  2405,  2216,  2037,  1846,
				  1666,  1483,  1297,  1109,   927,   735,   554,   375,   201,
				   128,     0
			 ), // sums
		), // multi
		'single' => array(
			'hash' => array(
               510,  505,  422,  380,  355,  338,  326,  318,  311,  305,
               300,  297,  293,  291,  288,  286,  284,  283,  281,  280,
               279,  278,  277,  273,  262,  251,  243,  233,  225,  218,
               211,  205,  198,  193,  186,  181,  177,  172,  168,  164,
               158,  156,  152,  148,  145,  142,  139,  136,  133,  131,
               129,  126,  123,  120,  118,  115,  113,  110,  107,  105,
               102,  100,   97,   94,   92,   89,   87,   83,   81,   79,
                76,   74,   70,   68,   66,   63,   61,   57,   55,   52,
                50,   48,   44,   42,   39,   37,   34,   31,   29,   26,
                24,   21,   18,   16,   13,   11,    8,    6,    3,    2,
                 0
			), // hash
			'sums' => array(
               16320, 16315, 15946, 15277, 14655, 14073, 13623, 13230, 12859,
               12560, 12240, 11861, 11456, 11081, 10714, 10360, 10027,  9679,
                9368,  9056,  8680,  8331,  7995,  7668,  7376,  7084,  6823,
                6562,  6345,  6125,  5939,  5756,  5571,  5421,  5240,  5086,
                4976,  4829,  4719,  4616,  4463,  4393,  4280,  4166,  4092,
                3980,  3909,  3835,  3755,  3688,  3621,  3541,  3467,  3396,
                3323,  3247,  3170,  3096,  3021,  2952,  2874,  2804,  2727,
                2657,  2583,  2509,  2437,  2362,  2290,  2211,  2136,  2068,
                1996,  1915,  1858,  1773,  1692,  1620,  1552,  1477,  1398,
                1326,  1251,  1179,  1109,  1031,   961,   884,   814,   736,
                 667,   592,   518,   441,   369,   292,   221,   151,    86,
                  64,     0
			), // sums
		), // single
	); // tables
	
	$headers = get_jpeg_header_data( $buff, 0xDB );
	if ( !is_array( $headers ) || !count( $headers ) )
		return 100;
	$header = $headers[0];	
	$quality = 0;
	if ( strlen($header->SegData) > 128 ) {
		$entry = array( 0 => array(), 1 => array() );
		foreach ( str_split( substr( $header->SegData, 1, 64) ) as $chr ) 
			$entry[0][] = ord($chr);
		foreach ( str_split( substr( $header->SegData, -64) ) as $chr )
			$entry[1][] = ord($chr);
		$sum = array_sum( $entry[0] ) + array_sum( $entry[1] );
		$qvalue = $entry[0][2] + $entry[0][53] + $entry[1][0] + $entry[1][63];
		$table = "multi";
	} else if ( strlen($header->SegData) > 64 ) {
		$entry = array( 0 => array() );
		foreach ( str_split( substr( $header->SegData, 1, 64) ) as $chr )
			$entry[0][] = ord($chr);
		$sum = array_sum( $entry[0] );
		$qvalue = $entry[0][2] + $entry[0][53];
		$table = "single";
	} else {
		return 100; // go with a safe value
	}
	for( $i = 0; $i < 100; $i++ ) {
		if ( ( $qvalue < $tables[$table]['hash'][$i] ) && ( $sum < $tables[$table]['sums'][$i] ) )
			continue;
		// error_log( "$table >> $qvalue && $sum >> " . ( $i + 1 ) );
		return $i + 1;
	}
	return 100; // go with a safe value
}

Right… serious suckage. Which is why I’m sharing it here so that you don’t have to go through all that trouble. You can just steal my stolen code. Aren’t GPL compatible licenses fun?

Ok, finally, you… in the back… stop jumping up and down screaming about the Imagick PECL extension… In my testing getCompressionQuality() didnt work and getImageCompression() was unavailable (though I hear there is a newer version of the extension now… YMMV)

Logging post data

Lets say you have a relatively complex php web application, like wordpress. You have it running under apache (which is common.) You have good control of your site via .htaccess (which is common — permalinks and all.) And something happens to your blog (e.g. someone is exploiting some unknown vulnerability to compromise your content), which you want to track down. So you want to log, for instance, HTTP POST data. Your first instinct might be to add some logging code to index.php (mine was) But there are a lot of possible places which might be directly accessed, especially in the admin. So The trick I use (and this is probably the only time I’ve ever condoned this) is to use PHPs auto_prepend_file functionality.

Make a /home/user/postlog/ directory, then a /home/user/postlog/logs/ directory (and chmod a+rw that one.) Next make a simple /home/user/postlog/postlog.php file with the following contents:

<?php 
if ( isset($_POST) && is_array($_POST) && count($_POST) > 0 ) { 
  $log_dir = dirname( __FILE__ ) . '/logs/'; 
  $log_name = "posts-" . $_SERVER['REMOTE_ADDR'] . "-" . date("Y-m-d-H") . ".log"; 
  $log_entry = gmdate('r') . "\t" . $_SERVER['REQUEST_URI'] . "
" . serialize($_POST) . "

"; 
  $fp=fopen( $log_dir . $log_name, 'a' ); 
  fputs($fp, $log_entry); 
  fclose($fp); } 
?>

Finally add this line to the top of your .htaccess file:

php_value auto_prepend_file /home/user/postlog/postlog.php

If all went well this should start logging any request to any php file with any post data into the /home/user/postlog/logs/ direcory (with a unique log per ip per day)

so-you-wanna-see-an-image

We’ve been asked how we manage serving files from Amazons very cool S3 service at WordPress.com… This is how. (covering a requested image already stored on S3, not the upload -> s3 process)

A request comes into pound for a file. Pound hashes the hostname (via a custom patch which we have not, but may, release) , to determine which of several backend servers the request should hit. Pound forwards the request to that server. This, of course, means that a given blog always serves from the same backend server. The only exception to the afore-mentioned rule is if that server is, for some reason, unavailable in which case it picks another server to serve that hostname from temporarily.

The request then comes into varnishd on the backend servers. The varnishd daemon checks its 300Gb worth of files cache and (for the sake of this example) finds nothing (hey, new images are uploaded all the time!) Varnishd then checks with the web server (running on the same machine, just bound to a different IP/Port#) and that request is handled by a custom script.

So, a http daemon on the same backend server runs the file request. The custom script checks the DB to gather information on the file (specifically which DC’s it is in, size, mod time, and whether its deleted or not) all this info is saved in memcached for 5 minutes. The script increments and checks the “hawtness” (term courtesy of Barry) of the file in memcached (if the file has been accessed over a certain # of times it is then deemed “hawt”, and a special header is sent with the response telling varnishd to put the file into its cache. When that happens the request would be served directly by varnishd in the previous paragraph and never hit the httpd or this script again (or at least not until the cache entry expires.)) At this point, assuming the file should exist (deleted = 0 in the files db) we fetch the file from a backend source.

Which backend source depends on where it is available. The order of preference is as follows: Always fetch from Amazon S3 if the file lives there (no matter what, the following preferences only ever occur if, for some reason, s3 = 0 in the files db), and if that fails fetch from the one files server we still have (which has larger slower disks, and is used for archiving purposes and fault tolerance only)

After fetching the file from the back end… the custom script hands the data and programatically generated headers to the http daemon, which hands the data to varnishd, varnishd hands the data to pound, pound hands the data to the requesting client, and the image appears in the web browser.

And there was much rejoicing (yay.)

For the visual people among us who like visuals and stuff… (I like visuals…) here goes…

Autumn Leaves Leaf #2: Feeder

This handy little bot keeps track of RSS feeds, and announces in the channel when one is updated. (note: be sure to edit the path to the datafiles) Each poller runs inside its own ruby thread, and can be run on its own independent schedule

require 'thread'
require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'
require 'fileutils'
require 'digest/md5'

class Feeder < AutumnLeaf

def watch_feed(url, title, sleepfor=300)
  message "Watching (#{title}) [#{url}] every #{sleepfor} seconds"
  feedid = Digest::MD5.hexdigest(title)
  Thread.new {
    while true
      begin
        content = ""
        open(url) { |s|
          content = s.read
        }
        rss = RSS::Parser.parse(content, false)
        rss.items.each { |entry|
          digest = Digest::MD5.hexdigest(entry.title)
          if !File.exist?("/tmp/.rss.#{feedid}.#{digest}")
            FileUtils.touch("/tmp/.rss.#{feedid}.#{digest}")
            message "#{entry.title} (#{title}) #{entry.link}"
          end
          sleep(2)
        }
      rescue
        sleep(2)
      end
      sleep(sleepfor)
    end
  }
  sleep(1)
end

def did_start_up
  watch_feed("http://planet.wordpress.org/rss20.xml", "planet", 600)
  watch_feed("http://wordpress.com/feed/", "wpcom", 300)
end

end

Autumn Leaves Leaf #1: Announcer

This bot is perfect for anything where you need to easily build IRC channel notifications into an existing process. It’s simple, clean, and agnostic. Quite simply you connect to a TCP port, give it one line, the port closes, the line given shows up in the channel. eg: echo ‘hello’ | nc -q 1 bothost 22122

require 'socket'
require 'thread'

class Announcer < AutumnLeaf

        def handle_incoming(sock)
                Thread.new {
                line = sock.gets
                message line
                sock.close
                }
        end

        def did_start_up
                Thread.new {
                        listener = TCPServer.new('',22122)
                        while (new_socket = listener.accept)
                                handle_incoming(new_socket)
                        end
                }
        end

end

DivShare, Day 1 (raw commentary)

I began looking at divshare a few days ago as a way to stor, save, and share my personal photo collection.  The idea of auto-galleries, unlimited space, flash video, and possible FTP access was… enticing.  But it’s tough to tell how something like this is going to work on a large scale…

So… after messing around with a free divshare account for a while I decided it was more worth my while to pay 10 bucks for a pro account and get FTP access than to try and use mechanize (or something similar) to hack out my own makeshift API.  Now I have about… Oh… 8,000 files I want to upload… So… doing that 10 at a time was just _NOT_ going to happen…

After paying for a pro account I was *immediately* granted FTP access, no waiting. And for that I was grateful.  Since I take photos at 6MP, and thats WAY too large for most online uses I have a shell script which automagically creates 5%, 10%, and 25% or original sized thumbnails.  This meant that I had an expansive set of files I could upload and only take a couple of hours doing it (5% thumbs end up being less than 200Mb.)  This, I thought, would be an excellent test of their interfaces.

So an-ftping-i-a-go.  Upload all my files into a sub directory (005). Visit the dash. nothing. Visit the ftp-upload-page to recheck… maybe I did something wrong. AND WHAM! an 8,000 check box form to accept ftp uploaded files… ugh.  Thankfully they’re all checked by default.  I let it load (for a long while) and hit submit… and wait… and wait… and wait.  Then the server side connection times out a while later.  Fair enough. Check my dash… about 1500 of the 8,000 photos were imported… I’m going to have to do this 6 times. Annoying, but doable.  Hit the second submit, and pop open another browser to look at my dash.  And divshare did *nothing* with my folder name… that wasnt translated to a “virtual” folder at all. tsk tsk.

So I need to put about 1500 photo, manually, into an 005 folder… and then I realize… I have to do this 20 files at a time… with no way to just show files that are not currently in a folder.

… uh no …

Ok, so I open up one of the photos that I DID put into the 005 folder, and it did, in fact, make them into a “gallery” of sorts. It made a thumbnail , and displayed all 3,000 photos side by side in something similar to an iframe… no rows. just one row… 3,000 columns… and waiting as my browser requests each… and every… thumb… from divshare. Wonderful.  The gallery controls are simple enough an iframe with a scrollbar at the bottom, a next photo link, and a previous photo link.  And all 3 controls make you loose your place in the iframe when you use them…

Now dont get me wrong. You get what you pay for. But hey… I did pay this time ;)  The service is excellent for what it does. And my use case was a bit extreme. Still I hope that they address these issues that I’ve pointed out.  I’d really like to continue using them, and if they can make my pohoto process easier I’ll gladly keep paying them $10/mo

Thats

  1. Don’t ignore what pro users are telling you when they upload
  2. Process large-accepts in the background, let me know I need to come back later
  3. Negative searching (folder == nil)
  4. Mass file controls (Iether items/page, or all-items-in-view (folder == nil))
  5. Give me a gallery a non-broadband user can use (1500 thumbs in one sitting tastes bad, more filling)
  6. Don’t undo what I’ve done in the gallery every click.  Finding your place among 8,000 photos is tedious to do once

And I know I sound like I’m just complaining. And I am. But this is web 2.0 feedback baby. Ignore my grouchiness, and (If I’m lucky) take my suggestions and run with them asap.  The photo/files market is very very far from cornered!

Whoa, talk about neglecting your weblog! Bad Form!

I know, I know, I’ve been silent for quite some time. Well Let me assure you that I’m quite all right! Are you less worried about me now? Oh good. (Yes I’m a cynical bastage sometimes.)

So life has, as it tends to do, come at me pretty fast. I’ve left my previous employer, Ookles, and I wish them all the best in accomplishing everything that they’ve been working towards. So I’ve Joined up with the very smart, very cool guys at Automattic. I have to tell you I’m excited to be working with these guys, they’re truly a great group.

I guess that means I’m… kind of… like… obligated to keep up on my blog now, eh?  I’m also kind of, like, ehausted. Jumping feet first into large projects has a tendency to do that to a guy though.  And truth be told I would have it any other way…

😀

Cheers

DK

Openfound (cont)

If there’s one thing that the OpenFount guys have shown me is that they’re serious about the Infinidisk product.  Mr. Donahue gave me a quick call this evening (seems my e-mail server and his e-mail server aren’t talking properly, so while I get his communications) he has not received mine (probably explaining the lack of response to my pre-sales inquiry)) to chat about his product. The particular bug that I noticed, he mentioned, was fixed a while ago in a later release than I’d tried.  In my defense the page looks precisely like it did when I first got the product, and the release tar file has no version numbers on it… yet… so I did check for updates.  I found out some good info, though.  They’re working on  putting up a trac page for some real bidirectional community participation soon.  They’ll also be putting version numbers in their archives soon. Both of those things will help, I think, improve their visibility to people like me (who have very very little time.)
I’ll be re-testing the infinidisk product again, later, when next I customize an AMI.