PHP SSH2 code

I’ve had a need to use the PHP SSH2 PECL recently (working on making a product, at work, more efficient) And thought I would share some of the preliminary code. You can find it here: vpssh.phps

The most interesting thing is not vpssh_core or it’s exec (though it’s good code) the really interesting thing is the vpssh_tunnel class and the accompanying examples at the top of the file. This really shows some advanced usage of ssh2_tunnel that you can’t really find anywhere else.

It’s just the beginnings of some useful code, but it’s probably a huge jumping off point for anyone seriously looking into the ssh2 pecl functionality. Oh, it also works with both password and key based authentication.

This code is less than 12 hours old, and it works for me so far, YMMV. Feedback welcome … or not… Whatever. Hope it helps someone out. And I hope it helps me out later when I need this kind of thing again.

php debugging the really really hard way

If you’re ever in a situation where something is only happening intermittently, and only on a live server, and only while it’s under load… Lets say its not generating any error_log or stderr output, and you cant run it manually to reproduce… (we’ve all been in this situation) How do you get any debugging output at all?

Step 1: add this to the top of your entry point php file

if ( $_SERVER['REMOTE_ADDR'] == '127.0.0.1' ) {
 Â  Â  Â  error_log( ' :: ' . getmypid() );
 Â  Â  Â  sleep( 10 );
}

Step 2: use curl on the localhost to make the request

Step 3: (this assumes your error log is /tmp/php-error-output) run the following command in a second (root) terminal window

strace -p $(tail -n 1000 /tmp/php-error-output | grep ' :: ' | tail -n 1 | sed -r s/'^.+ :: '//g) -s 10240 2>&1

Good luck…

Howto: Enable regular expression highlighting in LimeChat

in LimeChat.app/Contents/Resources/logrenderer.rb around line 419… WFM. IANAL. YMMV. RTFM. OMGWTF. WTL. GTFO. ETC.

    words.each do |w|
      next if w.empty?
      s = body
      offset = 0
# 
      rex = Regexp.new(w, true)
# 
      while rex =~ s
# 
        begin
# 
          left = $~.begin(0)
          right = $~.end(0)
          pre = $~.pre_match
          post = $~.post_match
          ok = true

          if exact_word_match
            if !pre.empty? && alphabetic?(w.first_char) && alphabetic?(pre.last_char)
              ok = false
            elsif !post.empty? && alphabetic?(w.last_char) && alphabetic?(post.first_char)
              ok = false
            end
          end

          if ok
            keywords < < { :pos => offset+left, :len => right-left }
          end
          s = post
          offset += right
# 
        rescue
          next
        end
# 
      end
    end

Quality Time With Your JPEGs

When working with user provided images in PHP you run into a problem. Lets say that you want to generate thumbnails of uploaded JPEGs for users. This is a fairly common use case where you would employ PHP and GD (the most prevalent php image extension.) But when you generate the new, smaller, image what quality setting do you use? If your quality setting is too low then the image is distorted unacceptably. Likewise if your quality setting is too high then you produce a dimensionally smaller image with a larger size in bytes than the original. So what do you do when you want to satisfy all cases? Well the obvious answer is that you should use the same JPEG quality setting that the image had when it was uploading. Now… Using PHP and GD tell me how you accomplish this.

Go ahead

I’ll wait

You can’t, can you? If you’re really sneaky you might be thinking about just pulling the data out of the binary stream, and if you’re a linux nut you’re probably sitting there muttering “just use identify (a la imagemagick)”. Of course if you try that under heavy traffic you’ll soon discover that it kills your servers. Just for reference I’ll share that code with you (not everyone is serving 2g/sec in dynamic image traffic after all). We already have the binary data in ram in the $rval variable, in case you were curious.

// define how we will deal with stdin, stdout, and stderr 
$descriptorspec = array( 0 => array( "pipe", "r" ), 1 => array( "pipe", "w" ), 2 => array( "pipe", "a" ) );
// run identify, -verbose required, - means read file from stdin
$process = proc_open( IDENTIFY . ' -verbose -', $descriptorspec, $pipes );
if ( false !== $process ) {
	// pipe the image data through it to its stdin
	fwrite( $pipes[0], $rval );
	// close stdin to allow identify to process
	fclose( $pipes[0] );
	// read the results of the program execution
	$results = stream_get_contents($pipes[1]);
	// clean up open file handles
	fclose( $pipes[1] );
	fclose( $pipes[2] );
	// Unix return value of 0 means success
	if ( 0 == proc_close($process) ) {
		// pull out the image quality from identify output
		if ( preg_match( '#Quality: ([0-9]+)[^0-9]#', $results, $m ) )
			$origin_quality = intval($m[1]);
		// detect when something goes wrong above
		if ( $origin_quality == 0 )
			$origin_quality = 100;
	}
}

So, since this consumes too much of our available resources (especially ram and cpu usage since identify fully decompresses and reads the image into a raw state for processing… hundreds of MB of ram, which you can limit but then it becomes unbearably slow…) that’s out. What now? If you’re really extra sneaky you’re thinking that you should be able to read the setting out of the binary data… there should be a header after all, right? Well… Yes there is a header but “quality” is not a “setting” its more a measure of how compressed the image is… which isn’t exactly recorded either… at least… not as an integer value. The compression matrix used to preform JPEGs lossy compression *IS* stored in a header.. and it turns out this is what the ImageMagick code uses to give us that quality setting. So I set out to reproduce this in PHP.

I know… I’m a masochist.

Thanks to some serious google-fu (and possibly a note in a php online doc manual relating to IPC, I don’t remember which of the two led me to the package first) I found that there’s already some code out there which deals with the nasty bits of reading raw JPEG headers (though not doing what I want with the header that I want) in the PHP JPEG Metadata Toolkit And the instructions for evaluating the header we can then pull is in the ImageMagick source code (coders/jpeg.c)

When we put the two together and modify it a bit to suite our needs (i.e. reading from the in-memory buffer, pulling just the right header from the jpeg file, etc) we get this code… and finally the ability to call $quality = get_jpeg_quality( $rval ); In my test (yea just one or two… very scientific like), this over 100 times faster than using the proc_open and executable method, uses less ram (a lot lot lot lot lot less ram) and generally just doesn’t suck as much.

< ?php

function get_jpeg_header_data( &$buff, $want=null ) { 
	$data = buffer_read( $buff, 2, true ); // Read the first two characters
	// Check that the first two characters are 0xFF 0xDA  (SOI - Start of image)
	if ( $data != "\xFF\xD8" ) {
		// No SOI (FF D8) at start of file - This probably isn't a JPEG file - close file and return;
		return false;
	}
	$data = buffer_read( $buff, 2 ); // Read the third character
	// Check that the third character is 0xFF (Start of first segment header)
	if ( $data{0} != "\xFF" ) {
		// NO FF found - close file and return - JPEG is probably corrupted
		return false;
	}
	// Cycle through the file until, one of: 
	//   1) an EOI (End of image) marker is hit,
	//   2) we have hit the compressed image data (no more headers are allowed after data)
	//   3) or end of file is hit
	$hit_compressed_image_data = FALSE;
	while ( ( $data{1} != "\xD9" ) && ( !$hit_compressed_image_data) && ( $data != '' ) ) { 
		// Found a segment to look at.
		// Check that the segment marker is not a Restart marker - restart markers don't have size or data after them
		if (  ( ord($data{1}) < 0xD0 ) || ( ord($data{1}) > 0xD7 ) ) {
			// Segment isn't a Restart marker
			$sizestr = buffer_read( $buff, 2 ); // Read the next two bytes (size)
			$decodedsize = unpack ("nsize", $sizestr); // convert the size bytes to an integer
			// Read the segment data with length indicated by the previously read size
			$segdata = buffer_read( $buff, $decodedsize['size'] - 2 );
			// Store the segment information in the output array
			if ( !$want || $want == ord($data{1}) ) {
				$headerdata[] = (object)array(  
					"SegType" => ord($data{1}),
					"SegName" => $GLOBALS[ "JPEG_Segment_Names" ][ ord($data{1}) ],
					"SegDesc" => $GLOBALS[ "JPEG_Segment_Descriptions" ][ ord($data{1}) ],
					"SegData" => $segdata 
				);
			}
		}
		// If this is a SOS (Start Of Scan) segment, then there is no more header data - the compressed image data follows
		if ( $data{1} == "\xDA" ) {
			$hit_compressed_image_data = true;
		} else {
			// Not an SOS - Read the next two bytes - should be the segment marker for the next segment
			$data = buffer_read( $buff, 2 );
			// Check that the first byte of the two is 0xFF as it should be for a marker
			if ( $data{0} != "\xFF" ) {
				// NO FF found - close file and return - JPEG is probably corrupted
				return false;
			}
		}
	}
	return $headerdata;
}

function buffer_read( &$buff, $len, $new=false ) {
	static $pointer = 0;
	if ( $new )
		$pointer = 0;
	if ( $pointer + $len > strlen( $buff ) ) {
		$len = strlen( $buff ) - $pointer;
		if ( $len < 1 )
			return null;
	}
	$data = substr( $buff, $pointer, $len );
	$pointer += $len;
	return $data;
}

// The names of the JPEG segment markers, indexed by their marker number
$GLOBALS[ "JPEG_Segment_Names" ] = array(
	0xC0 =>  "SOF0",  0xC1 =>  "SOF1",  0xC2 =>  "SOF2",  0xC3 =>  "SOF4",
	0xC5 =>  "SOF5",  0xC6 =>  "SOF6",  0xC7 =>  "SOF7",  0xC8 =>  "JPG",
	0xC9 =>  "SOF9",  0xCA =>  "SOF10", 0xCB =>  "SOF11", 0xCD =>  "SOF13",
	0xCE =>  "SOF14", 0xCF =>  "SOF15",
	0xC4 =>  "DHT",   0xCC =>  "DAC",
	0xD0 =>  "RST0",  0xD1 =>  "RST1",  0xD2 =>  "RST2",  0xD3 =>  "RST3",
	0xD4 =>  "RST4",  0xD5 =>  "RST5",  0xD6 =>  "RST6",  0xD7 =>  "RST7",
	0xD8 =>  "SOI",   0xD9 =>  "EOI",   0xDA =>  "SOS",   0xDB =>  "DQT",
	0xDC =>  "DNL",   0xDD =>  "DRI",   0xDE =>  "DHP",   0xDF =>  "EXP",
	0xE0 =>  "APP0",  0xE1 =>  "APP1",  0xE2 =>  "APP2",  0xE3 =>  "APP3",
	0xE4 =>  "APP4",  0xE5 =>  "APP5",  0xE6 =>  "APP6",  0xE7 =>  "APP7",
	0xE8 =>  "APP8",  0xE9 =>  "APP9",  0xEA =>  "APP10", 0xEB =>  "APP11",
	0xEC =>  "APP12", 0xED =>  "APP13", 0xEE =>  "APP14", 0xEF =>  "APP15",
	0xF0 =>  "JPG0",  0xF1 =>  "JPG1",  0xF2 =>  "JPG2",  0xF3 =>  "JPG3",
	0xF4 =>  "JPG4",  0xF5 =>  "JPG5",  0xF6 =>  "JPG6",  0xF7 =>  "JPG7",
	0xF8 =>  "JPG8",  0xF9 =>  "JPG9",  0xFA =>  "JPG10", 0xFB =>  "JPG11",
	0xFC =>  "JPG12", 0xFD =>  "JPG13",
	0xFE =>  "COM",   0x01 =>  "TEM",   0x02 =>  "RES",
);


// The descriptions of the JPEG segment markers, indexed by their marker number
$GLOBALS[ "JPEG_Segment_Descriptions" ] = array(
	/* JIF Marker byte pairs in JPEG Interchange Format sequence */
	0xC0 => "Start Of Frame (SOF) Huffman  - Baseline DCT",
	0xC1 =>  "Start Of Frame (SOF) Huffman  - Extended sequential DCT",
	0xC2 =>  "Start Of Frame Huffman  - Progressive DCT (SOF2)",
	0xC3 =>  "Start Of Frame Huffman  - Spatial (sequential) lossless (SOF3)",
	0xC5 =>  "Start Of Frame Huffman  - Differential sequential DCT (SOF5)",
	0xC6 =>  "Start Of Frame Huffman  - Differential progressive DCT (SOF6)",
	0xC7 =>  "Start Of Frame Huffman  - Differential spatial (SOF7)",
	0xC8 =>  "Start Of Frame Arithmetic - Reserved for JPEG extensions (JPG)",
	0xC9 =>  "Start Of Frame Arithmetic - Extended sequential DCT (SOF9)",
	0xCA =>  "Start Of Frame Arithmetic - Progressive DCT (SOF10)",
	0xCB =>  "Start Of Frame Arithmetic - Spatial (sequential) lossless (SOF11)",
	0xCD =>  "Start Of Frame Arithmetic - Differential sequential DCT (SOF13)",
	0xCE =>  "Start Of Frame Arithmetic - Differential progressive DCT (SOF14)",
	0xCF =>  "Start Of Frame Arithmetic - Differential spatial (SOF15)",
	0xC4 =>  "Define Huffman Table(s) (DHT)",
	0xCC =>  "Define Arithmetic coding conditioning(s) (DAC)",
	0xD0 =>  "Restart with modulo 8 count 0 (RST0)",
	0xD1 =>  "Restart with modulo 8 count 1 (RST1)",
	0xD2 =>  "Restart with modulo 8 count 2 (RST2)",
	0xD3 =>  "Restart with modulo 8 count 3 (RST3)",
	0xD4 =>  "Restart with modulo 8 count 4 (RST4)",
	0xD5 =>  "Restart with modulo 8 count 5 (RST5)",
	0xD6 =>  "Restart with modulo 8 count 6 (RST6)",
	0xD7 =>  "Restart with modulo 8 count 7 (RST7)",
	0xD8 =>  "Start of Image (SOI)",
	0xD9 =>  "End of Image (EOI)",
	0xDA =>  "Start of Scan (SOS)",
	0xDB =>  "Define quantization Table(s) (DQT)",
	0xDC =>  "Define Number of Lines (DNL)",
	0xDD =>  "Define Restart Interval (DRI)",
	0xDE =>  "Define Hierarchical progression (DHP)",
	0xDF =>  "Expand Reference Component(s) (EXP)",
	0xE0 =>  "Application Field 0 (APP0) - usually JFIF or JFXX",
	0xE1 =>  "Application Field 1 (APP1) - usually EXIF or XMP/RDF",
	0xE2 =>  "Application Field 2 (APP2) - usually Flashpix",
	0xE3 =>  "Application Field 3 (APP3)",
	0xE4 =>  "Application Field 4 (APP4)",
	0xE5 =>  "Application Field 5 (APP5)",
	0xE6 =>  "Application Field 6 (APP6)",
	0xE7 =>  "Application Field 7 (APP7)",
	0xE8 =>  "Application Field 8 (APP8)",
	0xE9 =>  "Application Field 9 (APP9)",
	0xEA =>  "Application Field 10 (APP10)",
	0xEB =>  "Application Field 11 (APP11)",
	0xEC =>  "Application Field 12 (APP12) - usually [picture info]",
	0xED =>  "Application Field 13 (APP13) - usually photoshop IRB / IPTC",
	0xEE =>  "Application Field 14 (APP14)",
	0xEF =>  "Application Field 15 (APP15)",
	0xF0 =>  "Reserved for JPEG extensions (JPG0)",
	0xF1 =>  "Reserved for JPEG extensions (JPG1)",
	0xF2 =>  "Reserved for JPEG extensions (JPG2)",
	0xF3 =>  "Reserved for JPEG extensions (JPG3)",
	0xF4 =>  "Reserved for JPEG extensions (JPG4)",
	0xF5 =>  "Reserved for JPEG extensions (JPG5)",
	0xF6 =>  "Reserved for JPEG extensions (JPG6)",
	0xF7 =>  "Reserved for JPEG extensions (JPG7)",
	0xF8 =>  "Reserved for JPEG extensions (JPG8)",
	0xF9 =>  "Reserved for JPEG extensions (JPG9)",
	0xFA =>  "Reserved for JPEG extensions (JPG10)",
	0xFB =>  "Reserved for JPEG extensions (JPG11)",
	0xFC =>  "Reserved for JPEG extensions (JPG12)",
	0xFD =>  "Reserved for JPEG extensions (JPG13)",
	0xFE =>  "Comment (COM)",
	0x01 =>  "For temp private use arith code (TEM)",
	0x02 =>  "Reserved (RES)",
);

/**
 * Most of this function is taken directly from the source code for imagemagick
 * and how it handles identify -verbose $filename to present you with a Quality
 * number.  This number should be considered approximate.  It's essentially
 * based upon the numbers used to perform compression on the original image
 * data... 
 *
 * See: http://www.obrador.com/essentialjpeg/headerinfo.htm
 * See: http://www.impulseadventure.com/photo/jpeg-quantization.html
 * See: http://www.impulseadventure.com/photo/jpeg-huffman-coding.html
 */
function get_jpeg_quality( &$buff ) {
	$tables = array(
		'multi' => array(
			'hash' => array(
				 1020, 1015,  932,  848,  780,  735,  702,  679,  660,  645,
				  632,  623,  613,  607,  600,  594,  589,  585,  581,  571,
				  555,  542,  529,  514,  494,  474,  457,  439,  424,  410,
				  397,  386,  373,  364,  351,  341,  334,  324,  317,  309,
				  299,  294,  287,  279,  274,  267,  262,  257,  251,  247,
				  243,  237,  232,  227,  222,  217,  213,  207,  202,  198,
				  192,  188,  183,  177,  173,  168,  163,  157,  153,  148,
				  143,  139,  132,  128,  125,  119,  115,  108,  104,   99,
				   94,   90,   84,   79,   74,   70,   64,   59,   55,   49,
				   45,   40,   34,   30,   25,   20,   15,   11,    6,    4,
					0	
			), // hash
			'sums' => array (
				 32640, 32635, 32266, 31495, 30665, 29804, 29146, 28599, 28104,
				 27670, 27225, 26725, 26210, 25716, 25240, 24789, 24373, 23946,
				 23572, 22846, 21801, 20842, 19949, 19121, 18386, 17651, 16998,
				 16349, 15800, 15247, 14783, 14321, 13859, 13535, 13081, 12702,
				 12423, 12056, 11779, 11513, 11135, 10955, 10676, 10392, 10208,
				  9928,  9747,  9564,  9369,  9193,  9017,  8822,  8639,  8458,
				  8270,  8084,  7896,  7710,  7527,  7347,  7156,  6977,  6788,
				  6607,  6422,  6236,  6054,  5867,  5684,  5495,  5305,  5128,
				  4945,  4751,  4638,  4442,  4248,  4065,  3888,  3698,  3509,
				  3326,  3139,  2957,  2775,  2586,  2405,  2216,  2037,  1846,
				  1666,  1483,  1297,  1109,   927,   735,   554,   375,   201,
				   128,     0
			 ), // sums
		), // multi
		'single' => array(
			'hash' => array(
               510,  505,  422,  380,  355,  338,  326,  318,  311,  305,
               300,  297,  293,  291,  288,  286,  284,  283,  281,  280,
               279,  278,  277,  273,  262,  251,  243,  233,  225,  218,
               211,  205,  198,  193,  186,  181,  177,  172,  168,  164,
               158,  156,  152,  148,  145,  142,  139,  136,  133,  131,
               129,  126,  123,  120,  118,  115,  113,  110,  107,  105,
               102,  100,   97,   94,   92,   89,   87,   83,   81,   79,
                76,   74,   70,   68,   66,   63,   61,   57,   55,   52,
                50,   48,   44,   42,   39,   37,   34,   31,   29,   26,
                24,   21,   18,   16,   13,   11,    8,    6,    3,    2,
                 0
			), // hash
			'sums' => array(
               16320, 16315, 15946, 15277, 14655, 14073, 13623, 13230, 12859,
               12560, 12240, 11861, 11456, 11081, 10714, 10360, 10027,  9679,
                9368,  9056,  8680,  8331,  7995,  7668,  7376,  7084,  6823,
                6562,  6345,  6125,  5939,  5756,  5571,  5421,  5240,  5086,
                4976,  4829,  4719,  4616,  4463,  4393,  4280,  4166,  4092,
                3980,  3909,  3835,  3755,  3688,  3621,  3541,  3467,  3396,
                3323,  3247,  3170,  3096,  3021,  2952,  2874,  2804,  2727,
                2657,  2583,  2509,  2437,  2362,  2290,  2211,  2136,  2068,
                1996,  1915,  1858,  1773,  1692,  1620,  1552,  1477,  1398,
                1326,  1251,  1179,  1109,  1031,   961,   884,   814,   736,
                 667,   592,   518,   441,   369,   292,   221,   151,    86,
                  64,     0
			), // sums
		), // single
	); // tables
	
	$headers = get_jpeg_header_data( $buff, 0xDB );
	if ( !is_array( $headers ) || !count( $headers ) )
		return 100;
	$header = $headers[0];	
	$quality = 0;
	if ( strlen($header->SegData) > 128 ) {
		$entry = array( 0 => array(), 1 => array() );
		foreach ( str_split( substr( $header->SegData, 1, 64) ) as $chr ) 
			$entry[0][] = ord($chr);
		foreach ( str_split( substr( $header->SegData, -64) ) as $chr )
			$entry[1][] = ord($chr);
		$sum = array_sum( $entry[0] ) + array_sum( $entry[1] );
		$qvalue = $entry[0][2] + $entry[0][53] + $entry[1][0] + $entry[1][63];
		$table = "multi";
	} else if ( strlen($header->SegData) > 64 ) {
		$entry = array( 0 => array() );
		foreach ( str_split( substr( $header->SegData, 1, 64) ) as $chr )
			$entry[0][] = ord($chr);
		$sum = array_sum( $entry[0] );
		$qvalue = $entry[0][2] + $entry[0][53];
		$table = "single";
	} else {
		return 100; // go with a safe value
	}
	for( $i = 0; $i < 100; $i++ ) {
		if ( ( $qvalue < $tables[$table]['hash'][$i] ) && ( $sum < $tables[$table]['sums'][$i] ) )
			continue;
		// error_log( "$table >> $qvalue && $sum >> " . ( $i + 1 ) );
		return $i + 1;
	}
	return 100; // go with a safe value
}

Right… serious suckage. Which is why I’m sharing it here so that you don’t have to go through all that trouble. You can just steal my stolen code. Aren’t GPL compatible licenses fun?

Ok, finally, you… in the back… stop jumping up and down screaming about the Imagick PECL extension… In my testing getCompressionQuality() didnt work and getImageCompression() was unavailable (though I hear there is a newer version of the extension now… YMMV)

php open locking daemon

Don’t you hate that… When it’s 2:00am… and you really should be in bed… But your mind has hold of a problem, and wont let it go. I have a project where it would be really handy for a process to be able to lock (arbitrary string identifier) and for another process to be able to check whether (arbitrary string identifier) is still locked. And the processes that do the locking can die… so the lock really needs to expire when they do. I could use MySQLs get_lock but I’m already abusing the hell out of that for more distributed things (and since you cannot have more than one mysql named lock at a time per connection, i don’t think it would work here…) in the originating processes, and these locks are machine wide, not network wide…

I don’t like flock because you have to actually create a file to try and lock it leaving race conditions and the possibility of orphaned files on the file system which just sucks… I thought about Memcached but I really need something that can be held open for long periods of inactivity and released if the client dies which precludes the infinite and the timed method of memcached value storage…

After some searching I found old — Open Lock Daemon which looked like a super good fit… Until I dug into the communication protocol… What a nightmare for wanting to lock a string… srsly. So not being able to find anything (and apparently not being able to sleep until I had a satisfactory answer) I decided to write one. In PHP, naturally. Weighing in at 180 lines I think it’s a pretty acceptable/workable first pass.

[ edit: code available here ]

Continue reading →

Just what you need to know to write a CouchDB reduce function

Lets say you have the CouchDB classes (located here) all compiled together and included into your test.php script. Lets also say that you have created a database with the built-in web ui called “testing”. Finally let us say that your test.php has the following code in it, which would add a record to the db every time it is run. (i know that the data in the document serves no useful purpose… but really I just want to figure out this map/reduce thing so that I can make awesome views… so this suffices sufficiently.)

require_once dirname( dirname( __FILE__ ) ) . '/includes/couchdb.php';
$couchdb = new CouchDB('testing', 'localhost', 5984);
$key = microtime();
$result = $couchdb->send(
    '/'.md5($key),
    'put',
    json_encode(
        array(
            "_id" => md5($key),
            "time" => $key,
            'md5' => md5($key),
            'sha1' => sha1($key),
            'crc' => crc32($key)
        )
    )
);
print_r($result->getBody(true));

After running the code a bunch of times you would end up with a bunch of documents which look more or less like this:

picture-1 (click for full size)

Now lets say you want to write a view that told you what the first characters of the _id were and how many documents share that first letter. This is analogous to the following in MySQL

SELECT LEFT(md5, 1) AS `lchar`, count(md5) FROM `md5table` GROUP BY `lchar`

Your map function is easy, because you dont have any selection criteria, so we process all rows

function(doc){ emit(doc._id,doc); }

The reduce function is where the actual programming comes in… And it seems there aren’t many well explained examples of exactly how to do this (I just brute forced it by trial and error)

function(key, values, rereduce) { 
    var output = {};
    if ( rereduce ) { 
        // key is null, and values are values returned by previous calls
	//
	// see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
	//
	// essentially we are taking the previously reduced view, and the 
	// reduced view for new records, and we are reducing those two things
	// together.  Summarizing two summaries, essentially
        for ( var i in values ) {
	    // here we have multiple prebuilt output objects and we're simply combining them
  	    // just like below we have an array with a numeric id and an output object
	    // 
	    // retrieve a summary
            var vals = values[i];
            for ( var key in vals ) {
		// debugging
                // log(key);
		// 
		// store in or increment our new output object 
                if ( output[key] == undefined )
                    output[key] = vals[key];
                else
                    output[key] = output[key] + vals[key];
            }
        }
    } else {
        // key is an array, which we dont care about, and values are the 
	// values returned by the map
	//
	// see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
	//
	// we are taking each document and processing that, reducing it down
	// to a summary object (output) for each of the rows passed
        for ( var i in values ) {
	    // we have an array, values, with numeric ids and a document objects
	    //
	    // retrieve a document
            var doc = values[i];
	    // get what we want from it, the first char of the md5
            var key = doc._id.substr(0, 1);
	    // debugging
            // log( key + " :: " + doc._id );
	    //
	    // store or increment the output object
            if ( output[key] !== undefined )
                output[key] = output[key] + 1;
            else
                output[key] = 1;
        }
    }
    // done
    return output;
}

and in code, using a temporary view, ( if you used this view all the time you would want to make it permanent… but this is about how to lay out a reduce function, nothing more ) so request code that looks like this

$view = array(
    'map' => 'function(doc){ emit(doc._id,doc); }',
    'reduce' => '
        function(key, values, rereduce) { 
            var output = {};
            if ( rereduce ) { 
                // key is null, and values are values returned by previous calls
                for ( var i in values ) {
                    var vals = values[i];
                    for ( var key in vals ) {
                        // log(key);
                        if ( output[key] == undefined )
                            output[key] = vals[key];
                        else
                            output[key] = output[key] + vals[key];
                    }
                }
            } else {
                // key is an array, which we dont care about, and values are the values returneb by the map
                for ( var i in values ) {
                    var doc = values[i];
                    var key = doc._id.substr(0, 1);
                    // log( key + " :: " + doc._id );
                    if ( output[key] !== undefined )
                        output[key] = output[key] + 1;
                    else
                        output[key] = 1;
                }
            }
            return output;
        }
    '
    );
$result = $couchdb->send('/_temp_view', 'POST', json_encode($view) );
print_r($result->getBody(true));

would give you output that looks like this:

stdClass Object
(
    [rows] => Array
        (
            [0] => stdClass Object
                (
                    [key] => 
                    [value] => stdClass Object
                        (
                            [0] => 15
                            [1] => 17
                            [2] => 16
                            [3] => 13
                            [4] => 27
                            [5] => 18
                            [6] => 26
                            [7] => 15
                            [8] => 18
                            [9] => 21
                            [a] => 12
                            [b] => 23
                            [c] => 20
                            [d] => 27
                            [e] => 28
                            [f] => 26
                        )

                )

        )

)

I hope this helps somebody out.

Debian Lenny, Avahi, AFP… Linux Fileserver for OSX Clients

If you’re like me you have an OSX computer or 3 at home, and a debian file server. If you’re like me you hate samba/nfs on principle and want your debian server to show up in finder.Â If you’re like me you arent using debian 3 which is what most of the walkthroughs seem to expect…Â This is how I did it… With Debian Lenny.

What we’re using, and why:

Avahi handles zeroconf (making it show up in finder) (most howtos involve howl which is no longer in apt)
netatalk has afpd
afpd is the fileserver

From: http://blog.damontimm.com/how-to-install-netatalk-afp-on-ubuntu-with-encrypted-authentication/

apt-get update
mkdir -p ~/src/netatalk
cd ~/src/netatalk
apt-get install cracklib2-dev libssl-dev
apt-get source netatalk
apt-get build-dep netatalk
cd netatalk-2.0.3

From: http://www.sharedknowhow.com/2008/05/installing-netatalk-under-centos-5-with-leopard-support/

vim bin/cnid/cnid_index.c ## replace “ret = db->stat(db, &sp, 0);” with “ret = db->stat(db, NULL, &sp, 0);” line 277
vim etc/cnid_dbd/dbif.c ## replace “ret = db->stat(db, &sp, 0);” with “ret = db->stat(db, NULL, &sp, 0);” line 517

Mine

./configure –prefix=/usr/local/netatalk
make
make install
vim /etc/rc.local ## add “/usr/local/netatalk/sbin/afpd”
/usr/local/netatalk/sbin/afpd

From: http://www.disgruntled-dutch.com/2007/general/how-to-get-your-linux-based-afp-server-to-show-up-correctly-in-leopards-new-finder

apt-get install avahi-daemon
vim /etc/nsswitch.conf ## make the hosts line read “hosts: files dns mdns4”
cd /etc/avahi/services
wget http://www.disgruntled-dutch.com/media/afpd.service
/etc/init.d/avahi-daemon restart

in case that file drops off the face of the net, this is its contents (except “< ?” is “<?” and “< !” is “<!”) :

< ?xml version="1.0" standalone='no'?>
< !DOCTYPE service-group SYSTEM "avahi-service.dtd">

%h

_afpovertcp._tcp
548

At this point your server should show up under the network in your finder… and you should be able to connect with any system username/pw combo

making munin-graph take advantage of multiple cpus/cores

I do a lot of things for Automattic, and many of the things I do are quite esoteric (for a php developer anyways.) Perl is not my language of choice, but I’ve never balked at a challenge…. just… did it have to be perl? Anyways. We have more than a thousand machines that we track with munin… which means a TON of graphs. munin-update is efficient, taking advantage of all cpus and getting done in the fastest time possible, but munin-graph started taking so long as to be useless (and munin-cgi-graph takes almost a minute to fully render the servers day/week summary page which is completely unacceptable when we’re trying to troubleshoot a sudden, urgent, problem.) So I got to dive in and make it faster…

Step 1: add in this function (which i borrowed from somewhere else)

sub afork (\@$&) {
  my ($data, $max, $code) = @_;
  my $c = 0;
  foreach my $data (@$data) {
    wait unless ++ $c < = $max;
    die "Fork failed: $!
" unless defined (my $pid = fork);
    exit $code -> ($data) unless $pid;
  }
  1 until -1 == wait;
}

Step 2: replace this

for my $service (@$work_array) {
    process_service ($service);
}

with this

afork(@$work_array, 16, \&process_service);

I also have munin-html and munin-graph running side-by-side

( [ -x /usr/local/munin/lib/munin-graph  ] &&
    nice /usr/local/munin/lib/munin-graph --cron $@ 2>&1 |
    fgrep -v "*** attempt to put segment in horiz list twice" )& $waitgraph=$!
( [ -x /usr/local/munin/lib/munin-html   ] && nice /usr/local/munin/lib/munin-html $@; )& $waithtml=$!
wait $waitgraph
wait $waithtml

I did several other, more complicated hacks as well. Such as not generating month and year graphs via cron, letting those render on-demand with munin-cgi-graph

All said we’re doing in under 2.5 minutes what was taking 7 or 8 minutes previously

Using wait, $!, and () for threading in bash

This is a simplistic use of the pattern that I wrote about in my last post to wait on multiple commands in bash. In essence I have a script which runs a command (like uptime or restarting a daemon) on a whole bunch of servers (think pssh). Anyways… this is how I modified the script to run the command on multiple hosts in parallel. This is a bit simplistic as it runs, say, 10 parallel ssh commands and then waits for all 10 to complete. I’m very confident that someone could easily adapt this to run at a constant concurrency level of $threads… but I didn’t need it just then so I didn’t go that far… As a side note, this is possibly the first time I’ve ever *needed* an array in a bash script… hah…

# $1 is the commandto run on the remote hosts
# $2 is used for something not important for this script
# $3 is the (optional) number of concurrent connections to use

if [ ! "$3" == "" ]
then
    threads=$3
else
    threads=1
fi

cthreads=0;
stack=()
for s in $servers
  do
    if [ $cthreads -eq $threads ]; then
        for job in ${stack[@]}; do
              wait $job
        done
        stack=()
        cthreads=0
    fi
    (
        for i in $(ssh root@$s "$1" )
            do
                echo -e "$s:\t$i"
        done
    )& stack[$cthreads]=$!
    let cthreads=$cthreads+1
done
for job in ${stack[@]}; do
    wait $job
done

bash – collecting the return value of backgrounded processes

You know that you can run something in the background in a bash script with ( command )&, but a coworker recently wanted to run multiple commands, wait for all of them to complete, collect and decide what to do based on their return values… this proved much trickier. Luckily there is an answer

#!/bin/bash

(sleep 3; exit 1)& p1=$!
(sleep 2; exit 2)& p2=$!
(sleep 1; exit 3)& p3=$!

wait "$p1"; r1=$?
wait "$p2"; r2=$?
wait "$p3"; r3=$?

echo "$p1:$r1 $p2:$r2 $p3:$r3"

CodeWord: Apokalyptik

The random things that spew forth from my brain…

CLI