Archive for December 28th, 2008

A counting bloom filter


28 Dec

This is just me screwing around, really…. This implements a counting bloom filter in native php. I read about bloom filters this morning, and wrote this tonight in between playing Command and Conquer 3 campaign missions. The hashing function is configurable (and easily extensible,) rehashing the key multiple times to reduce the chance of collisions is also configurable (though I’m not entirely sure how needed this is.) Frankly this all might not even be (or work) properly… seems to… we’ll see…

< ?php
 
class bloom {
 
	/**
	 *
	 * A counting bloom filter
	 *
	 *	$f = new bloom(10, 'md5');
	 *
	 *	$f->add('foo');
	 *	$f->add('bar');
	 *	$f->add('foo');
	 *
	 *	$f->exists('foo');        // true
	 *	$f->exists('bar');        // true
	 *	$f->exists('baz');        // false
	 *	$f->exists('foo', false); // 2
	 *	$f->exists('bar', false); // 1
	 *	$f->exists('baz', false); // false
	 *
	 **/
 
	var $number_of_hashing_functions;
	var $map = array();
	var $hashlen = 32;
	var $hashfunc = 'md5';
 
	function __construct( $functions=1, $hashtype='md5' ) {
		$this->bloom($functions, $hashtype);
	}
 
	function bloom( $functions=1, $hashtype='md5' ) {
		$this->number_of_hashing_functions = (int)$functions;
		if ( !$this->number_of_hashing_functions || $this->number_of_hashing_functions < 1 )
			$this->number_of_hashing_functions = 1;
		$this->set_hashing_function($hashtype);
		$this->initialize_bitmat($this->number_of_hashing_functions);
	}
 
	function set_hashing_function($method) {
		switch ( $method ) {
			default:
			case 'md5':
				return false;
				break;
			case 'sha1':
				$this->hashlen = 40;
				$this->hashfunc = 'sha1';
				break;
		}
	}
 
	function hash($key, $n=0) {
		return call_user_func( $this->hashfunc, $n.$key );
	}
 
	function add($key) {
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				$this->map[$i][$n][$k{$n}]++;
			}
		}
		return true;
	}
 
	function remove($key) {
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				$this->map[$i][$n][$k{$n}]--;
			}
		}
		return true;
	}
 
	function exists($key, $bool=true) {
		$max = 0;
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				if ( !$v = $this->map[$i][$n][$k{$n}] )
					return false;
				else
					$max = max($v, $max);
			}
		}
		if ( $bool )
			return true;
		else
			return $max;
	}
 
	function initialize_bitmat($n) {
		$empty_bitmap_line = array(
			'0' => 0, '1' => 0, '2' => 0, '3' => 0, '4' => 0, '5' => 0, '6' => 0, '7' => 0,
			'8' => 0, '9' => 0, 'a' => 0, 'b' => 0, 'c' => 0, 'd' => 0, 'e' => 0, 'f' => 0  );
		$this->map=array();
		for ( $i=0; $i< $n; $i++ ) {
			$this->map[$i]=array();
			for ( $l=0; $l< $this->hashlen; $l++ ) {
				$this->map[$i][$l] = $empty_bitmap_line;
			}
		}
	}
 
}
 
?>

CodeWord: Apokalyptik

The random things that spew forth from my brain…