A counting bloom filter

This is just me screwing around, really…. This implements a counting bloom filter in native php. I read about bloom filters this morning, and wrote this tonight in between playing Command and Conquer 3 campaign missions. The hashing function is configurable (and easily extensible,) rehashing the key multiple times to reduce the chance of collisions is also configurable (though I’m not entirely sure how needed this is.) Frankly this all might not even be (or work) properly… seems to… we’ll see…

< ?php

class bloom {

	/**
	 *
	 * A counting bloom filter
	 *
	 *	$f = new bloom(10, 'md5');
	 *
	 *	$f->add('foo');
	 *	$f->add('bar');
	 *	$f->add('foo');
	 *
	 *	$f->exists('foo');        // true
	 *	$f->exists('bar');        // true
	 *	$f->exists('baz');        // false
	 *	$f->exists('foo', false); // 2
	 *	$f->exists('bar', false); // 1
	 *	$f->exists('baz', false); // false
	 *
	 **/

	var $number_of_hashing_functions;
	var $map = array();
	var $hashlen = 32;
	var $hashfunc = 'md5';

	function __construct( $functions=1, $hashtype='md5' ) {
		$this->bloom($functions, $hashtype);
	}

	function bloom( $functions=1, $hashtype='md5' ) {
		$this->number_of_hashing_functions = (int)$functions;
		if ( !$this->number_of_hashing_functions || $this->number_of_hashing_functions < 1 )
			$this->number_of_hashing_functions = 1;
		$this->set_hashing_function($hashtype);
		$this->initialize_bitmat($this->number_of_hashing_functions);
	}

	function set_hashing_function($method) {
		switch ( $method ) {
			default:
			case 'md5':
				return false;
				break;
			case 'sha1':
				$this->hashlen = 40;
				$this->hashfunc = 'sha1';
				break;
		}
	}

	function hash($key, $n=0) {
		return call_user_func( $this->hashfunc, $n.$key );
	}

	function add($key) {
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				$this->map[$i][$n][$k{$n}]++;
			}
		}
		return true;
	}

	function remove($key) {
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				$this->map[$i][$n][$k{$n}]--;
			}
		}
		return true;
	}

	function exists($key, $bool=true) {
		$max = 0;
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				if ( !$v = $this->map[$i][$n][$k{$n}] )
					return false;
				else
					$max = max($v, $max);
			}
		}
		if ( $bool )
			return true;
		else
			return $max;
	}

	function initialize_bitmat($n) {
		$empty_bitmap_line = array(
			'0' => 0, '1' => 0, '2' => 0, '3' => 0, '4' => 0, '5' => 0, '6' => 0, '7' => 0,
			'8' => 0, '9' => 0, 'a' => 0, 'b' => 0, 'c' => 0, 'd' => 0, 'e' => 0, 'f' => 0  );
		$this->map=array();
		for ( $i=0; $i< $n; $i++ ) {
			$this->map[$i]=array();
			for ( $l=0; $l< $this->hashlen; $l++ ) {
				$this->map[$i][$l] = $empty_bitmap_line;
			}
		}
	}

}

?>

3 thoughts on “A counting bloom filter

  1. Nice! The Wikipedia entry has no link to a php implementation. This should be it.

    I'm going to see if I can use the erlang implementation to speed up user lookups in ejabberd.

Leave a Reply