A counting bloom filter

This is just me screwing around, really…. This implements a counting bloom filter in native php. I read about bloom filters this morning, and wrote this tonight in between playing Command and Conquer 3 campaign missions. The hashing function is configurable (and easily extensible,) rehashing the key multiple times to reduce the chance of collisions is also configurable (though I’m not entirely sure how needed this is.) Frankly this all might not even be (or work) properly… seems to… we’ll see…

< ?php

class bloom {

	/**
	 *
	 * A counting bloom filter
	 *
	 *	$f = new bloom(10, 'md5');
	 *
	 *	$f->add('foo');
	 *	$f->add('bar');
	 *	$f->add('foo');
	 *
	 *	$f->exists('foo');        // true
	 *	$f->exists('bar');        // true
	 *	$f->exists('baz');        // false
	 *	$f->exists('foo', false); // 2
	 *	$f->exists('bar', false); // 1
	 *	$f->exists('baz', false); // false
	 *
	 **/

	var $number_of_hashing_functions;
	var $map = array();
	var $hashlen = 32;
	var $hashfunc = 'md5';

	function __construct( $functions=1, $hashtype='md5' ) {
		$this->bloom($functions, $hashtype);
	}

	function bloom( $functions=1, $hashtype='md5' ) {
		$this->number_of_hashing_functions = (int)$functions;
		if ( !$this->number_of_hashing_functions || $this->number_of_hashing_functions < 1 )
			$this->number_of_hashing_functions = 1;
		$this->set_hashing_function($hashtype);
		$this->initialize_bitmat($this->number_of_hashing_functions);
	}

	function set_hashing_function($method) {
		switch ( $method ) {
			default:
			case 'md5':
				return false;
				break;
			case 'sha1':
				$this->hashlen = 40;
				$this->hashfunc = 'sha1';
				break;
		}
	}

	function hash($key, $n=0) {
		return call_user_func( $this->hashfunc, $n.$key );
	}

	function add($key) {
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				$this->map[$i][$n][$k{$n}]++;
			}
		}
		return true;
	}

	function remove($key) {
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				$this->map[$i][$n][$k{$n}]--;
			}
		}
		return true;
	}

	function exists($key, $bool=true) {
		$max = 0;
		for ( $i=0; $i< $this->number_of_hashing_functions; $i++ ) {
			$k = $this->hash($key, $i);
			for ( $n=0; $n< $this->hashlen; $n++ ) {
				if ( !$v = $this->map[$i][$n][$k{$n}] )
					return false;
				else
					$max = max($v, $max);
			}
		}
		if ( $bool )
			return true;
		else
			return $max;
	}

	function initialize_bitmat($n) {
		$empty_bitmap_line = array(
			'0' => 0, '1' => 0, '2' => 0, '3' => 0, '4' => 0, '5' => 0, '6' => 0, '7' => 0,
			'8' => 0, '9' => 0, 'a' => 0, 'b' => 0, 'c' => 0, 'd' => 0, 'e' => 0, 'f' => 0  );
		$this->map=array();
		for ( $i=0; $i< $n; $i++ ) {
			$this->map[$i]=array();
			for ( $l=0; $l< $this->hashlen; $l++ ) {
				$this->map[$i][$l] = $empty_bitmap_line;
			}
		}
	}

}

?>

Comments (2)

  1. Andy Skelton wrote::

    Nice! The Wikipedia entry has no link to a php implementation. This should be it.

    I'm going to see if I can use the erlang implementation to speed up user lookups in ejabberd.

    Saturday, December 27, 2008 at 10:50 PM #
  2. shadi shagareen wrote::

    Dear sir

    can i get this code in c languge please

    Monday, July 13, 2009 at 5:42 PM #

Trackback/Pingback (1)

  1. A counting bloom filter | PHP-Blog.com on Sunday, December 28, 2008 at 8:24 AM

    [...] more here: A counting bloom filter Related ArticlesBookmarksTags Release history 1.0 1.0.0 1995-06-08 Officially called [...]