pt.tumba.parser
Class RabinHashFunction

java.lang.Object
  extended by pt.tumba.parser.RabinHashFunction
All Implemented Interfaces:
java.io.Serializable

public final class RabinHashFunction
extends java.lang.Object
implements java.io.Serializable

We compute the checksum using Broder s implementation of Rabin s fingerprinting algorithm. Fingerprints offer provably strong probabilistic guarantees that two different strings will not have the same fingerprint. Other checksum algorithms, such as MD5 and SHA, do not offer such provable guarantees, and are also more expensive to compute than Rabin fingerprint. A disadvantage is that these faster functions are efficiently invertible (that is, one can easily build an URL that hashes to a particular location), a fact that might be used by malicious users to nefarious purposes. Using the Rabin's fingerprinting function, the probability of collision of two strings s1 and s2 can be bounded (in a adversarial model for s1 and s2) by max(|s1|,|s2|)/2**(l-1), where |s1| is the length of the string s1 in bits. The advantage of choosing Rabin fingerprints (which are based on random irreducible polynomials) rather than some arbitrary hash function is that their probability of collision os well understood. Furthermore Rabin fingerprints can be computed very efficiently in software and we can take advantage of their algebraic properties when we compute the fingerprints of "sliding windows". M. O. Rabin Fingerprinting by random polynomials. Center for Research in Computing Technology Harvard University Report TR-15-81 1981 A. Z. Broder Some applications of Rabin's fingerprinting method In R.Capicelli, A. De Santis and U. Vaccaro editors Sequences II:Methods in Communications, Security, and Computer Science pages 143-152 Springer-Verlag 1993

See Also:
Serialized Form

Field Summary
private  byte[] buffer
           
private static int P_DEGREE
           
private  long POLY
           
private static int READ_BUFFER_SIZE
           
private  long[] table32
           
private  long[] table40
           
private  long[] table48
           
private  long[] table54
           
private  long[] table62
           
private  long[] table70
           
private  long[] table78
           
private  long[] table84
           
private static int X_P_DEGREE
           
 
Constructor Summary
RabinHashFunction()
          Constructor for the RabinHashFunction64 object
 
Method Summary
 long hash(byte[] A)
          Return the Rabin hash value of an array of bytes.
private  long hash(byte[] A, int offset, int length, long ws)
          Description of the Method
 long hash(char[] A)
          Return the Rabin hash value of an array of chars.
 long hash(java.io.File f)
          Computes the Rabin hash value of the contents of a file.
 long hash(java.io.InputStream is)
          Computes the Rabin hash value of the data from an InputStream.
 long hash(int[] A)
          Returns the Rabin hash value of an array of integers.
 long hash(long[] A)
          Returns the Rabin hash value of an array of longs.
 long hash(java.lang.Object obj)
          Description of the Method
 long hash(java.io.Serializable obj)
          Returns the Rabin hash value of a serializable object.
 long hash(java.lang.String s)
          Computes the Rabin hash value of a String.
 long hash(java.net.URL url)
          Computes the Rabin hash value of the contents of a file, specified by URL.
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

P_DEGREE

private static final int P_DEGREE
See Also:
Constant Field Values

READ_BUFFER_SIZE

private static final int READ_BUFFER_SIZE
See Also:
Constant Field Values

X_P_DEGREE

private static final int X_P_DEGREE
See Also:
Constant Field Values

buffer

private final byte[] buffer

POLY

private long POLY

table32

private final long[] table32

table40

private final long[] table40

table48

private final long[] table48

table54

private final long[] table54

table62

private final long[] table62

table70

private final long[] table70

table78

private final long[] table78

table84

private final long[] table84
Constructor Detail

RabinHashFunction

public RabinHashFunction()
Constructor for the RabinHashFunction64 object

Parameters:
P - Description of the Parameter
Method Detail

main

public static void main(java.lang.String[] args)

hash

public long hash(byte[] A)
Return the Rabin hash value of an array of bytes.

Parameters:
A - the array of bytes
Returns:
the hash value

hash

private long hash(byte[] A,
                  int offset,
                  int length,
                  long ws)
Description of the Method

Parameters:
A - Description of the Parameter
offset - Description of the Parameter
length - Description of the Parameter
w - Description of the Parameter
Returns:
Description of the Return Value

hash

public long hash(char[] A)
Return the Rabin hash value of an array of chars.

Parameters:
A - the array of chars
Returns:
the hash value

hash

public long hash(java.io.File f)
          throws java.io.FileNotFoundException,
                 java.io.IOException
Computes the Rabin hash value of the contents of a file.

Parameters:
f - the file to be hashed
Returns:
the hash value of the file
Throws:
java.io.FileNotFoundException - if the file cannot be found
java.io.IOException - if an error occurs while reading the file

hash

public long hash(java.io.InputStream is)
          throws java.io.IOException
Computes the Rabin hash value of the data from an InputStream.

Parameters:
is - the InputStream to hash
Returns:
the hash value of the data from the InputStream
Throws:
java.io.IOException - if an error occurs while reading from the InputStream

hash

public long hash(int[] A)
Returns the Rabin hash value of an array of integers. This method is the most efficient of all the hash methods, so it should be used when possible.

Parameters:
A - array of integers
Returns:
the hash value

hash

public long hash(long[] A)
Returns the Rabin hash value of an array of longs. This method is the most efficient of all the hash methods, so it should be used when possible.

Parameters:
A - array of integers
Returns:
the hash value

hash

public long hash(java.lang.Object obj)
          throws java.io.IOException
Description of the Method

Parameters:
obj - Description of the Parameter
Returns:
Description of the Return Value
Throws:
java.io.IOException - Description of the Exception

hash

public long hash(java.io.Serializable obj)
          throws java.io.IOException
Returns the Rabin hash value of a serializable object.

Parameters:
obj - the object to be hashed
Returns:
the hash value
Throws:
java.io.IOException - if serialization fails

hash

public long hash(java.lang.String s)
Computes the Rabin hash value of a String.

Parameters:
s - the string to be hashed
Returns:
the hash value

hash

public long hash(java.net.URL url)
          throws java.io.IOException
Computes the Rabin hash value of the contents of a file, specified by URL.

Parameters:
url - the URL of the file to be hashed
Returns:
the hash value of the file
Throws:
java.io.IOException - if an error occurs while reading from the URL