Hashing

Searching for a key. The story so far...

  • Sequential search:
  • Binary search:
  • Search in a self-balancing BST:
  • Can we do better???

Hashing

  • If the keys are unique array index values, then search is trivially .
  • Hash table is a data structure that uses a hash function to map from keys to array index values.

Hash Functions

  • Hash function should:

    1. be fast to compute
    2. distribute keys as evenly as possible, even if the original keys are not well distributed.
    3. Result in a number from between and where is the size of the array
  • Helpful to consider 1-2 separately from 3

    • 1-2 hash code
    • 3 compression function
      • code % table_size
      • Selecting the table size to be a prime number is a common trick...
  • Let's look at some hash code functions from Java...

Java Integer hashCode Method


    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    public static int hashCode(int value) {
        return value;
    }

https://github.com/openjdk/jdk17/blob/master/src/java.base/share/classes/java/lang/Integer.java

Java String hashCode Method

Javadoc:

"Returns a hash code for this string. The hash code for a String object is computed as using int arithmetic, where s[i] is the th character of the string, and n is the length of the string. (The hash value of the empty string is zero.)"

    public int hashCode() {
        int h = hash;
        if (h == 0 && !hashIsZero) {
            h = isLatin1() ? StringLatin1.hashCode(value)
                           : StringUTF16.hashCode(value);
            if (h == 0) {
                hashIsZero = true;
            } else {
                hash = h;
            }
        }
        return h;
    }

https://github.com/openjdk/jdk17/blob/master/src/java.base/share/classes/java/lang/String.java

    public static int hashCode(byte[] value) {
        int h = 0;
        int length = value.length >> 1;
        for (int i = 0; i < length; i++) {
            h = 31 * h + getChar(value, i);
        }
        return h;
    }

https://github.com/openjdk/jdk17/blob/master/src/java.base/share/classes/java/lang/StringUTF16.java

Socrative Question

Does the String hashCode method guarantee that no two strings will hash to the same value? If not, is it possible to design a hashCode method that would have this property?

A) Yes
B) No, Yes
C) No, No

Handling Collisions: Open Hashing/Separate Chaining

  • Worst case insertion time?

  • Load factor:

    • - number of elements stored
    • - number of buckets in table
  • Question: what is the average number of comparisons required to find a key in a table with the load factor ?

    • If the key is absent?
      • Approximately
    • If the key is present?
      • Approximately
  • As long as is a constant, all operations are constant time (average case).

  • How can be a constant if we don't know in advance how many elements we will store?

    • Rehashing!

Hashing In Java

  • Every Java class inherits the hashCode method from Java.
  • This means that any java object can be used as a key. (Unless it intentionally overrides and breaks hashCode.)

Socrative Quiz

What is the relationship between the equals method and the hashCode method in Java?

A) Any time the equals method is overriden, hashCode should be as well. Hashing will break if two non-equal objects can have the same hash code.

B) Any time the equals method is overriden, hashCode should be as well. Hashing will break if two equal objects can have different hash codes.

C) Any time the equals method is overriden, hashCode should be as well. Hashing will be slow if two non-equal objects can have the same hash code.

D) Any time the equals method is overriden, hashCode should be as well. Hashing will be slow if two equal objects can have different hash codes.

D) There is no real relationship between equals and hashCode. It is fine to override one without considering the other.