Socrative Quiz

When a class overrides equals it should also override hashCode. Consider the following hashCode implementation:

  @Override
  public int hashCode() {
    return 1;
  }

Assuming the HashTable implementation from last week’s lab, how will this hashCode function perform in practice?

  1. There method is fine. Hashing will work correctly and performance will be OK.
  2. This method will break hashing. Lookups will not work correctly.
  3. Lookups will work correctly with this method, but lookup performance will be \(\Theta(n)\).
  4. Lookups will work correctly with this method, but lookup performance will be \(\Theta(n^2)\).

Impact of Load Factor on Cost

Dashed lines are linear probing, solid lines are “random” probing.

Dashed lines are linear probing, solid lines are “random” probing.

Load factor is on the x-axis, expected number of buckets accessed on the y-axis.

(From OpenDSA book)

Socrative

Which of the following best describes the corresponding figure when chaining is used for collision resolution? (load factor on the x-axis, expected number of key comparisons on the y-axis).

  1. The plot would have more or less the same shape, with somewhat fewer operations than linear probing.
  2. The plot would approach infinity at .5 instead of 1.
  3. The plot would be a straight line, with a positive slope.
  4. The plot would be a horizontal line at 1.0.

Problems/Challenges with Hashing

What Gets used In The Wild?

C#/.net Dictionary

source

blog post

Python 2

Implementation in: Python-2.7.2/Objects/dictobject.c. Source can be downloaded from www.python.org.

Python 3.6+

Update in Python 3:

the dictionary:

    d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

    entries = [['--', '--', '--'],
               [-8522787127447073495, 'barry', 'green'],
               ['--', '--', '--'],
               ['--', '--', '--'],
               ['--', '--', '--'],
               [-9092791511155847987, 'timmy', 'red'],
               ['--', '--', '--'],
               [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

    indices =  [None, 1, None, None, None, 0, None, 2]
    entries =  [[-9092791511155847987, 'timmy', 'red'],
                [-8522787127447073495, 'barry', 'green'],
                [-6480567542315338377, 'guido', 'blue']]

Only the data layout needs to change.  The hash table
algorithms would stay the same.

descripion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html

source: https://github.com/python/cpython/blob/main/Objects/dictobject.c

Ruby

Original source: https://github.com/ruby/ruby/blob/ruby_2_3/st.c Current source: https://github.com/ruby/ruby/blob/ruby_3_0/st.c

Java 7 (We’ve seen…)

static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }


    /**
     * Returns index for hash code h.
     */

    static int indexFor(int h, int length) {
        return h & (length-1);
    }

source (GPL V2.0)

Java 8+

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Java 16 Source: https://github.com/openjdk/jdk/blob/jdk-16+36/src/java.base/share/classes/java/util/HashMap.java