Closed Hashing

Linear Probing

  • Problem: primary clustering - collisions tend to cause clusters of occupied buckets.
  • The larger the cluster gets, the higher the probabilility that it will grow.

Impact of Load Factor on Cost

Dashed lines are linear probing, solid lines are "random" probing.

Load factor is on the x-axis, expected number of buckets accessed on the y-axis.

(From OpenDSA Data Structures and Algorithms book)

Solution #1: Quadratic Probing

  • Probe function:
  • Bucket to access: , where:
    • - original key
    • - result of hashing the key
    • - offset from the original hash location to check after collisions
  • For example, assuming :
    • After one collision:
    • After two collisions:
    • After three collisions:
    • After four collisions:
  • Problem:
    • Secondary Clustering: Keys that hash to the same bucket will follow the same probe sequence.

Solution #2: Double Hashing

  • Probe function:
  • Example with
    • After one collision for :
    • After two collisions for :
    • After three collisions for :
  • Different keys follow different probe sequences, so this is resistant to both primary and secondary clustering.