Hashing Lab
Objectives
The goal of this lab is to gain experience with the implementation of basic hashing structures.
Introduction
Download the lab starter files here:
Don't forget to download and customize your machine-specific makefile.
In this lab, you will implement a very small subset of the PA2/PA3 Set functionality using a hash table as the backing store instead of a dynamic array or skip list.
IMPORTANT: Until now, we have talked about hashing in the context of the Map ADT, but recall that a Set is basically equivalent to a Map in which the values don't matter. Thus, for the purposes of this assignment, "values" in a Set are equivalent to the "keys" in a Map from our previous hashing discussions.
Your hash table should use linear probing to resolve resolutions, and it should auto-expand whenever the load factor exceeds 0.5.
Hash Sets and Hash Tables
In hash_set.h
, you will find new data types for hash-based sets.
typedef struct bucket {
bool is_empty;
data_t value;
struct bucket *next; // for chaining only
} bucket_t;
typedef struct hash_set {
bucket_t *table;
size_t capacity;
size_t size;
} hash_set_t;
For the purposes of this lab, a "hash table" is just an array of bucket_t
structs, which may or may not contain actual data values. If the is_empty
field is false
for a particular bucket, then that bucket stores a data element in the value
field.
Unless you attempt the "challenge" portion of this lab, you shouldn't need to use the next
pointers in the bucket_t
struct.
We have also provided two helper functions for you to use:
size_t hash(data_t value)
Calculate a hash code for the given value.
bucket_t* malloc_buckets(size_t capacity)
Allocate space for a new hash table (array of buckets) on the heap.
Required Functions
For this lab, you will only need to implement two functions of the Set interface. Both functions should run in O(1) time.
hash_set_add(hash_set_t *set, data_t value)
Adds an element to a set. Don't forget to rehash if the maximum load factor is exceeded.
hash_set_contains(hash_set_t *set, data_t value)
Returns true if the element is a member of the set; false otherwise.
Recommended Functions
You will probably find the required functions much easier to write if you first implement a couple of helper functions (prototypes provided) that work on hash tables (bucket_t
arrays) instead of the full set struct. Obviously, these helper functions should also run in O(1) time.
bucket_t* probe(bucket_t table[], size_t capacity, data_t value)
Probe for the given value in a hash table. This function should return the bucket where the value belongs. Either the value will already be there, or the bucket will be empty.
void insert(bucket_t table[], size_t capacity, data_t value)
Insert a new value into a hash table. This function itself will be easier to implement if you write
probe
first and then use it.
Evaluation
We have provided a basic testsuite that inserts a few numbers and performs membership testing. Your solution should pass this test. Sample output:
Running suite(s): Default
[ _, _, _, _ ]
[ _, _, 1, _ ]
[ 3, _, 1, _ ]
[ 3, _, 5, _, _, _, 1, _ ]
[ 3, _, 5, _, 7, _, 1, _ ]
[ 3, _, _, _, 7, _, 1, _, _, _, 5, _, _, _, 9, _ ]
contains 1? yes
contains 2? no
contains 4? no
contains 5? yes
contains 8? no
contains 9? yes
100%: Checks: 1, Failures: 0, Errors: 0
SUCCESS: All current tests passed!
The underscores in the hash table printouts represent empty buckets.
For performance evaluation, we have included a modified version of the comparison tests from PA3 (main.c
). If you put a working implementation of PA2 (array_set.c
) and PA3 (skip_set.c
) in the same folder, you should be able to build and run main
.
NOTE: You may receive compiler warnings about the redeclaration of the data_t
typedef. These are safe to ignore as long as all of the declarations match.
If your hash set is working correctly, it should be an order of magnitude faster than even the skip set implementation from PA3. Here is some example output:
ARRAY ADD 10000 0.009480
SKIP ADD 10000 0.002700
HASH ADD 10000 0.000634
ARRAY CONTAINS 10000 0.015811
SKIP CONTAINS 10000 0.001897
HASH CONTAINS 10000 0.000243
Challenge: Chaining (Optional)
After you have the probing implementation working, save a copy of the file and then change your implementation to use chaining (open addressing) rather than linear probing.
For this part, you may wish to raise the maximum load factor to 0.9 or 1.0.