CS 240: Data Structures and Algorithms
James Madison University, Fall 2012

Programming Assignment #5: HashMap

Introduction

The goal for this assignment is to complete a partially implemented Python HashMap class, and to run some experiments to select an appropriate maximum load value. Your class will use the division method for hashing along with prime-number-sized hash tables. Collision resolution will be handled using separate chaining.

Map ADT

Here is the PyDoc documentation for the HashMap class that you will implement. The file hashmap.py contains a partial implementation. Methods in green are already implemented. It will be your job to complete the methods in red.


class HashMap(__builtin__.object)
 |  The HashMap associates stored values with unique keys.  This
 |  class provides a subset of the functionality of Python's built in
 |  dictionary class.  Key objects must implement the __hash__ function. 
 |  
 |  Hashing is based on the division method with prime-number-sized tables.
 |  Collisions are handled using separate chaining.
 |  
 |  MAX_LOAD is a public class attribute that determines the maximum load
 |  factor for the hash table.  The Default value is 1.0
 |  
 |  Methods defined here:
 |  
 |   __contains__(self, key)
 |      H.__contains__(k) -> True if H has a key k, else False
 |  
 |  __delitem__(self, key)
 |      x.__delitem__(y) <==> del x[y]
 |  
 |  __eq__(self, rhs)
 |      x.__eq__(y) <==> x==y
 |  
 |  __getitem__(self, key)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __init__(self)
 |      HashMap() -> new empty HashMap
 |  
 |  __iter__(self)
 |      x.__iter__() <==> iter(x)
 |  
 |  __len__(self)
 |      x.__len__() <==> len(x)
 |  
 |  __ne__(self, rhs)
 |      x.__ne__(y) <==> x!=y
 |  
 |  __repr__(self)
 |      x.__repr__() <==> repr(x)
 |  
 |  __setitem__(self, key, value)
 |      x.__setitem__(i, y) <==> x[i]=y
 |  
 |  clear(self)
 |      H.clear() -> None.  Remove all items from H.
 |  
 |  copy(self)
 |      H.copy() -> a shallow copy of H
 |  
 |  get(self, k, d=None)
 |      H.get(k[,d]) -> H[k] if k in H, else d.  d defaults to None.
 |  
 |  items(self)
 |      H.items() -> list of H's (key, value) pairs, as 2-tuples
 |  
 |  iteritems(self)
 |      H.iteritems() -> an iterator over the (key, value) items of H
 |  
 |  itervalues(self)
 |      H.itervalues() -> an iterator over the values of H
 |  
 |  keys(self)
 |      H.keys() -> list of H's keys
 |  
 |  pop(self, key)
 |      H.pop(k[,d]) -> v, remove specified key and return the corresponding value.
 |      If key is not found, d is returned if given, otherwise KeyError is raised
 |  
 |  popitem(self)
 |      H.popitem() -> (k, v), remove and return some (key, value) pair as a
 |      2-tuple; but raise KeyError if H is empty.
 |  
 |  values(self)
 |      H.values() -> list of H's values
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  MAX_LOAD = 1.0

Notice that the documentation above is minimal. For the sake of consistency, these are almost exactly the same as the pydoc comments for Python's built-in dictionary class. With some minor exceptions (noted below) the behavior of the HashMap methods above should be exactly the same as the corresponding Python dictionary methods. If you need clarification for any of these methods I suggest that you refer to the official Python dictionary documentation. If anything is still unclear, I suggest opening up a Python interpreter and experimenting with the Python dictionary methods directly.

There will be some minor differences in the functionality of the HashMap class in comparison to Python dictionaries:

Implementation Notes

Typically, hash tables use arrays as the underlying storage mechanism. Since Python has no array type, we need an alternative. We could use the turtle array class from PA#2, but the resulting implementation would be slow: there is a lot of overhead involved in the memory management and drawing. Furthermore, the visualization functionality for that Array class wouldn't be very helpful here because the actual keys and values are not stored directly in the Array.

Instead of arrays, this implementation will use Python lists for storage. This means that you need to be careful not to use list methods outside of those that would be provided by an array. For example, if you find yourself using the list iterator, or the contains method, you are probably doing something wrong. The danger here is that many list methods require O(n) time, while nearly all HashMap operations should be O(1).

Testing

You do not need to develop unit tests for this assignment. As always, careful testing is likely to make your life easier in the long run.

Selecting a Load Factor

The following script can be used to benchmark the performance of your HashMap class for increasing values of MAX_LOAD: max_load_experiment.py . Prepare a short (1 page) document that includes a graph of your timing results along with answers to the following questions:

Handing In

You should submit two files through blackboard before the deadline: your completed hashmap.py file, and a .pdf file containing your experimental results. As always, your code should conform to the CS240 Python Coding Conventions.