CS 240: Data Structures and Algorithms
James Madison University, Spring 2013

Programming Assignment #5: Trie

Introduction

Your final assignment is to implement a trie in Python. This is a data structure that we have not discussed in class, so this assignment is meant to simulate the sorts of problems you might encounter as a developer: implementing a new data structure for a particular task.

The program context is a spell checker. The need is to store a spelling dictionary so that it is as fast as possible to see whether a word is in the dictionary. The data structure we will use to store the dictionary is called a trie, which is derived from the middle letters of the word retrieval. Tries were invented many years ago to speed information retrieval tasks.

A trie is a tree whose edges are labeled with letters. Paths from the root of the trie spell out words. The image below shows a trie.

The black nodes indicate that the path to this point in the trie spells a word. The words represented by this trie are a, an, and, ant, cab and cat. It is very fast to check whether a word is in a trie: one need only start from the root and trace out the word along the edges; if one ends up at a word-terminating node (a black node), then the word is in the trie, otherwise it is not. A trie is very good for spell checking: simply construct a trie representing all the words in a dictionary, and then check each word in a document to see if it is in the trie. The words not in the trie are (presumably) misspelled.

Tries can also efficiently support auto-completion. Given the prefix-based structure of the trie, it is straightforward to enumerate all of the words that begin with a given letter sequence: these will be represented by nodes that are descendants of the node corresponding to that prefix.

There are many ways to implement tries. We will use a simple implementation that uses a lot of space, but is fast. Each node of the trie has a Boolean field indicating whether a string ending there is a word (corresponding to whether the node is black or white in the diagram). Each node also contains a Python dictionary representing outgoing edges. The dictionary keys are individual letters, and the values are references to other trie nodes.

Requirements

You must create a Trie class that conforms to the following docstrings.
class Trie(__builtin__.object)
 |  Trie class that supports spell checking and auto-completion.
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, word)
 |      Return True if the word is in the Trie, False otherwise.
 |  
 |  __init__(self)
 |      Create a new empty Trie.
 |  
 |  __iter__(self)
 |      Return an iterator that will allow iteration over all words in
 |      the trie.  The iteration order is arbitrary.
 |  
 |  __len__(self)
 |      Return the number of words stored in this Trie
 |  
 |  insert(self, word)
 |      Insert the indicated word into the Trie. 
 |      This method has no effect if the word is already in the Trie.
 |      No Return value.
 |  
 |  containsPrefix(self, prefix)
 |      Return true if the indicated string is a word in the Trie or is a 
 |      prefix of any word in the Trie. Return False otherwise.
 |  
 |  prefixIter(self, prefix)
 |      Return an iterator that will allow iteration over all of the
 |      words in the trie that have the indicated prefix.  The
 |      iteration order is arbitrary.
 |  

You must create a node class to represent Trie nodes. Your code must be submitted in a file named trie.py.

For full credit, you must complete all of the methods listed above. You may also choose to complete only the methods in green for a maximum score of 90/100.

I will provide a spell-checking program on Piazza that uses code conforming to the specification above. I will use this program (as well as a set of unit tests) for testing your code. I suggest that you do the same.

Acknowledgments

This project is based on a Go programming assignment designed by Chris Fox. Much of the wording above is borrowed from that project. Any typos or logic errors were probably introduced by me.

Handing In

You should submit a single .py file through Blackboard before the deadline. As always, your code should conform to the CS240 Python Coding Conventions.