Your final assignment is to implement a trie in Python. This is a data structure that we have not discussed in class, so this assignment is meant to simulate the sorts of problems you might encounter as a developer: implementing a new data structure for a particular task.
The program context is a spell checker. The need is to store a spelling dictionary so that it is as fast as possible to see whether a word is in the dictionary. The data structure we will use to store the dictionary is called a trie, which is derived from the middle letters of the word retrieval. Tries were invented many years ago to speed information retrieval tasks.
A trie is a tree whose edges are labeled with letters. Paths from the root of the trie spell out words. The image below shows a trie.
The black nodes indicate that the path to this point in the trie spells a word. The words represented by this trie are a, an, and, ant, cab and cat. It is very fast to check whether a word is in a trie: one need only start from the root and trace out the word along the edges; if one ends up at a word-terminating node (a black node), then the word is in the trie, otherwise it is not. A trie is very good for spell checking: simply construct a trie representing all the words in a dictionary, and then check each word in a document to see if it is in the trie. The words not in the trie are (presumably) misspelled.
Tries can also efficiently support auto-completion. Given the prefix-based structure of the trie, it is straightforward to enumerate all of the words that begin with a given letter sequence: these will be represented by nodes that are descendants of the node corresponding to that prefix.
There are many ways to implement tries. We will use a simple implementation that uses a lot of space, but is fast. Each node of the trie has a Boolean field indicating whether a string ending there is a word (corresponding to whether the node is black or white in the diagram). Each node also contains a Python dictionary representing outgoing edges. The dictionary keys are individual letters, and the values are references to other trie nodes.
class Trie(__builtin__.object) | Trie class that supports spell checking and auto-completion. | | Methods defined here: | | __contains__(self, word) | Return True if the word is in the Trie, False otherwise. | | __init__(self) | Create a new empty Trie. | | __iter__(self) | Return an iterator that will allow iteration over all words in | the trie. The iteration order is arbitrary. | | __len__(self) | Return the number of words stored in this Trie | | insert(self, word) | Insert the indicated word into the Trie. | This method has no effect if the word is already in the Trie. | No Return value. | | containsPrefix(self, prefix) | Return true if the indicated string is a word in the Trie or is a | prefix of any word in the Trie. Return False otherwise. | | prefixIter(self, prefix) | Return an iterator that will allow iteration over all of the | words in the trie that have the indicated prefix. The | iteration order is arbitrary. |
You must create a node class to represent Trie nodes. Your code
must be submitted in a file named trie.py
.
For full credit, you must complete all of the methods listed above. You may also choose to complete only the methods in green for a maximum score of 90/100.
I will provide a spell-checking program on Piazza that uses code conforming to the specification above. I will use this program (as well as a set of unit tests) for testing your code. I suggest that you do the same.