CS 240: Data Structures and Algorithms
James Madison University, Spring 2013

Programming Assignment #2: Array Set

Introduction

The objective of this assignment is to implement a contiguous implementation of a Set ADT that includes most of the functionality of Python's built-in set type. The functionality of our Set class will be almost the same as the functionality of Python's set class, with some exceptions described below.

It is important that you understand the material in Section 2.2 of our textbook as you work on this project. That section shows how to implement a variables sized collection type using fixed-sized arrays. The specific example described in the book is the Python list class, but many of the issues will be the same for a set.

Your set class will store elements in an unordered array. This is not a very efficient approach, because determining set membership requires a sequential search through the array. As a result, some set operations may take quadratic time. For example: set1.isdisjoint(set2) will require O(n*m) steps in the worst case: we need to search through all m elements in set2 for each of the n items in set1. We will consider more efficient implementations of the Set ADT later in the semester.

The Array Class and Memory Management

You must use the provided Array class defined in the module t_array.py as the underlying data structure for your Set class. This is similar to the Array class described in our textbook with a few key differences. First, this Array class has visualization code that will allow us to inspect the contents of all of the arrays that are in existence at any one time. Second, objects of this array class need to be explicitly released when they are no longer needed.

In Python (and Java) there is a garbage collection process that automatically frees up memory that is no longer being referenced. In some programming languages, like C or C++, it is the responsibility of the programmer to explicitly release any memory that is no longer being used. The Array class defined in t_array.py brings with it that same responsibility: the garbage collector will not free Array objects automatically.

Part of your grade for this project will be based on writing code that has no "memory leaks". A memory leak occurs when some memory is no longer referenced, but has not been explicitly released.

Here is the pydoc documentation for the Array class:


class Array(__builtin__.object)
 |  Array class that supports turtle graphics visualization.  
 |  
 |  There are two public, class-level variables that can be used
 |  to control the visualization: 
 |  
 |  Array.visible  - Set this to False to turn off graphics. 
 |  Array.speed -    Set this value to change the speed of the animation. 
 |                   This value indicates the length of the pauses (in seconds)
 |                   between updates.
 |  
 |  Methods defined here:
 |  
 |  __getitem__(self, index)
 |      x.__getitem__(y) <==> x[y] 
 |      Returns the contents of the indicated array entry.
 |  
 |  __init__(self, size)
 |      Creates an array with a given capacity.
 |  
 |  __iter__(self)
 |      Arrays do not support iteration.
 |  
 |  __len__(self)
 |      x.__len__() <==> len(x)
 |      Returns the size of the array.
 |  
 |  __setitem__(self, index, value)
 |      x.__setitem__(i, y) <==> x[i]=y
 |      Stores the value at the indicated index.
 |  
 |  free(self)
 |      Frees this array.  
 |      
 |      The array will no longer appear in the visualization, and may
 |      be garbage collected as long as no external references exist.
 |  
 |  ----------------------------------------------------------------------

Testing

Part of your grade for this assignment will be based on developing unit tests for your code. You may use the file set_tests.py as a starting point. That file includes tests for all Set methods except discard, remove and pop. 15% of your grade for this assignment will be based on writing adequate tests for those three methods.

The Set ADT

The description below is copied almost verbatim from the official Python set documentation. I have removed some details that are specific to the Python implementation of the set class. In addition, a few methods and operators have been left out for the sake of simplicity: difference_update, -= etc. You are welcome to complete the remaining methods if you wish, but they will not affect your grade.

A Set object is an unordered collection of distinct objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.

class Set([iterable])

Return a new Set object whose elements are taken from iterable. If iterable is not specified, a new empty set is returned.

Instances of Set provide the following operations:

add(elem)

Add element elem to the set.

remove(elem)

Remove element elem from the set. Raises KeyError if elem is not contained in the set.

discard(elem)

Remove element elem from the set if it is present.

pop()

Remove and return an arbitrary element from the set. Raises KeyError if the set is empty.

clear()

Remove all elements from the set.

len(s)

Return the cardinality of set s.

x in s

Test x for membership in s.

x not in s

Test x for non-membership in s.

isdisjoint(other)

Return True if the set has no elements in common with other. Sets are disjoint if and only if their intersection is the empty set.

issubset(other)
set <= other

Test whether every element in the set is in other.

set < other

Test whether the set is a proper subset of other, that is, set <= other and set != other.

issuperset(other)
set >= other

Test whether every element in other is in the set.

set > other

Test whether the set is a proper superset of other, that is, set >= other and set != other.

union(other)
set | other | ...

Return a new set with elements from the set and all others.

intersection(other)
set & other & ...

Return a new set with elements common to the set and all others.

difference(other)
set - other - ...

Return a new set with elements in the set that are not in the others.

symmetric_difference(other)
set ^ other

Return a new set with elements in either the set or other but not both.

copy()

Return a new set with a shallow copy of s.

Set supports set to set comparisons. Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other). A set is less than another set if and only if the first set is a proper subset of the second set (is a subset, but is not equal). A set is greater than another set if and only if the first set is a proper superset of the second set (is a superset, but is not equal).

The subset and equality comparisons do not generalize to a complete ordering function. For example, any two disjoint sets are not equal and are not subsets of each other, so all of the following return False: a<b, a==b, or a>b. Accordingly, sets do not implement the __cmp__ method.

The Python documentation fails to mention a few additional methods that you will need to complete:
__len__()

Return the cardinality of the set.

__str__()
__repr__()

Return a string representation of the set. The format should be as follows:

     'set([ELEMENT_1, ELEMENT_2,..., ELEMENT_N])'
    
Where the string representations of each element should be obtained using the repr method.

__iter__()

Return an iterator for the set.

Note that both the operator and non-operator versions of union(), intersection(), difference(), and symmetric_difference(), issubset() and issuperset() require their arguments to be Set objects. A TypeError should be raised if any of those methods receive an argument of another type.

Advice on Memory Management

Your Set class will be responsible for freeing Arrays that are no longer referenced. This means that there must be some way for Set objects to know that they are no longer referenced. When a Set object is released by the garbage collector it will need to explicitly release any Arrays that it is still holding onto. This can be accomplished by overriding the built in Python operator __del__. That method will be called automatically when there are no more references to the set.

If you've done everything correctly, the following python script should terminate with no arrays remaining on the screen.

from array_set import Set

items = Set()
for i in range(8):
    items.add(i)

# Overwrite the reference to our set so that it can be garbage
# collected. The __del__ method of items will be called. 
items = "Goodbye!"

Note that, under normal circumstances, YOU WOULD NEVER NEED TO OVERRIDE __del__. You are only doing this because this project exists in a Bizarro world where Python garbage collection does not exist for some classes.

Handing In

You should submit two .py files through blackboard before the deadline: a module named array_set.py containing your completed Set class, and an updated version of the unit testing module. As always, your code should conform to the CS240 Python Coding Conventions. You do not need to provide docstrings for the methods in the unit testing class.

Acknowledgments

The description of the Set class is largely taken from the official Python documentation. © 2001-2010 Python Software Foundation; All Rights Reserved. Here is a copy of the PSF License: psf_license.txt. The documentation has been modified for clarity and consistency with this project.