PA2: Array Set


Introduction

The objective of this assignment is to implement a contiguous implementation of a Set ADT that includes most of the functionality of Python's built-in set type. The functionality of our Set class will be almost the same as the functionality of Python's set class, with some exceptions described below.

It is important that you understand the material in Section 5.3 of our textbook as you work on this project. That section shows how to implement a variable-sized collection type using fixed-sized arrays. The specific example described in the book is the Python list class, but many of the issues will be the same for a set.

Your set class will store elements in an unordered dynamic array. This is not a very efficient approach, because determining set membership requires a sequential search through the array. As a result, some set operations may take quadratic time. For example: set1.isdisjoint(set2) will require O(n*m) steps in the worst case: we need to search through all m elements in set2 for each of the n items in set1. We will consider more efficient implementations of the Set ADT later in the semester.

Background: Array Class

You must use the provided Array class defined in the module t_array.py as the underlying data structure for your Set class. This is similar to the Array class described in our textbook. However, this Array class has visualization code that will allow us to inspect the contents of all of the arrays that are in existence at any one time. You should use this class as the basis of a dynamic array implementation for your Set internal storage.

Here is the pydoc documentation for the Array class:

    class Array(__builtin__.object)
     |  Array class that supports turtle graphics visualization.  
     |  
     |  There are two public, class-level variables that can be used
     |  to control the visualization: 
     |  
     |  Array.visible  - Set this to False to turn off graphics. 
     |  Array.speed -    Set this value to change the speed of the animation. 
     |                   This value indicates the length of the pauses (in seconds)
     |                   between updates.
     |  
     |  Methods defined here:
     |  
     |  __getitem__(self, index)
     |      x.__getitem__(y) <==> x[y] 
     |      Returns the contents of the indicated array entry.
     |  
     |  __init__(self, size)
     |      Creates an array with a given capacity.
     |  
     |  __iter__(self)
     |      Arrays do not support iteration.
     |  
     |  __len__(self)
     |      x.__len__() <==> len(x)
     |      Returns the size of the array.
     |  
     |  __setitem__(self, index, value)
     |      x.__setitem__(i, y) <==> x[i]=y
     |      Stores the value at the indicated index.
     |  
     |  free(self)
     |      Frees this array.  
     |      
     |      The array will no longer appear in the visualization, and may
     |      be garbage collected as long as no external references exist.
     |  
     |  ----------------------------------------------------------------------

You will probably want to turn off the graphics for some tests, as they can slow down program execution considerably.

You should not modify t_array.py.

The Set ADT

You should implement a Python class called Set in a file called array_set.py. We are not providing boilerplate code for your Set class; you will need to write the array_set.py file from scratch.

The description below is copied almost verbatim from the official Python set documentation. We have removed some details that are specific to the Python implementation of the set class. In addition, a few methods and operators have been left out for the sake of simplicity: difference_update, -= etc. You are welcome to complete the remaining methods if you wish, but they will not affect your grade.

A Set object is an unordered collection of distinct objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.

class Set([iterable])

Return a new Set object whose elements are taken from optional parameter iterable. If iterable is not specified, a new empty set is returned.

Instances of Set provide the following operations:

add(elem)

Add element elem to the set.

remove(elem)

Remove element elem from the set. Raises KeyError if elem is not contained in the set.

discard(elem)

Remove element elem from the set if it is present.

pop()

Remove and return an arbitrary element from the set. Raises KeyError if the set is empty.

clear()

Remove all elements from the set.

x in s

Test x for membership in s.

isdisjoint(other)

Return True if the set has no elements in common with other. Sets are disjoint if and only if their intersection is the empty set. Raises a TypeError if other is not a Set instance.

issubset(other)
set <= other

Test whether every element in the set is in other. Raises a TypeError if other is not a Set instance.

set < other

Test whether the set is a proper subset of other, that is, set <= other and set != other. Raises a TypeError if other is not a Set instance.

issuperset(other)
set >= other

Test whether every element in other is in the set. Raises a TypeError if other is not a Set instance.

set > other

Test whether the set is a proper superset of other, that is, set >= other and set != other. Raises a TypeError if other is not a Set instance.

union(other)
set | other | ...

Return a new set with elements from the set and all others. Raises a TypeError if other is not a Set instance.

intersection(other)
set & other & ...

Return a new set with elements common to the set and all others. Raises a TypeError if other is not a Set instance.

difference(other)
set - other - ...

Return a new set with elements in the set that are not in the others. Raises a TypeError if other is not a Set instance.

symmetric_difference(other)
set ^ other

Return a new set with elements in either the set or other but not both. Raises a TypeError if other is not a Set instance.

copy()

Return a new set with a shallow copy of s.

Set supports set to set comparisons. Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other). A set is less than another set if and only if the first set is a proper subset of the second set (is a subset, but is not equal). A set is greater than another set if and only if the first set is a proper superset of the second set (is a superset, but is not equal).

The subset and equality comparisons do not generalize to a complete ordering function. For example, any two disjoint sets are not equal and are not subsets of each other, so all of the following return False: a<b, a==b, or a>b. Accordingly, sets do not implement the __cmp__ method.

The Python documentation fails to mention a few additional methods that you will need to complete:
__len__()

Return the cardinality of the set.

__eq__(other)

Returns true if the set is equal to another set. Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other). If other is not a Set instance, this should return False.

__str__()
__repr__()

Return a string representation of the set. The format should be as follows:
'set([ELEMENT_1, ELEMENT_2,..., ELEMENT_N])'
Where the string representations of each element should be obtained using the repr() method.

__iter__()

Return an iterator for the set.

Note that both the operator and non-operator versions of union(), intersection(), difference(), and symmetric_difference(), issubset() and issuperset() require their arguments to be Set objects. A TypeError should be raised if any of those methods receive an argument of another type.

Hint: There are many opportunities for code reuse in this assignment; take advantage of them!

Testing

Part of your grade for this assignment will be based on developing unit tests for your code. You may use the file set_tests.py as a starting point. That file includes tests for all Set methods except discard, remove and pop. 15% of your grade for this assignment will be based on writing adequate tests for those three methods.

As is the case for all programming assignments in this class, part of your project grade may be based on tests that are not posted publicly. The public test cases that are run when you submit your code are provided as a courtesy to help you verify that your submission is free of syntax errors; they should not be regarded as comprehensive. You should thoroughly test your own code and code stricly to the project specification. If you have any questions about whether a particular behavior is admissible by the spec, post a question on Piazza.

Memory Management (Optional)

The Python runtime provides automatic garbage collection, cleaning up unused data structures when they are no longer referenced. However, the visualization portion of the Array class in t_array.py keeps its own list of all Array objects in existence, thus preventing Python from cleaning up as it goes. This can lead to the visualization becoming cluttered after a while with unused Array objects.

If you would like to clean up the visualization for easier debugging, you will need to help the Array class to manage its memory by doing two things:

  • Clean up the old arrays when you resize your internal storage for the Set class. When you are done with the old Array, you can tell the Array class you are done with it by calling the free() method on the object you no longer want to track in the visualization.
  • Clean up the internal array when a Set object itself is cleaned up by the Python runtime. In order to do this, create a __del__() method in the Set class. Inside that method, call free() on your internal array.

Submission

DEADLINE: Wednesday, October 8, 2014 at 23:59 (11:59PM) ET

You should submit a zip file containing two .py files through the Web-CAT online submission system before the deadline: a module named array_set.py containing your completed Set class, and an updated version of the unit testing module (set_tests.py). As always, your code should conform to the CS240 Python Coding Conventions. Make sure that the array_set.py file contains your name and the JMU Honor Code statement. You do not need to provide docstrings for the methods in the unit testing class, and you do not need to submit t_array.py.

Refer to the course syllabus for the policy on late submissions. These penalties will be applied manually while grading. Web-CAT does not apply them correctly; therefore, that behavior has been disabled.

Acknowledgments

The description of the Set class is largely taken from the official Python documentation. © 2001-2010 Python Software Foundation; All Rights Reserved. Here is a copy of the PSF License: psf_license.txt. The documentation has been modified for clarity and consistency with this project.