Array Set PA

Introduction

The objective of this assignment is to implement a contiguous implementation of a set data structure. In mathematics, a set is an unordered collection of distinct objects. Section 1.2.3 in our textbook also describes the USet interface, which is very similar.

Common uses of sets in computing include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

IMPORTANT: This is a large programming assignment! Your solution will most likely span several hundred lines of code (the reference solution is ~300 lines). Your final grade for the assignment will be based on the final deliverable. However, you should not wait until the final week to work on this PA! To encourage early engagement, we are providing the opportunity for you to submit your work at two intermediate milestones to receive feedback on your progress. We highly recommend that you take advantage of this to avoid falling irreparably behind.

Getting Started

First download and decompress one of the starter files:

Don't forget to download the appropriate Makefile for your platform from Piazza or the course website.

The Set ADT

For the purposes of this assignment, the set interface (an abstract data type) should support the following operations:

  • Adding and removing elements
  • Testing for set membership
  • Iteration over set members
  • Subset, superset, equality, and disjointedness testing
  • Mathematical union, intersection, difference, and symmetric difference operations

In addition, you will need to implement standard data structure features:

  • Allocation, clearing, and deallocation
  • Size tracking
  • Shallow copying

For this assignment, your implementation of this set interface should store elements in an unordered dynamic array.

This is a very simple way to store a set. However, it is not a very efficient approach because determining set membership requires a sequential search through the array. As a result, some set operations may take quadratic time. For example: arrayset_is_disjoint(set1, set2) will require _O(n*m)_ steps in the worst case: we need to search through all m elements in set2 for each of the n items in set1.

We will consider more efficient implementations of the Set ADT later in the semester.

Definitions

Because C does not support higher-level polymorphism, we will need to restrict set elements to a single type. However, we would like to parameterize it as much as possible to facilitate changes later on. Thus, we will use a single typedef to create a data_t type, and use that wherever we wish to refer to the type of set elements. For testing purposes, we will only use sets to store integers.

This definition is in the provided array_set.h:

typedef int data_t;

We also need a data type definition for the set structure itself. Recall from the class discussion that we need three pieces of information to maintain a dynamic array:

  1. A pointer to the actual storage array.
  2. The current maximum allocated capacity of the array.
  3. The current size (or length) of the array.

We will store this information in a single array_set struct and create an array_set_t type for it. These definitions are also in the provided array_set.h:

typedef struct array_set {
    data_t *array;
    size_t capacity;
    size_t size;
} array_set_t;

Finally, we will need an iterator type for our set. Iterators will allow us to traverse the set in a way that is independent of the underlying implementation. For the purposes of this assignment, an iterator will just be a pointer to a data element; i.e., a data_h* value. As before, the definition is in array_set.h:

typedef data_t* array_set_iter_t;

You should not modify these definitions, or any other definitions in array_set.h. Your code (in array_set.c) should compile against the provided array_set.h without errors or warnings.

Part 1: Basic Set Operations

MILESTONE DATE: Friday, October 2, 2015 at 23:59 EDT (11:59pm)

The following operations should all be O(1); i.e., their running time should not depend on the number of elements in the set.

  • void array_set_init(array_set_t *set)

    Initializes a new empty set.

  • void array_set_clear(array_set_t *set)

    Removes all elements from a set (doesn't deallocate anything!).

  • size_t array_set_size(array_set_t *set)

    Returns the size of a set.

  • void array_set_free(array_set_t *set)

    Deallocate memory associated with a set.

HINT: In the array_set_free function, don't forget to clean up the other members of the array_set_t struct after you free the actual array.

The following operations deal with adding or removing element from the set. The pop operation should run in O(1) time, and the other operations (addition, removal and membership testing) should run in O(n) time.

  • void array_set_add(array_set_t *set, data_t value)

    Adds an element to a set. If the set already contains the given value, no changes take place.

  • void array_set_remove(array_set_t *set, data_t value)

    Removes an element from a set. If the set does not contain the given value, no changes take place.

  • data_t array_set_pop(array_set_t *set)

    Returns and removes an arbitrary element from a set. The behavior is undefined if the set contains zero elements.

  • bool array_set_contains(array_set_t *set, data_t value)

    Returns true if the element is a member of the set; false otherwise.

Part 2: Iterators and Comparisons

MILESTONE DATE: Friday, October 9, 2015 at 23:59 EDT (11:59pm)

The following operations handle iteration over a set. They should all be O(1).

  • array_set_iter_t array_set_begin(array_set_t *set)

    Returns an iterator for a set that represents the beginning of the set.

  • array_set_iter_t array_set_end(array_set_t *set)

    Returns an iterator for a set that represents the end of the set.

  • array_set_iter_t array_set_next(array_set_iter_t iter)

    Advances and returns a set iterator.

The following operations handle comparisons between two sets. Because the underlying storage structure for this assignment (dynamic arrays) only supports O(n) lookups, these methods will require O(n2) time.

  • bool array_set_is_subset(array_set_t *set1, array_set_t *set2)

    Returns true if and only if set1 is a subset of set2.

  • bool array_set_is_superset(array_set_t *set1, array_set_t *set2)

    Returns true if and only if set1 is a superset of set2.

  • bool array_set_is_equal(array_set_t *set1, array_set_t *set2)

    Returns true if and only if set1 and set2 contain the same items.

  • bool array_set_is_disjoint(array_set_t *set1, array_set_t *set2)

    Returns true if and only if set1 and set2 don't contain any of the same items.

Part 3: Mathematical Set Operations

The last few operations implement standard mathematical set operations. One thing that they all have in common is that they should populate a destination set with elements drawn from other sets. Thus, they all take a "dest" pointer to a set. Each function should ensure that the destination set is empty by clearing it before performing whatever operations are appropriate for that function.

The first function creates a shallow copy of a set, and should run in O(n) time.

  • void array_set_copy(array_set_t *dest, array_set_t *src)

    Make dest a copy of src; i.e., dest should contain all elements in src and no extra elements.

The following operations build new sets using two existing sets. Because the underlying storage structure for this assignment (dynamic arrays) only supports O(n) lookups, these methods will require O(n2) time.

  • void array_set_union(array_set_t *dest, array_set_t *set1, array_set_t *set2)

    Calculates the union of set1 and set2, storing it in dest.

  • void array_set_intersect(array_set_t *dest, array_set_t *set1, array_set_t *set2)

    Calculates the intersection of set1 and set2, storing it in dest.

  • void array_set_diff(array_set_t *dest, array_set_t *set1, array_set_t *set2)

    Calculates the difference between set1 and set2, storing it in dest.

  • void array_set_sym_diff(array_set_t *dest, array_set_t *set1, array_set_t *set2)

    Calculates the symmetric difference between set1 and set2, storing it in dest.

Final Submission

DUE DATE: Friday, October 16, 2015 at 23:59 EDT (11:59pm)

Submit ONLY your completed array_set.c file on Canvas using the appropriate assignment submission. Make sure the file contains a comment field at the top with your name and a statement of compliance with the JMU honor code.

Please check your code carefully and make sure that it complies with the project specification. In addition, make sure that your code adheres to general good coding style guidelines. Fix any spelling mistakes, incorrect code formatting, and inconsistent whitespace before submitting. Make sure you include any appropriate documentation.