Skip Sorted Set PA

Introduction

The objective of this assignment is to implement a contiguous implementation of a sorted set data structure. Section 1.2.4 in our textbook describes the SSet interface, which is very similar. Even though this set structure will store its members in sorted order, many of the operations will be asymptotically faster than their PA2 equivalents due to the use of the clever skip list structure.

Getting Started

First download and decompress one of the starter files:

Don't forget to download the appropriate Makefile for your platform from Piazza or the course website.

The Sorted Set ADT

For the purposes of this assignment, the sorted set interface (an abstract data type) should support the following operations:

  • Adding and removing elements
  • Testing for set membership
  • Iteration over set members (in sorted order)
  • Subset, superset, equality, and disjointedness testing
  • Mathematical union, intersection, difference, and symmetric difference operations

In addition, you will need to implement standard data structure features:

  • Allocation, clearing, and deallocation
  • Size tracking
  • Copying

For this assignment, your implementation of this set interface should store elements in a standard skip list as described in Chapter 4 of the textbook. Here is an example skip list containing seven integers:

Skip List

Skip List

Please refer to the textbook or lecture slides for details on the theory and implementation of skip lists.

Definitions

As in PA2, we provide a data_t typedef in skip_set.h:

typedef int data_t;

We also need data type definitions for the skip set structure as well as for skip list nodes. These definitions are also in the provided skip_set.h:

typedef struct skip_node {
    data_t data;
    size_t height;
    struct skip_node **next;
} skip_node_t;

typedef struct skip_set {
    skip_node_t *head;
    size_t max_height;
    size_t size;
} skip_set_t;

Finally, we will need an iterator type for our set. Iterators will allow us to traverse the set in a way that is independent of the underlying implementation. For the purposes of this assignment, an iterator will be a pointer to a skip list node; i.e., a skip_node_t* value. As before, the definition is in skip_set.h:

typedef skip_node_t* skip_set_iter_t;

IMPORTANT: This iterator definition is different that the one in PA2, where an interator was defined as a pointer to a data_t value. Now it will be a pointer to a skip_node_t struct. This will allow you to iterate over the skip list properly. If you wrote your Part 2 and Part 3 functions to use iterators, you should only need to make minimal modifications after you implement the iterator functions: change any iterator dereferences from

*it

(where it is an interator) to

it->data

or

(*it).data

Of course, you will also need to change function calls as follows:

array_set_begin()  =>  skip_set_begin()
array_set_end()    =>  skip_set_end()
array_set_next()   =>  skip_set_next()

WARNING: You should not modify skip_set.h! Your code (in skip_set.c) must compile against the original skip_set.h without errors or warnings.

Pseudocode

The textbook's pseudocode uses a stack to help re-arrange the skip list on insertion. This works but is a bit overly complicated. Here is some simpler pseudocode for inserting in a skip list:

add(x):
    if the set contains x, return
    choose new random height new_height
    if max_height < new_height:
        expand the sentinel
    w = allocate a new node of height new_height
    u = sentinel
    r = max_height - 1
    while r >= 0 do
        while u.next[r] != NULL and u.next[r].data < x do
            u = u.next[r]
        if r < new_height:
            w.next[r] = u.next[r]
            u.next[r] = w
        r = r - 1
    size = size + 1

Similarly, here is simplified pseudocode for searching in a skip list:

contains(x):
    u = sentinel
    r = max_height - 1
    while r >= 0 do
        while u.next[r] != NULL and u.next[r].data < x do
            u = u.next[r]
        r = r - 1
    return (u.next[0] != NULL) and (u.next[0].data == x)

Some of the iteration variables used above were chosen to match the textbook and aren't terribly descriptive, so here are some alternative names:

w       new_node        (skip_node_t*)
u       cur_node        (skip_node_t*)
r       cur_level       (int)

Part 1: Basic Set Operations

The following operations should all be O(1); i.e., their running time should not depend on the number of elements in the set.

  • void skip_set_init(skip_set_t *set)

    Initializes a new empty set.

  • size_t skip_set_size(skip_set_t *set)

    Returns the size of a set.

The following operations deal with adding, removing, and searching in the set, and they should all run in O(log n) time.

  • void skip_set_add(skip_set_t *set, data_t value)

    Adds an element to a set. If the set already contains the given value, no changes take place.

    IMPORTANT: The skip_set_add method is probably the most complex new method in PA3. Please study the pseudocode from the textbook carefully and consider how to port it to C code. You will need to perform all of the following tasks:

    1. Check to make sure the item does not already exist. (HINT: delegate this to "skip_set_contains")

    2. Choose a random height for the new node. (HINT: repeatedly flip a coin using "(rand() % 2 == 0)")

    3. Extend/replace the sentinel node if it is not already tall enough.

    4. Insert the new node at all appropriate levels of the skip list.

  • void skip_set_remove(skip_set_t *set, data_t value)

    Removes an element from a set. If the set does not contain the given value, no changes take place.

  • data_t skip_set_pop(skip_set_t *set)

    Returns and removes an arbitrary element from a set. The behavior is undefined if the set contains zero elements.

  • bool skip_set_contains(skip_set_t *set, data_t value)

    Returns true if the element is a member of the set; false otherwise.

The following operations clear all elements from the set. Because the skip list is a linked structure, these functions should run in O(n) time.

  • void skip_set_clear(skip_set_t *set)

    Removes all elements from a set.

  • void skip_set_free(skip_set_t *set)

    Deallocate all memory associated with a set.

Part 2: Iterators and Comparisons

The following operations handle iteration over a set. They should all be O(1).

  • skip_set_iter_t skip_set_begin(skip_set_t *set)

    Returns an iterator for a set that represents the beginning of the set.

  • skip_set_iter_t skip_set_end(skip_set_t *set)

    Returns an iterator for a set that represents the end of the set.

  • skip_set_iter_t skip_set_next(skip_set_iter_t iter)

    Advances and returns a set iterator.

The following operations handle comparisons between two sets. Because the underlying storage structure for this assignment supports O(log n) lookups, these methods should run in O(n log n) time.

  • bool skip_set_is_subset(skip_set_t *set1, skip_set_t *set2)

    Returns true if and only if set1 is a subset of set2.

  • bool skip_set_is_superset(skip_set_t *set1, skip_set_t *set2)

    Returns true if and only if set1 is a superset of set2.

  • bool skip_set_is_equal(skip_set_t *set1, skip_set_t *set2)

    Returns true if and only if set1 and set2 contain the same items.

  • bool skip_set_is_disjoint(skip_set_t *set1, skip_set_t *set2)

    Returns true if and only if set1 and set2 don't contain any of the same items.

Part 3: Mathematical Set Operations

The last few operations implement standard mathematical set operations. One thing that they all have in common is that they should populate a destination set with elements drawn from other sets. Thus, they all take a "dest" pointer to a set. Each function should ensure that the destination set is empty by clearing it before performing whatever operations are appropriate for that function.

The first function creates a copy of a set, and should run in O(n log n) time.

  • void skip_set_copy(skip_set_t *dest, skip_set_t *src)

    Make dest a copy of src; i.e., dest should contain all elements in src and no extra elements.

IMPORTANT: The sets should be independent after the copying! In other words, changes made to one of the sets after copying should not affect the other.

The following operations build new sets using two existing sets. Because the underlying storage structure for this assignment supports O(log n) lookups, these methods will require O(n log n) time.

  • void skip_set_union(skip_set_t *dest, skip_set_t *set1, skip_set_t *set2)

    Calculates the union of set1 and set2, storing it in dest.

  • void skip_set_intersect(skip_set_t *dest, skip_set_t *set1, skip_set_t *set2)

    Calculates the intersection of set1 and set2, storing it in dest.

  • void skip_set_diff(skip_set_t *dest, skip_set_t *set1, skip_set_t *set2)

    Calculates the difference between set1 and set2, storing it in dest.

  • void skip_set_sym_diff(skip_set_t *dest, skip_set_t *set1, skip_set_t *set2)

    Calculates the symmetric difference between set1 and set2, storing it in dest.

Performance Comparison

As discussed in class, core skip list operations (add/search/remove) are O(log n). This is significantly faster than the core dynamic array operations from PA2, which were O(n). You can (and should!) verify this speedup before you submit; if there is no speedup, you probably have not implemented the skip list properly.

To help you test, we are providing some example performance comparison code:

main.c

You will need to provide a working array set implementation in the same directory (both array_set.c and array_set.h; also, don't forget to add "array_set.o" to your Makefile!). You will get some warnings when you compile both of them at the same time:

In file included from main.c:7:0:
array_set.h:24:13: warning: redefinition of typedef ‘data_t’ [-Wpedantic]
typedef int data_t;
            ^
In file included from main.c:6:0:
skip_set.h:24:13: note: previous declaration of ‘data_t’ was here
typedef int data_t;

These warnings are a result of the fact that both array_set.h and skip_set.h define "data_t". You could eliminate this warning by moving that definition into a separate header (e.g., "data.h"), or you can just ignore the warnings.

Running the tests on a correct skip list implementation should give you output that looks similar to (but not exactly the same as!) this:

ARRAY ADD            1000   0.000107
ARRAY REMOVE         1000   0.000173
ARRAY CONTAINS       1000   0.000078
SKIP  ADD            1000   0.000183
SKIP  REMOVE         1000   0.000053
SKIP  CONTAINS       1000   0.000036

ARRAY ADD           10000   0.008619
ARRAY REMOVE        10000   0.015803
ARRAY CONTAINS      10000   0.006245
SKIP  ADD           10000   0.002696
SKIP  REMOVE        10000   0.001018
SKIP  CONTAINS      10000   0.000502

ARRAY ADD          100000   0.842092
ARRAY REMOVE       100000   1.574779
ARRAY CONTAINS     100000   0.633310
SKIP  ADD          100000   0.052600
SKIP  REMOVE       100000   0.021113
SKIP  CONTAINS     100000   0.006816

ARRAY ADD         1000000  85.464177
ARRAY REMOVE      1000000 161.946742
ARRAY CONTAINS    1000000  64.062070
SKIP  ADD         1000000   1.365875
SKIP  REMOVE      1000000   0.624748
SKIP  CONTAINS    1000000   0.099165

The first column is "ARRAY" or "SKIP" depending on which implementation is being tested. The second column shows the method being tested. The third column is the number of input values (randomly generated). The fourth column gives the running time (in seconds).

Your exact numbers will be different, of course, depending on how fast your computer is relative to the one used to generate the above results (and whether or not you are running in a VM). Regardless, it should be very easy to see that the skip list implementation runs considerably faster for large input sizes!

Final Submission

DUE DATE: Friday, October 30, 2015 at 23:59 EDT (11:59pm)

Submit ONLY your completed skip_set.c file on Canvas using the appropriate assignment submission. Make sure the file contains a comment field at the top with your name and a statement of compliance with the JMU honor code.

Please check your code carefully and make sure that it complies with the project specification. In addition, make sure that your code adheres to general good coding style guidelines. Fix any spelling mistakes, incorrect code formatting, and inconsistent whitespace before submitting. Make sure you include any appropriate documentation.