Sorting Lab

Objectives

The goal of this lab is to gain experience with the implementation of standard sorting algorithms.

Introduction

Download the lab starter files here:

Don't forget to download and customize your machine-specific makefile and add sort.o to the MODS line.

In the main.c file, you will find a lot of code infrastructure for testing sorting routines, both for correctness and for efficiency. The exact implementations of the testing framework in main.c are interesting but not really relevant for this lab. One particularly-interesting aspect is the use of function pointers.

(Optional) Function Pointers

A function pointer is exactly what it sounds like: a pointer to a function. This may seem a bit odd but after this much time working with pointers in C you sould be unsurprised to learn that you can have a pointer to a function.

After all, the compiler must store the compiled code for each function somewhere in memory, and that location must have an address. That address is essentially a pointer to the function. The C language allows you to manipulate such pointers just like any other pointers, although the syntax is a bit more complicated.

If you look in sort.h, you will find the following typedef:

typedef void(*sort)(data_t[], size_t);

This creates a typedef called sort that is a function pointer type. You'll notice that all of the sorting functions have the same general signature:

void bubble_sort    (data_t items[], size_t n);
void selection_sort (data_t items[], size_t n);
void insertion_sort (data_t items[], size_t n);
void merge_sort     (data_t items[], size_t n);
void quick_sort     (data_t items[], size_t n);

This allows us to write code that exercises a generic sort function:

void do_performance_test(data_t items[], size_t n, sort f)
{
    double sum_time = 0.0;

    // run several trials and average the results
    for (size_t i = 0; i < NUM_TRIALS; i++) {

        // (OMITTED: set up test and begin timer)

        f(citems, n);    // call the sort function

        // (OMITTED: clean up and update sum)
    }

    // print results
    printf(" %12lu %12.4f\n", n, (sum_time / NUM_TRIALS));
}

In the above code, f is a function pointer parameter to the do_performance_test function. Its value represents the address of a sort function, and the key benefit is that we can pass ANY sort function that matches the sort type into do_performance_test and it will work.

This allows us to store the sort function pointers in an array and iterate over them:

sort all_sorts[] = {
    bubble_sort,
    selection_sort,
    insertion_sort,
    merge_sort,
    quick_sort
};

...

void do_all_performance_tests(data_t items[], size_t n)
{
    // run every sort
    for (size_t i = 0; i < NUM_SORTS; i++) {
        do_performance_test(items, n, all_sorts[i]);
    }
}

Some of the above code has been simplified a bit for presentation here, but the actual code isn't much more complicated. We encourage you to look it over when you get a chance. Feel free to ask questions on Piazza if you see anything you don't understand.

Furthermore, if you are intrigued by this concept, we encourage you to explore programming languages that include robust support for first-class functions. You will also see this topic again in CS430 (Programming Languages).

For the purposes of this lab, however, you should focus your attention on writing the sorting routines in sort.c.

Swapping Items

You may find the following macro useful, and we have provided it at the top of sort.c:

#define SWAP(A,I,J) data_t tmp = (A)[(I)]; (A)[(I)] = (A)[(J)]; (A)[(J)] = tmp;

You can use this macro to swap two elements at indices i and j in array items, using code like this:

    SWAP(items, i, j);

Comparing Items

We have provided the following function to compare data_t items:

int cmp(data_t a, data_t b)
{
    return a - b;
}

This function compares data values and returns a value that can be compared against zero to determine the order of the parameters:

(cmp(a,b)  < 0)  if  a < b
(cmp(a,b) == 0)  if  a = b
(cmp(a,b)  > 0)  if  a > b

For instance, instead of writing this:

    if (items[i] < items[j]) do_something();

You should write this:

    if (cmp(items[i], items[j]) < 0) do_something();

This kind of function is sometimes called a comparator, and it should remind you of iterators in the sense that it abstracts away underlying implementation details. Later (in PA 4), we will change data_t to be a pointer to a record data type. If you write your sorting functions to use the cmp comparison function, you will only need a new implementation for the cmp function and your sorting routines should still work!

Exercises

  • Write the three "slow" sorts we discussed in class: selection sort, insertion sort, and bubble sort. Put the implementations in the corresponding functions in sort.c. Recall that all of these sorts basically consist of two nested for-loops (resulting in O(n2) running times); don't overcomplicate them! Run the tests to make sure your implementations are correct and that their performance matches your expectations.

  • Write a merge-sort routine in merge_sort. You should allocate new arrays for each level of the recursion, and you should make sure you do not leak any memory. You probably won't need a recursive helper function for this implementation. You may refer to the textbook or class slides for pseudocode. Your solution should run in O(n log n) worst-case time.

  • Write a quicksort routine in quick_sort. You should implement an in-place version of quicksort, meaning that you shouldn't allocate any additional arrays. Depending on how you implement partitioning, you may or may not need a recursive helper function. Additionally, you should choose a random element from the list to use as a pivot value at each level, and you may assume that the standard random number generator has already been seeded. You may refer to the textbook or class slides for pseudocode. Your solution should run in O(n log n) expected time.

(Optional) Adding new sorts

Because we used function pointers to build the testing framework for this lab, it is relatively easy for you to add new sorting routines. We encourage you to experiment and come up with your own sorting algorithms and implement them, comparing their behavior and performance to the standard algorithms.

Here are the steps you'll need to follow to add a new sort routine. First, add the new function prototype to sort.h:

/*
 * Sort an array using my new super-awesome sort.
 */
void my_sort(data_t items[], size_t n);

Then, implement your sort in sort.c:

void my_sort(data_t items[], size_t n)
{
    // your super-awesome code goes here
}

Finally, update main.c. You'll need to change three places:

  1. Increment the NUM_SORTS counter:
const size_t NUM_SORTS = 6;     // used to be 5
  1. Add your sorting routine to the all_sorts list:
sort all_sorts[] = {
    bubble_sort,
    selection_sort,
    insertion_sort,
    merge_sort,
    quick_sort,
    my_sort             // added
};
  1. Add a label for your sort (to be printed in the output) to the sort_labels list:
char* sort_labels[] = {
    "Bubble",
    "Selection",
    "Insertion",
    "Merge",
    "Quick",
    "Awesome Sort"      // added
};

That's all there is to it. You can now run the program and it will test your new sorting algorithm for correctness and performance. Can you outperform the standard sorting algorithms?