There is no single best sorting algorithm. Quicksort is the fastest
known comparison-based sorting algorithm when applied to large,
unordered, sequences. It also has the advantage of being an in-place
(or nearly in-place) sort. Unfortunately, quicksort has some
weaknesses: it’s worst-case performance is
What we really want is a sorting algorithm that is as fast as
quicksort, stable, in-place, with
The objective of this assignment is to write modified versions of merge and quicksort that exhibit better performance than the standard versions described in our textbook. You will implement three improvements (two for merge sort and one for quicksort) that should improve some aspect of the algorithm’s performance without requiring a great deal of additional code. You will also experimentally evaluate the impact of your improvements.
The following zip file contains starter code, including the sort implementations provided by our textbook.
This zip file has the following contents:
BasicSorts.java
- Includes selection sort and insertion sort.MergeSort.java
- Includes merge sort.QuickSort.java
- Includes quicksort.MergeSortsImproved.java
(UNFINISHED. See Parts 1 & 2 below.)IntroSorter.java
(UNFINISHED. See Part 3 below.)SortProfiler.java
Sorter.java
- Functional Interface for sorting methods.Generator.java
- Functional Interface for generator methods.Generators.java
- Static methods for generating data withThis zip file contains a set of unit tests for the modified sorts:
Note that SortProfiler
relies on the JOpt Simple library for
handling command-line arguments. You will need to have
jopt-simple-5.0.4.jar
in your classpath to use the application.
The merge algorithm described in our textbook consists of the following two stages:
1. Copy all values from the two merge-sorted halves into a temporary array.
2. Merge the values from the temporary array back into the original array.
If
1. Copy the values from the first merge-sorted half to a temporary array.
2. Merge the values from the temporary array and the second
merge-sorted half into the original array.
Here are some ASCII graphics illustrating the process:
Original sorted sub-arrays:
________________________________________
...| 1 | 3 | 5 | 7 | 9 | 2 | 4 | 6 | 8 | 10 |...
----------------------------------------
start-^ mid-^ end-^
Temporary Array
___________________
| | | | | |...
-------------------
0-^
________________________________________
...| 1 | 3 | 5 | 7 | 9 | 2 | 4 | 6 | 8 | 10 |...
----------------------------------------
start-^ mid-^ end-^
Copy the sorted first half into the temporary array:
___________________
| 1 | 3 | 5 | 7 | 9 |...
-------------------
________________________________________
...| | | | | | 2 | 4 | 6 | 8 | 10 |...
----------------------------------------
Initialize i1
to be the index of the first position (0) in the
temporary array, i2
to be the index of the first position of the
sorted right half, and curr
to be the next available position for
merging:
___________________
| 1 | 3 | 5 | 7 | 9 |...
-------------------
i1-^
________________________________________
...| | | | | | 2 | 4 | 6 | 8 | 10 |...
----------------------------------------
curr-^ i2-^
Since temp[i1] < items[i2]
, temp[i1]
is copied to items[curr]
.
i1
and curr
are incremented.
___________________
| 1 | 3 | 5 | 7 | 9 |...
-------------------
i1-^
________________________________________
...| 1 | | | | | 2 | 4 | 6 | 8 | 10 |...
----------------------------------------
curr-^ i2-^
Since items[i2] < temp[i1]
, items[i2]
is copied to items[curr]
.
i2
and curr
are incremented.
___________________
| 1 | 3 | 5 | 7 | 9 |...
-------------------
i1-^
________________________________________
...| 1 | 2 | | | | 2 | 4 | 6 | 8 | 10 |...
----------------------------------------
curr-^ i2-^
Continue Merging until complete:
___________________
| 1 | 3 | 5 | 7 | 9 |...
-------------------
________________________________________
...| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |...
----------------------------------------
Use the sort profiling tool to test your finished implementation. You
should not expect your improved merge sort to be dramatically faster
than the provided implementation. Since array assignments are very
fast relative to calls to compareTo
, the speed impact will be
minimal. The main advantage here is the savings in space.
The next improvement is based on the observation that merge sort is
actually slower than simple
As you can see, insertion sort is faster until around
A a reminder, the following pseudocode describes the overall logic of the merge sort Algorithm:
merge_sort(sub-array)
If sub-array is has more than one entry:
Recursively merge_sort the left half
Recursively merge_sort the right half
Merge the two sorted halves.
This logic recursively splits the original array into smaller and smaller sub-arrays until the recursion bottoms out at sub-arrays of size one. This means that every time a large array is sorted, there are many recursive calls to merge sort that have small input sizes. In light of the figure above, that approach doesn’t make much sense: merge sort is not a competitive sorting algorithm on small inputs. It would make more sense to recursively break the input into smaller and smaller pieces until some threshold is reached, and then switch strategies to a sorting algorithm that is more efficient on those small inputs.
The following pseudocode describes this alternate approach:
merge_sort(sub-array)
If sub-array has fewer than MERGE_SORT_THRESHOLD entries:
Sort the sub-array with insertion sort.
Otherwise:
Recursively merge_sort the left half
Recursively merge_sort the right half
Merge the two sorted halves.
Choosing an appropriate value for MERGE_SORT_THRESHOLD
requires some
experimentation. One of the requirements of this assignment is that
you select an appropriate threshold value and provide data to justify
your choice. Simply selecting the crossover point based on the figure
above is not the right way to think about this. The point where the
two lines cross represents the input size where the two sorts are
equally fast. Switching strategies at that input size will result
in no speedup at all. The only way to determine an appropriate choice
for MERGE_SORT_THRESHOLD
is to systematically experiment with
different values until you find the one that leads to the best overall
sorting performance.
Finally, note that we have provided an implementation of insertion sort that works on a full array, but you will need a version that works on a sub-array for this part of the assignment.
As mentioned above, quicksort is usually very fast. There are
relatively few pathological inputs that result in
It is possible to detect pathological inputs by tracking the recursion
depth of the quicksort algorithm. When quicksort is working well,
the partition operation typically moves the pivot item somewhere near
the center of the current sub-sequence. When this is the case, the
maximum recursion depth will be
Note that the provided version of merge sort sorts a full
array. Introspective sort needs to be able to call a version of merge
sort that can sort a portion of a provided array. The
MergeSortImproved
class provides a declaration for such a method.
So far in this course we have focused on asymptotic analysis. This usually gives us the information we want. When deciding between two algorithms we generally prefer the one with better asymptotic performance.
Our goals for this project are somewhat different. The focus now is on
fine-tuning existing algorithms to improve their performance. This is
a situation where we really care about the constant factors. (We
would be very happy if we could write a version of merge sort that is
twice as fast, even if the asymptotic worst-case performance were still
In order to help you evaluate your sorting improvements we are providing a command-line tool that can be used to systematically evaluate the run-time performance of different sorting algorithms.
Running the tool with no arguments should produce the following help message:
Option (* = required) Description
--------------------- -----------
* -s <Integer: NUMBER> Starting (smallest) input size
* -i <Integer: NUMBER> Input size increment
* -m <Integer: NUMBER> Maximum input size to test
* -t <Integer: NUMBER> Number of trials for each input size
-w [String: SORT1,SORT2,...] Comma separated list of sorts. Options include
insertion, selection, merge, merge1, merge2,
quick, introspective and timsort. Default is to
execute all sorts.
-g [String: GENERATOR] Sequence generator. Options include random,
ordered or evil. The default is random
For example, we could compare the performance of quicksort and merge sort on small inputs by executing the tool as follows:
$ java SortProfiler -s 1000 -i 1000 -m 10000 -t 10 -w quick,introspective
The resulting output would look something like the following:
N, quick, introspective
1000, 0.00009277, 0.00000085
2000, 0.00021152, 0.00000086
3000, 0.00030850, 0.00000088
4000, 0.00043710, 0.00000090
5000, 0.00055941, 0.00000088
6000, 0.00068144, 0.00000087
7000, 0.00081457, 0.00000091
8000, 0.00095075, 0.00000087
9000, 0.00105827, 0.00000089
10000, 0.00117701, 0.00000087
(Introspective sort is very fast here because it hasn’t been implemented yet.)
For this part of the project you will prepare a short document that quantifies the performance improvements you were able to achieve. This document should take the form of the following figures:
mergeSort1
to the original merge sort
implementation as well as quicksort. Results should be shown for
both random and ordered inputs.mergeSort2
. (The SortProfiler
tool will not generate
this data.)mergeSort2
to the previous two merge sort implementations as well
as quicksort. Results should be shown for both random and ordered
inputs.Each figure should have clearly labeled axes and appropriate legends. Each figure should also have a caption describing the results. There should be enough information in the captions for the reader to understand the point of the experiment and how the data supports the conclusion.
Submit all of the sorting classes through Autolab. The .zip file should also contain a .pdf document containing your written analysis.
Autolab Tests for Part 1 | 20% |
Autolab Tests for Part 2 | 20% |
Autolab Tests for Part 3 | 20% |
Experimental Results Document | 20% |
Autolab Style Checks | 10% |
Instructor Style Points | 10% |
Any submission that doesn’t pass all of the instructor unit tests will receive at most 50% of the points for that component of the grade (30/60).
The grading of your evaluation document will be based on the quality of the figures as well as the clarity of your writing.
*The second approach has recently become popular. Both Python (starting in version 2.3) and Java (starting in version 7) use Timsort as their default sorting algorithm. Timsort is a modified version of merge sort that includes several enhancements: it uses a more space-efficient merge operation, it takes advantage of partially sorted arrays by finding and merging existing sorted regions, and it uses a non-recursive binary insertion sort to handle short unsorted regions.