Project 3 - MPI Application
Introduction
The primary purpose of this assignment is to gain experience with distributed computing using the MPI library.
Description
You will develop an optimized, distributed version of the following serial program: `mergesort.c`. Your parallel version should have the filename `par_mergesort.c`.
The program generates a random series of numbers, computes a histogram of the values, and then sorts them using merge sort. This assignment is similar to programming assignment 3.8 in your textbook.
The serial code contains three main functions. For this assignment, you must convert all three functions to be distributed as described below:
- `randomize()` - To ensure that results are comparable with the serial version, all numbers should be generated on node 0 and should be efficiently split and transmitted in equal chunks to all nodes.
- `histogram()` - Histograms should be calculated in a distributed fashion (i.e., every node should calculate the histogram for the data that it received during `randomize()`) and then efficiently aggregated back to node 0 for the output.
- `merge_sort()` - You should implement a distributed merge sort using a fan pattern as shown in Figure 3.6 of the textbook. Use the built-in `qsort` function to sort the local numbers on each individual node before the merging process begins. Note that you may use the provided `merge` function if you wish, but you shouldn't need to use the serial `merge_sort_helper` function.
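As a rough illustration of the first two functions, here is a minimal sketch of the scatter-and-reduce pattern built on `MPI_Scatter` and `MPI_Reduce`. All names here (`randomize_and_histogram`, `BIN_COUNT`, the 0-99 value range, and the buffers) are hypothetical placeholders, not identifiers from the provided `mergesort.c`:

```c
#include <mpi.h>
#include <stdlib.h>

/* Hypothetical sketch: generate on node 0, scatter equal chunks,
 * histogram locally, then sum the per-process counts on node 0. */
void randomize_and_histogram(int n, int rank, int size)
{
    int local_n = n / size;              /* n is divisible by size (given) */
    int *nums = NULL;
    int *local_nums = malloc(local_n * sizeof(int));

    if (rank == 0) {
        nums = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) {
            nums[i] = rand() % 100;      /* generate all numbers on node 0 */
        }
    }

    /* Split the full array into equal chunks, one per process */
    MPI_Scatter(nums, local_n, MPI_INT,
                local_nums, local_n, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each process histograms only its own chunk ... */
    enum { BIN_COUNT = 10 };
    int local_hist[BIN_COUNT] = {0};
    int global_hist[BIN_COUNT] = {0};
    for (int i = 0; i < local_n; i++) {
        local_hist[local_nums[i] * BIN_COUNT / 100]++;
    }

    /* ... then the per-process counts are summed element-wise on rank 0 */
    MPI_Reduce(local_hist, global_hist, BIN_COUNT, MPI_INT,
               MPI_SUM, 0, MPI_COMM_WORLD);

    free(local_nums);
    if (rank == 0) { free(nums); }
}
```

Because the chunks are equal-sized and `n` is divisible by the process count, a plain `MPI_Scatter` suffices; no `MPI_Scatterv` bookkeeping is needed.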
Your changes should largely be confined to the above functions. Keep `main()` as clean as you can; do not put significant program logic there. Use MPI collective operations where possible, but note that you will probably need point-to-point communication for at least one of the functions.
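For `merge_sort()`, the fan pattern is the natural place for point-to-point communication. Below is a sketch of one tree-structured fan merge; the `merge(a, na, b, nb)` declaration is only an assumption about the provided helper's shape (the actual signature may differ), and `distributed_merge_sort` and `compare_ints` are hypothetical names:

```c
#include <mpi.h>
#include <stdlib.h>

static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Assumed helper: returns a newly allocated merged array. The provided
 * merge function may have a different signature. */
int *merge(int *a, int na, int *b, int nb);

/* Sorts local_nums on every rank, then fans the sorted chunks in
 * toward rank 0; returns the fully sorted array on rank 0 only. */
int *distributed_merge_sort(int *local_nums, int local_n, int rank, int size)
{
    qsort(local_nums, local_n, sizeof(int), compare_ints);

    for (int step = 1; step < size; step *= 2) {
        if (rank % (2 * step) != 0) {
            /* Send my sorted chunk "down" to the partner, then drop out */
            MPI_Send(local_nums, local_n, MPI_INT, rank - step, 0,
                     MPI_COMM_WORLD);
            return NULL;
        } else if (rank + step < size) {
            /* Receive the partner's chunk (same size, since the process
             * count is a power of two) and merge it with mine */
            int *recv_buf = malloc(local_n * sizeof(int));
            MPI_Recv(recv_buf, local_n, MPI_INT, rank + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            local_nums = merge(local_nums, local_n, recv_buf, local_n);
            local_n *= 2;
            free(recv_buf);
        }
    }
    return local_nums;   /* rank 0 now holds all n sorted numbers */
}
```

At each level, half of the remaining ranks send their sorted chunk to a partner and drop out, so after log2(size) levels rank 0 holds the complete sorted array.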
In your code documentation, include a discussion of your changes, experimental results, and a thorough analysis of the computation and your parallelization for each of the three main functions named above. Here are some example questions that can guide your discussion:
- Does your solution scale well? Does it exhibit strong or weak scaling (or both)? How does the performance compare to the serial version?
- What trends do you see as you vary the number of MPI processes, the number of nodes involved, and the random number count? What do you think causes these trends?
- Which operations are more compute-bound and which are more communication-bound? Do any operations shift characteristics during execution?
To verify correctness, the results (both the histogram and the sorted list) must match those of the serial version (for corresponding input sizes). The histogram is already printed; you can enable printing of the sorted list by enabling `DEBUG` (do not try this for large input sizes or you will generate very large outputs!). Your parallel version should only print these results from rank 0.
For your final submission, `DEBUG` must be disabled. You may assume that the number of MPI processes is a power of two and that the number count `n` is divisible by the number of processes.
You should use `MPI_Wtime()` instead of `clock()` to time your MPI version. To ensure accurate timings, you must make sure all tasks have reached the timing statements before actually taking the measurements.
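One common way to do this is to bracket the timed region with barriers, as in this minimal sketch (variable names are illustrative):

```c
double start, elapsed;

MPI_Barrier(MPI_COMM_WORLD);     /* ensure every task has arrived */
start = MPI_Wtime();

/* ... code being timed ... */

MPI_Barrier(MPI_COMM_WORLD);     /* wait for the slowest task to finish */
elapsed = MPI_Wtime() - start;
if (rank == 0) {
    printf("Elapsed: %f s\n", elapsed);
}
```

The second barrier makes the measurement reflect the slowest task rather than whichever task happens to finish first.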
You must use the cluster to do development and testing. However, do not run your tests on the login node directly, as this will obstruct others and produce inaccurate timings. I suggest writing a SLURM job script that runs your program using various task and number counts. Recall that SLURM refers to MPI processes as tasks and that cluster nodes can support a maximum of eight processes/tasks per node.
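As a starting point, a job script along these lines might work. Treat it as a sketch: the partition, any module loads, and whether `par_mergesort` accepts the number count as a command-line argument all depend on your cluster setup and your code.

```bash
#!/bin/bash
#SBATCH --job-name=par_mergesort
#SBATCH --nodes=2
#SBATCH --ntasks=16            # at most 8 tasks per node on this cluster
#SBATCH --time=00:10:00

# Sweep several task counts and number counts (values are examples);
# passing n on the command line assumes your program reads it from argv
for np in 2 4 8 16; do
    for n in 1000000 10000000; do
        srun -n $np ./par_mergesort $n
    done
done
```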
Grading Criteria
Your submission will be graded on a combination of correctness, functionality, elegance, style, and documentation. Here is a tentative grading rubric:
Grade | Description | Requirements
---|---|---
A | Exceptional |
B | Good |
C | Satisfactory |
D | Deficient |
F | Unacceptable |
Submission
You should copy `mergesort.c` into a new file called `par_mergesort.c` and modify it to implement the system described above. Your program must compile on compliant systems using the following Makefile:
```make
default: mergesort par_mergesort

mergesort: mergesort.c
	gcc -g -O2 --std=c99 -Wall -o mergesort mergesort.c

par_mergesort: par_mergesort.c
	mpicc -g -O2 --std=c99 -Wall -o par_mergesort par_mergesort.c

clean:
	rm -f mergesort par_mergesort
```
You should submit your modified program as `par_mergesort.c` on Canvas by the due date. You should not submit your Makefile or anything else.
All submitted code should be elegant, clean, consistent, and well-documented. The code I have provided uses 4-space indentation and the 1TBS style. If you choose to reformat the file to match your own style preferences, make sure you apply the changes throughout.
Peer Reviews
One of the goals of this course is to encourage good software development practices, especially when building a parallel or distributed software system. For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.
After the submission deadline, I will assign you two other submissions to review on Canvas. I expect you to review two distinct submissions that are not your own; I will do my best to ensure that the assignments work out, but sometimes Canvas does not cooperate. It is your responsibility to inform me immediately if you have been assigned yourself, your partner, or two partners from the same team to review.
To complete the peer review, submit a Canvas comment on the submission with your assessment of the submission according to the provided code review guidelines. I expect at least two paragraphs per review with detailed observations and an assigned overall numeric score. You will be graded on the quality, thoroughness, and professional tone of your review. The peer reviews are due one week after the original project due date.