Project 3 - MPI Application

Introduction

The primary purpose of this assignment is to gain experience with distributed computing using the MPI library.

Description

You will develop an optimized, distributed version of the following serial program: mergesort.c. Your parallel version should have the filename par_mergesort.c.

The program generates a series of random numbers, builds a histogram of the values, and then sorts the numbers using merge sort. This assignment is similar to programming assignment 3.8 in your textbook.

The serial code contains three main functions. For this assignment, you must convert all three functions to be distributed as described below:

  1. randomize() - To ensure that results are comparable with the serial version, all numbers should be generated on node 0 and should be efficiently split and transmitted in equal chunks to all nodes.
  2. histogram() - Histograms should be calculated in a distributed fashion (i.e., every node should calculate the histogram for the data that it received during randomize()) and then efficiently aggregated back to node 0 for the output.
  3. merge_sort() - You should implement a distributed merge sort using a fan pattern as shown in Figure 3.6 of the textbook. Use the built-in qsort function to sort the local numbers on each individual node before the merging process begins. Note that you may use the provided merge function if you wish, but you shouldn't need to use the serial merge_sort_helper function.

Your changes should largely be confined to the above functions. Keep main() as clean as you can; do not put significant program logic there. Use MPI collective operations where possible, but note that you will probably need point-to-point communication for at least one of the functions.

In your code documentation, include a discussion of your changes and the performance characteristics of your solution for each of the three main functions named above, supported by experimental results. Here are some example questions that can guide your discussion:

  • Does your solution scale well? Does it exhibit strong or weak scaling (or both)? How does the performance compare to the serial version?
  • What trends do you see as you vary the number of MPI processes, the number of nodes involved, and the random number count? What do you think causes these trends?
  • Which operations are more compute-bound and which are more communication-bound? Do any operations shift characteristics during execution?

To verify correctness, the results (both the histogram and sorted list) must match those of the serial version (for corresponding input sizes). The histogram is already printed; you can enable printing of the sorted list by enabling DEBUG (do not try this for large input sizes or you will generate very large outputs!). Your parallel version should only print these results from rank 0.

For your final submission, DEBUG must be disabled. You may assume that the number of MPI processes is a power of two and that the number count n is divisible by the number of processes.

You should use MPI_Wtime() instead of clock() to time your MPI version. To ensure accurate timings, you must make sure all tasks have reached the timing statements before actually taking the measurements.

You must use the cluster for development and testing. However, do not run your tests directly on the login node, as this will interfere with other users and produce inaccurate timings. I suggest writing a SLURM job script that runs your program with various task and number counts. Recall that SLURM refers to MPI processes as tasks and that cluster nodes support a maximum of eight processes/tasks per node.
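
A starting-point job script might look like the following. The module name, the time limit, and the assumption that par_mergesort takes the number count as a command-line argument are all guesses; adjust them to match your cluster's configuration and the program's real interface.

```shell
#!/bin/bash
#SBATCH --job-name=par_mergesort
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8     # the cluster's stated per-node maximum
#SBATCH --time=00:30:00
#SBATCH --output=par_mergesort-%j.out

# Hypothetical module name; load whatever provides MPI on your cluster.
module load mpi

# Sweep task counts and number counts for the scaling discussion.
for n in 1000000 10000000; do
    for tasks in 1 2 4 8 16 32; do
        srun --ntasks=$tasks ./par_mergesort $n
    done
done
```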

Grading Criteria

Your submission will be graded on a combination of correctness, functionality, elegance, style, and documentation. Here is a tentative grading rubric:

A (Exceptional)
  • Optimal distributed implementation of all three subroutines.
  • Thorough, insightful discussion with scaling and characteristic analyses.
  • Clean code with fully consistent style.
  • Insightful documentation.
  • Everything required for the grades below.
B (Good)
  • Correct distributed implementation of all three subroutines.
  • Reasonable discussion of parallelization with performance analysis.
  • Consistent and reasonable style.
  • No memory leaks.
  • Useful and consistent documentation.
  • Everything required for the grades below.
C (Satisfactory)
  • Compiles without modifications.
  • Correct timing code.
  • Correct distributed implementation of at least two subroutines.
  • Readable style.
  • Minimal amount of documentation.
  • Everything required for the grades below.
D (Deficient)
  • Some evidence of a good-faith attempt.
F (Unacceptable)

Submission

You should copy mergesort.c into a new file called par_mergesort.c and modify it to implement the system described above. Your program must compile on compliant systems using the following Makefile:

default: mergesort par_mergesort

mergesort: mergesort.c
	gcc -g -O2 --std=c99 -Wall -o mergesort mergesort.c

par_mergesort: par_mergesort.c
	mpicc -g -O2 --std=c99 -Wall -o par_mergesort par_mergesort.c

clean:
	rm -f mergesort par_mergesort

You should submit your modified program as par_mergesort.c on Canvas by the due date. You should not submit your Makefile or anything else.

All submitted code should be elegant, clean, consistent, and well-documented. The code I have provided uses 4-space indentation and the 1TBS style. If you choose to reformat the file to match your own style preferences, make sure you apply the changes throughout.

Peer Reviews

One of the goals of this course is to encourage good software development practices, especially when building a parallel or distributed software system. For this project, you will be assigned the submissions of two randomly selected students in the course. You must review their code and offer constructive feedback according to the given rubric.

After the submission deadline, I will assign you two other submissions to review on Canvas. I expect you to review two distinct submissions that are not your own; I will do my best to ensure that the assignments work out but sometimes Canvas does not cooperate. It is your responsibility to inform me immediately if you have been assigned yourself, your partner, or two other same-team partners to review.

To complete the peer review, submit a Canvas comment on the submission with your assessment of the submission according to the provided code review guidelines. I expect at least two paragraphs per review with detailed observations and an assigned overall numeric score. You will be graded on the quality, thoroughness, and professional tone of your review. The peer reviews are due one week after the original project due date.