CS 470

Project 2 - OpenMP Application

Introduction

The primary purpose of this assignment is to gain experience with multicore parallelism using the OpenMP library. The secondary purpose is to gain experience parallelizing dense matrix operations.

Description

You will develop an optimized, parallel version of the following serial program: gauss.c Your parallel version should have the filename par_gauss.c.

The program generates a random system of linear equations in the form of a coefficient matrix (A) and a right-hand-side column vector (b). The size of the system is given as a command-line parameter to the program. It then solves the system using Gaussian elimination and backwards substitution. The system is generated in such a way the solution (x) will be a column vector with all values equal to 1.0, making it easy to check the answers. This assignment is similar to programming assignments 5.4 and 5.5 in your textbook.

The serial code contains several functions as well as a few preprocessor directives to toggle program features on/off. For this assignment, you must efficiently parallelize (using OpenMP) all functions that contribute significantly to the running time of the program. You should not use Pthreads, semaphores, or any other concurrency/parallelization system. You may perform minor re-writes within functions in order to be able to more effectively parallelize them, but you should avoid any rewrites that cut across multiple functions or that fundamentally change the nature of the function.

In your code documentation, you should include a discussion of your changes and why you made them. This assignment is far more open-ended than the first project, and the amount of code you actually write could be rather small in the end. Thus, you must include documentation explaining your experiments and the reasoning behind your final submission. You should include some performance results and an argument as to why your version is the optimal way to parallelize the original code. IMPORTANT: "I tried it and it didn't run faster" is not on its own a sufficient explanation! If you were unable to parallelize a particular function, explain what you tried and discuss reasons why it may not have worked well. If possible, suggest ways to improve the code to increase parallelism.

For your final submission, DEBUG and GENERATE_TRIANGULAR must be disabled, and REAL must be set to "double"!

Use the DEBUG directive to toggle debug output. This output will likely be unhelpful for matrices larger than 10x10. Use the GENERATE_TRIANGULAR directive to generate the coefficient matrix already in upper-triangular form, removing the need to do Gaussian elimination. Use the USE_COLUMN_BACKSUB directive to enable the alternative back substitution method (column-oriented vs. the standard row-oriented). Finally, you can change the REAL macro if you wish to experiment with using single precision instead of double precision.

You should use omp_get_wtime instead of clock() to time your OpenMP version.

You should use compiler directives to disable all OpenMP code if the compiler does not support it. (HINT: check for "_OPENMP") I suggest creating a Makefile that builds three different programs: 1) the original serial version, 2) the new parallel version, and 3) the new parallel version with OpenMP disabled.

Your program should respect the OMP_NUM_THREADS environment variable; there is no need to accept the number of threads as a command-line parameter.

You may use the cluster to do development and testing. However, please be considerate and do not run your tests on the login node directly. I suggest writing a test script that runs your program using various thread counts, then converting this script into a SLURM job script so that it can run on the compute nodes.

If you would like to test with some custom (non-randomized) matrices, you can pass a file to the program instead of a system size. I have supplied a function (read_system) that handles the file I/O. The files should contain a line at the top with a single integer n (the size of the system). Subsequent lines should have rows of the augmented matrix [A][b].

For example, here is the representation of this system (taken from the Wikipedia article):

3
 3.0  2.0 -1.0  1.0
 2.0 -2.0  4.0 -2.0
-1.0  0.5 -1.0  0.0

The solution to this system is [ 1.0 -2.0 -2.0 ].

HINT: Test with larger matrices! You won't begin to see true performance trends until you start testing on matrices larger than 500x500. I suggest testing with even larger matrices (2000x2000 or 5000x5000 or more).

Grading Criteria

Your submission will be graded on a combination of correctness, functionality, elegance, style, and documentation. Here is a tentative grading rubric:

Grade	Description	Requirements
A	Exceptional	Consistently optimal speedup for all significant subroutines OR a thorough discussion of your methods and and explanation or conjecture about why the routine couldn't be parallelized. Clean code with fully consistent style. Insightful documentation with thorough explanation of changes and performance analysis. All of the below.
B	Good	Compiles without modifications even without OpenMP enabled. Consistently optimal speedup and thorough discussion on two subroutines. Consistent and reasonable style. No memory leaks. Useful and consistent documentation. All of the below.
C	Satisfactory	Compiles without modifications. Consistently optimal speedup and discussion on one subroutine. Readable style. Minimal amount of documentation. All of the below.
D	Deficient	Some evidence of a good-faith attempt.
F	Unacceptable

Submission

You should copy gauss.c into a new file called par_gauss.c and modify it to implement the system described above. Your modified program should use the OMP_NUM_THREADS environment variable for scaling. Your program must compile on compliant systems using the following Makefile:

default: gauss par_gauss

gauss: gauss.c
	gcc -g -O2 --std=c99 -Wall -o gauss gauss.c

par_gauss: par_gauss.c
	gcc -g -O2 --std=c99 -fopenmp -Wall -o par_gauss par_gauss.c

clean:
	rm -f gauss par_gauss

You should submit your modified program as par_gauss.c on Canvas by the due date. You should not submit your Makefile or anything else.

All submitted code should be elegant, clean, consistent, and well-documented. The code I have provided uses 4-space indentation and the 1TBS style. If you choose to reformat the file to match your own style preferences, make sure you apply the changes throughout.

Peer Reviews

One of the goals of this course is to encourage good software development practices, especially when building a parallel or distributed software system. For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.

After the submission deadline, I will assign you two other submissions to review on Canvas. I expect you to review two distinct submissions that are not your own; I will do my best to ensure that the assignments work out but sometimes Canvas does not cooperate. It is your responsibility to inform me immediately if you have been assigned yourself, your partner, or two other same-team partners to review.

To complete the peer review, submit a Canvas comment on the submission with your assessment of the submission according to the provided code review guidelines. I expect at least two paragraphs per review with detailed observations and an assigned overall numeric score. You will be graded on the quality, thoroughness, and professional tone of your review. The peer reviews are due one week after the original project due date.

Parallel and Distributed Systems - Spring 2016