CS 470

Project 1 - Pthread Application

Introduction

The primary purpose of this assignment is to gain experience with multicore parallelism using the standard POSIX Pthreads library, including exposure to multithreading concepts such as mutexes and conditions. The secondary purpose of this assignment is to gain experience implementing a master/worker task queue model of parallel computation.

Description

You will develop a parallel version of the following serial program: sum.c

The program reads a list of "tasks" from a file. Each task consists of a character code indicating an action and a number. The character code can be either a "p" (for "process") or "w" (for "wait"). The input file simulates various workloads entering a multiprocessing system. In a real system, the "p" actions (the tasks) would likely be calls to computational routines. For our purposes, "processing" a task with number n just means waiting n seconds using the sleep function and then updating a few global aggregate variables (sum, odd count, min, and max). The "w" action provides a way to simulate a pause in incoming tasks.

For example, the following script simulates one initial one-second task entering the system, followed by a two-second delay. After the delay, a two-second task enters the system followed by a three-second task.

p 1
w 2
p 2
p 3

Using a purely serial processing system (as implemented in the provided sum.c), the above scenario will take eight seconds to finish:

t=0    t=1    t=2    t=3    t=4    t=5    t=6    t=7    t=8
|      |      |      |      |      |      |      |      |
---------------------------------------------------------
| p 1  | w 2         | p 2          | p 3               |       serial version
---------------------------------------------------------

The final output should match the following (sum, # odd, min, max):

6 2 1 3

In this project you will extend this program to take advantage of a multicore CPU using a task queue model. In such a model, the main program spawns a set number of worker threads. You should read the number of threads from the command line as a new second parameter. The main program and worker threads communicate using a task queue to keep track of tasks that still need to be processed.

Observe that if we allowed the first task to be processed during the wait, we could reduce the time by a single second:

t=0    t=1    t=2    t=3    t=4    t=5    t=6    t=7
|      |      |      |      |      |      |      |
--------------------------------------------------
| w 2         | idle                             |      master
--------------------------------------------------
| p 1  | idle | p 2         | p 3                |      single worker
--------------------------------------------------

This can be achieved by splitting the actual processing work out into a worker thread that can work parallel to the original master thread. This allows the master thread to focus on receiving jobs while the worker thread focuses on doing the actual work. In the above scenario, the first task arrives at t=0 while the second two tasks arrive simultaneously at t=2.

Because our tasks are independent, the situation can be further improved by the addition of more worker threads, assuming we have enough physical CPU cores to take advantage of them. In the example above, the last two tasks could be executed simultaneously to save two additional seconds:

t=0    t=1    t=2    t=3    t=4    t=5
|      |      |      |      |      |
------------------------------------
| w 2         | idle               |    master
------------------------------------
| p 1  | idle | p 2         | idle |    worker 1
------------------------------------
| idle        | p 3                |    worker 2
------------------------------------

Your program should work as follows. At the beginning of execution, the master thread spawns a set number of worker threads (given by a command line parameter--if the user enters a non-positive number, simply print an error and exit). The worker threads are idle at first. Once the workers have been fully initialized, the master then begins to handle tasks from the input file by adding them to a task queue, waking up an idle worker thread (if there are any) for each task. When a thread is awakened, they begin to pull tasks from the queue and process them. If the queue ever runs out of tasks, the worker should block again until awakened by the master. If the master encounters a "w" (wait) command, it waits the given number of seconds before continuing in the input file. Otherwise, the master should continue to add "p" tasks with no pause or delay. After all tasks have been added to the queue, the master waits for the queue to be exhausted and for non-idle workers to finish. The master then cleans everything up (nicely! do not forcibly abort worker threads) and exits.

To implement the above system, you should use Pthread threads, mutexes, and conditions as covered in class and in the textbook. You may not use Pthread barriers as they are not C99-compliant. Your program should take the number of worker threads as a second command-line parameter; the performance on parallelizable work loads should scale linearly with the number of threads. Do NOT modify the output format. You must also keep an explicit task queue data structure that is dynamically allocated so that it can grow as necessary to adapt to large numbers of incoming tasks (up to a million is reasonable).

Hints:

Add a new worker thread function for most of your worker-side synchronization.
Implement the task queue as a singly-linked list.
Use a mutex to protect the global aggregate variables.
Use a mutex to protect the task queue.
Use a condition to block and wake worker threads. Think carefully about which mutex should be used to protect it.
Declare variables that you don't want the compiler to optimize using the volatile keyword.
It's ok for the master thread to busy-wait; however, the workers should not!
Compile with -O0 to debug, then switch to -O2 after you're sure the basics are working.
Create several test input files and a script to automatically run (and time) all of them.
Try removing the call to "sleep" in the update function to compress the work times and check for race conditions.
Don't over-engineer! My reference solution adds fewer than 200 total lines.

Grading Criteria

Your submission will be graded on a combination of correctness, functionality, elegance, style, and documentation. Here is a tentative grading rubric:

Grade	Description	Requirements
A	Exceptional	Consistently optimal running times. Correct results on all test cases at all optimization levels. Correct use of all Pthread constructs. Lack of any potential data races. Lack of busy-waiting in worker threads. Error checking for all Pthreads function calls. Clean code with fully consistent style. Insightful documentation. All of the below.
B	Good	Speedup with multiple worker threads. Terminates on all test cases at all optimization levels. Correct implementation of a task queue. Correct use of multiple Pthread synchronization mechanisms. Consistent and reasonable style. No memory leaks. Useful and consistent documentation. All of the below.
C	Satisfactory	Compiles without needing modifications on the cluster with `--std=c99`. Correct results with a single worker thread. Correct use of at least one Pthread synchronization mechanism. Readable style. Minimal amount of documentation. All of the below.
D	Deficient	Some evidence of a good-faith attempt.
F	Unacceptable

Submission

You should copy sum.c into a new file called par_sum.c and modify it to implement the master/worker system described above. Your modified program should accept TWO command-line parameters instead of one: 1) the input filename and 2) the number of worker threads. Your program must compile on the cluster using the following Makefile:

default: sum par_sum

sum: sum.c
	gcc -g -O2 --std=c99 -Wall -o sum sum.c

par_sum: par_sum.c
	gcc -g -O2 --std=c99 -Wall -o par_sum par_sum.c -lpthread

clean:
	rm -f sum par_sum

You should submit your modified program as par_sum.c on Canvas by the due date. You should not submit your Makefile or anything else.

All submitted code should be elegant, clean, consistent, and well-documented. The code I have provided uses 4-space indentation and the 1TBS style. If you choose to reformat the file to match your own style preferences, make sure you apply the changes throughout. See the coding standards for suggested style guidelines.

Peer Reviews

One of the goals of this course is to encourage good software development practices, especially when building a parallel or distributed software system. For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.

After the submission deadline, I will assign you two other submissions to review on Canvas. I expect you to review two distinct submissions that are not your own; I will do my best to ensure that the assignments work out but sometimes Canvas does not cooperate. It is your responsibility to inform me immediately if you have been assigned yourself, your partner, or two other same-team partners to review.

To complete the peer review, submit a Canvas comment on the submission with your assessment of the submission according to the provided code review guidelines. I expect at least two paragraphs per review with detailed observations and an assigned overall numeric score. You will be graded on the quality, thoroughness, and professional tone of your review. The peer reviews are due one week after the original project due date.