Final Project Overview
Goals
- Apply course concepts to a larger, open-ended project.
- Gain parallel/distributed systems experience that you can draw from in your career.
- Improve your confidence in dealing with complex and ill-defined problems.
- Exercise communication skills in relaying and explaining your work.
Description
In this project, you will gain experience working on a larger, more open-ended problem. You will find an existing parallel or distributed software system that uses at least one of the technologies we are studying (Pthreads, OpenMP, CUDA, or MPI) and work on it over the course of the semester:
- Analyze the system. This will first require you to get the existing code to compile and run on the cluster or JMU lab machines, which is not always trivial. You must also verify that the system works as intended (e.g., that it produces correct results). You should also study the code and attempt to understand the overall system architecture (draw diagrams!). Then, you must analyze the performance and scaling of the system by studying the code and running performance tests. If possible, you should run code profilers to find hot spots, and test parallelism along multiple dimensions. You must then document all of this and write it up in a poster and a short (5-10 page) report.
- Find insights. After you're familiar with the system, work to extract interesting and useful observations. This should involve showing a significant tradeoff with the system or relating performance results back to a particular attribute or construct in the code. You must be able to demonstrate the insight via experimental results. Again, you must document all of this, focusing on being able to clearly explain and defend your claims with evidence in both the poster and your report.
- Make a contribution. After you have gained insights, you should be well-positioned to make a novel contribution to the system. You might fix a problem identified in your analysis, optimize the system to significantly improve performance, add a new feature, or adapt the system to solve a different kind of problem than it was originally intended to address. As before, you must document your contribution, making it clear what value you have added and why it is important or significant.
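As one illustration of the performance and scaling analysis described above, the following minimal sketch computes three standard scaling metrics from measured wall-clock times: speedup, parallel efficiency, and the Karp-Flatt estimate of the serial fraction. The function names and parameters are hypothetical and not part of any course-provided code; the sketch assumes you have already timed a serial baseline and a parallel run yourself.

```c
#include <assert.h>

/* Hypothetical helpers for summarizing timing results (illustrative only). */

/* Speedup relative to the serial baseline: S = T_serial / T_parallel. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: E = S / p, where p is the number of
 * threads or processes used in the parallel run. */
double efficiency(double t_serial, double t_parallel, int nprocs)
{
    assert(nprocs > 0);
    return speedup(t_serial, t_parallel) / nprocs;
}

/* Karp-Flatt metric: experimentally determined serial fraction,
 * e = (1/S - 1/p) / (1 - 1/p). Only defined for p > 1. */
double karp_flatt(double t_serial, double t_parallel, int nprocs)
{
    assert(nprocs > 1);
    double s = speedup(t_serial, t_parallel);
    double inv_p = 1.0 / nprocs;
    return (1.0 / s - inv_p) / (1.0 - inv_p);
}
```

For example, a run that takes 10.0 seconds serially and 2.5 seconds on 8 processes has a speedup of 4.0 and an efficiency of 0.5. Reporting these values across several process counts and problem sizes is one way to back up claims about strong and weak scaling with evidence.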
Guidelines
Below are some general guidelines for the semester-long project:
- Find a topic you are personally interested in that is related to our course topics.
- Find two or three like-minded students in the class to work with.
- Start early and schedule regular work sessions to make steady progress.
- Avoid naturally parallel problems if possible; try to find something non-trivial.
- Unless otherwise approved, the software must run on our cluster or lab machines.
- Unless otherwise approved, the majority of the code must be written in C or C++.
- Prefer large software systems (e.g., thousands of lines of code or more).
- Prefer distributed software systems (e.g., MPI-based or networked).
Grading
It is impossible to build a comprehensive grading rubric for these projects because they are highly individualized, and not every group need aim for an 'A' grade to be successful. However, here are some general guidelines for a successful project at each grade level:
- A level (novelty) - This project demonstrates a significant and well-executed new implementation and analysis of a parallel or distributed system, in addition to the criteria for all lower grading tiers. This might include a novel system or variation on an existing system, and it should apply many of the concepts discussed in class. The writeup will include a thorough analysis and discussion of performance and scaling characteristics as well as any other relevant topics. In general, a project like this is a good candidate for a pull request to contribute your additions back to the original project.
- B level (insight) - This project demonstrates significant insights into a parallel or distributed system. This may or may not include new code, but it should apply many of the concepts discussed in class, and in most cases should involve demonstrating an interesting and significant tradeoff. The writeup will include a thorough analysis and discussion of performance and scaling characteristics as well as any other relevant topics. In general, a project like this generates lively discussions with judges at the poster session.
- C level (parallelism) - This project demonstrates a parallel or distributed system, using at least two or three concepts discussed in class. The writeup will include a discussion of performance and scaling characteristics as well as any other relevant topics.
I also suggest reading the final deliverable grading rubric and the poster guidelines to see what I will be expecting for those submissions, which account for the majority of the points associated with this project.
Ideas
Below is a list of kinds of software systems that are likely to work well for the semester-long project.
- Physics simulations
- Weather/climate simulations
- Chemical or biochemical simulations
- Numerical systems solvers
- Dense linear algebra systems
- Ray tracers or game engines
- Graph problem solvers
- Image processors
Note: In assigning a final grade, I will take into account the size and complexity of the software system you chose. All other things being equal, a project involving a very large system that is distributed or uses multiple technologies that we covered in class will likely receive a higher score than a project involving a system that is smaller or uses fewer technologies. This is intended to reward the higher difficulty associated with choosing a more complex target.
Closing
Most importantly: DON'T PANIC! In the past, this project has caused a lot of undue stress, primarily because it is so open-ended. However, succeeding at an open-ended project is a skill that will likely be crucial at some point in your career. This is an opportunity to work on that skill in a relatively low-stakes context. I encourage you to think of this project as an opportunity rather than an obstacle.
Good luck!