This is the multi-page printable view of this section. Click here to print.
Homework
1 - NP Complete Project
Learning Objectives
- implement a brute-force approach to solve a known NP-complete problem and analyze its runtime complexity
- present/discuss an NP-completeness proof to a group of colleagues and justify that an approximation method is appropriate
- create an approximation method for a known NP-complete problem and present its strengths and weaknesses.
- improve presentation skills (both speaking and slide creation) by giving and receiving peer feedback
Introduction
In this project you will explore a known NP-complete problem and perform the following tasks:
- Develop an optimal solution – develop code that solves the problem with the optimal answer (no approximation).
- Problem presentation explain the problem and its reduction to another accepted NP-complete problem.
- Approximation code – develop code that performs a polynomial approximation
- Approximation presentation – explain your approach for computing the approximation and illustrates its performance
- Evaluation of Other Projects – you will submit one peer review of another group’s project.
The sections below detail each of these components. Individual assignments will be created for all submission pieces (or they will be submitted in Github and an assignment will be created in Canvas to represent your grade).
Problem and Group Selection
This is a group project that can have up to 2 members.
It may be necessary to work in a group of 3, and if so
a group of this size will be required to complete 2 different approximations (Part C).
Your group will select and rank three NP-complete problems that you would like
to work on for this project from the list below.
You should do some research before selecting one of these problems. Ideally, we will get a nice distribution of projects across these problems.
- Max 3-Sat
- Max clique
- Traveling Saleman Problem (TSP) with a complete graph
- Longest Path
- Min Vertex Cover
- Min Graph Coloring
- Min Set Cover
I have created example input and output files for each problem. After I receive and review all the group preferences, I will assign each group a single NP-complete problem to work on.
Role Selection and Tasks
Each of the tasks for the project are outlined below. One person in the group will be responsible for Parts A and B, and the other person will perform Parts C and D.
Part A - Exact Solution Code
Develop Python code that provides the optimal solution to the problem you are studying. You must develop test cases of varying sizes and illustrate how the runtime (wall clock) of your program varies for different size input. Your test cases must include at least one instance that causes your program to run for more than 20 minutes.
Make sure you cite in the comments of your program ALL external sources used in the development of your code.
Submission
Create a folder named exact_solution with the GIT repository. Place your code in this folder including a README.txt file that provides instructions and exact commands to run your code. Within the exact_solution folder, create a test_cases subfolder that contains all test cases. Create a shell script named run_test_cases.sh that executes your program on all test cases. Clearly label which test case causes your program to run for more than 20 minutes.
An example set of shell scripts
for performing this work is provided
(with a credit to Martin Nester for developing these very nice scripts).
Part B - Problem Presentation
Perform a 5 to 6 minute presentation on your group’s work for Part A. This presentation must include:
- a description of the problem being solved. Show the difference between the decision and optimization version of the problem
- example inputs and outputs while describing the problem
- why the problem is important (some applications that use this problem)
- an explanation of the certifier process being polynomial
- a reduction from a known/accepted NP-hard problem (with examples graphically shown)
- an explanation of the coded solution your group has created.
- an analytical (big \(\mathcal{O}\)) runtime analysis of your coded solution and showcase the code that is the dominant term (the code that drives the highest order term).
- a wallclock (empirical) runtime analysis of your code on your test cases. This analysis must show input size on the X axis and runtime on the Y axis.
Submission
Create a folder named presentations with the GIT repository. This presentation should use slides in either PowerPoint, Keynote, or Google slides (other slides formats can be requested, but are subject to instructor approval). The slides (or a file with a link to the google slides) should be placed in the git repository. Name this presentation problem_presentation with whatever file extension is appropriate.
Part C - Approximation Solution Code
Develop Python code that provides a reasonable approximation of the solution. Your approximation must run in polynomial time and process large problems (n > 1000) quickly. Some popular strategies are:
- Make greedy local choices
- utilize randomness coupled with an anytime algorithm
Your program needs to run without arguments (so that it works on gradescope), but you can incorporate optional command line arguments to augment the function of your program.
Submission
Create a folder named approx_solution with the GIT repository. Place your code in this folder and create a README file that provides instructions on how to run your program. You must create a file named run_examples.sh that successfully runs your program on several test cases you created. One of these test cases must be one where your approximation solution does not achieve the optimal answer. Document notes on your solution in a file named approximation_notes.pdf and at a minimum these notes must addresses the items listed in the rubric below.
Part D - Approximation Presentation
Perform a 5 to 6 minute presentation on your group’s work for Part C. To manage all of the presentations, you will be asked to stop presenting promptly at 6 minutes, so, it is a good idea to practice before class to make sure that your timing works. This presentation must include:
- pseudocode for the approximation and a discussion of the strategy used (greedy, random, etc.)
- analytical run time analysis (the worse case \(\mathcal{O}\) time) of your approximation algorithm
- lower bound analysis of the problem and the approximation
- plots that illustrate the run-time (wall clock) performance of your exact solution versus the approximation solution on your test cases
- plots that compare the result/solution of your exact solution versus the approximation on your test cases.
For the data that is used to create run times on your test cases, you should provide these in a script named compute_approx_wallclock.sh. Running this script should run your test cases that you used to gather your data for the plots. Recall that UNIX has a time command that may proof useful in collecting this information.
Submission
Within the folder presentation add a presentation/slides file constructed in PowerPoint, Keynote, or Google slides (other slides formats can be requested, but are subject to instructor approval). The slides (or a file with a link to the google slides) should be placed in the git repository. Name this presentation approximation_presentation with whatever file extension is appropriate.
Rubrics
Common rubric (13 points)
Item | Points |
---|---|
Team Formation | 3 |
Student Participation during in-class work days | 6 |
Peer review submission | 4 |
Exact solution rubrics (18 points).
Item | Points |
---|---|
Gradescope tests | 8 |
Small test cases are documented and explained to show correctness | 4 |
Large test case (more than 20 mins to run) is discussed | 3 |
run_test_cases.sh script works and executes all test cases | 3 |
Exact solution presentation rubrics (18 or 19 points).
Item | Points |
---|---|
Problem Description (what, why, applicaitons). Illustrate | 8 |
Show proof of NP-Completeness (decision, NP certificate, and reduction). Include a small sample of the reduction | 4 |
Longest Path Only (show why taking negative weights on all edges and using Bellman-Ford does not work) | 1 |
Sketch Exact Solution | 3 |
Discuss test case generation | 3 |
Approximation code (18 points).
Discussion points should be in approximation_notes.pdf within the approx_solution folder in the github repo.
Item | Points |
---|---|
Explain strategy (greedy, anytime, etc.). | 3 |
Discuss the analytical runtime analysis of your approximation. Recall it must run in polynomial time | 6 |
Program performs “reasonably well” on test cases on Gradescope | 6 |
run_test_cases.sh showscases at least 3 test cases (1 with nonoptimal results) | 3 |
Approximation Presentation (18 points).
Item | Points |
---|---|
Explain strategy (greedy, anytime, etc.) and how the code works | 3 |
Discuss problem instance where approximation does not achieve the optimal value. Use illustrations. | 3 |
Provide wallclock runtime results comparing the exact and approx solutions on one plot. | 6 |
Discuss a lower bound for your solution and show the delta to this lower bound (from the value you computed) for large problem instances you created for run_examples.sh | 6 |
1.1 - NP Complete Project Problem Specification
This document outlines the input and output specifications for each NP-complete problem. For each problem, sample input and output are provided.
Weighted Graph Problems (TSP, Longest Path)
For traveling sales person, the names of the two programs are:
- cs412_tsp_exact.py
- cs412_tsp_approx.py
For the longest path problem, the names of the two programs are:
- cs412_longestpath_exact.py
- cs412_longestpath_approx.py
The input is a weighted graph specified by a line containing the number of vertices n and the number of edges m followed by m lines containing the edges given in u v w format, representing an edge between u and v of weight w. TSP graphs are undirected and edges will be listed only once and the graph will be a complete graph. LP graphs are undirected and edges will be listed once (but it is not necessarily a complete graph).
The output contains two lines: the length of the path on one line (as an integer) followed by a list of vertices for the path/cycle on the second line.
Example TSP input:
3 3
a b 3.0
b c 4.2
a c 5.4
Example TSP output (you can round the answer to the nearest .0000)
12.6000
a b c a
Example LP input:
3 3
a b 3
b c 4
a c 5
Example LP output:
9
a c b
Max Clique, Vertex Cover, Min Graph Coloring
For max clique, the program file names are:
- cs412_maxclique_exact.py
- cs412_maxclique_approx.py
For min graph coloring, the program file names are:
- cs412_mingraphcoloring_exact.py
- cs412_mingraphcoloring_approx.py
For the minimum vertex cover problem, the program file names are:
- cs412_minvertexcover_exact.py
- cs412_minvertexcover_approx.py
For Max Clique and Vertex Cover, your input is an undirected graph. The first line will list the number of edges m. The remaining m lines are a list of vertice pairs
- u v* where an edge exists between u and v. The vertices are inferred from the edge list.
Sample Max Clique and Vertex Cover input:
2
a b
b c
This is a graph with three vertices, a, b, and c and two edges {a, b} and {b, c}.
For max clique and vertex cover, the output should be the structure (clique or vertex cover) specified by the vertices on a single line, space separated.
Sample Max Clique output:
a b
Sample min Vertex Cover output:
b
For min color, you should specify the number of colors on the first line, followed by the vertex color assignments in the format vertex name color. Your “color name” data should just be integers 0 through k-1 (where k is the min number of colors).
Sample Min Color Output
2
a 0
b 1
c 0
Max 3SAT
The program file names for max 3SAT are:
- cs412_max3sat_exact.py
- cs412_max3sat_approx.py
The first line of input specifies n the number of variables and m the number of clauses. The n variables are numbered 1 through n. The clauses are encoded as three indices with a minus sign (-) denoting negation (this is why they are numbered from 1 to n, since -0 is trouble).
So the clause \( (x_1 \lor \overline{x_2} \lor x_3) \land (\overline{x_1} \lor x_2 \lor \overline{x_3})\)
Would be encoded as:
3 2
1 -2 3
-1 2 -3
The first line of the output is the number of clauses satisfied. This is followed by n lines where each line details the variable’s assignment of true or false and is written as the variable number followed by a T or F.
Sample Output:
2
1 T
2 T
3 F
This output corresponds to the claim that if we set \(x_1\) to true, \(x_2\) to true, and \(x_3\) to false, we satisfy two clauses.
Min Set Cover
The program file names are:
- cs412_minsetcover_exact.py
- cs412_minsetcover_approx.py
The first line of the input will be the number of elements n followed by the number of subsets m. Your universe / elements are the numbers \(1, \ldots, n\) (count from 1). The next m lines will be each subset.
Sample Input
5 3
1 2 3
2 4
3 4 5
The input above is for the instance of the problem U={1, 2, 3, 4, 5}, S={{1, 2, 3}, {2, 4}, {3, 4, 5}}.
The output should be the indices of the subsets (indexed from 0) that are used to cover the universe (with a minimum number of subsets):
Sample Output
0 2
Meaning that the smallest cover is {1, 2, 3}, {3, 4, 5}.