This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Homework

1-Day In-Class Assignments

1 - NP Complete Project

Learning Objectives

  • implement a brute-force approach to solve a known NP-complete problem and analyze its runtime complexity
  • present/discuss an NP-completeness proof to a group of colleagues and justify that an approximation method is appropriate
  • create an approximation method for a known NP-complete problem and present its strengths and weaknesses.
  • implement a reduction to another NP complete problem and compare the performance
  • improve presentation skills (both speaking and slide creation) by giving and receiving peer feedback

Introduction

In this project you will explore a known NP-complete problem. The sections below detail each of the project components.
Individual assignments will be created for all submission pieces (or they will be submitted in Github and an assignment will be created in Canvas to represent your grade).

Problem and Group Selection

This is a group project that must have 3 members.
It may be necessary to work in a group of 2, and if so, the instructor will work out the details of what portions of the project can be reduced in scope. Your group will select and rank three NP-complete problems that you would like to work on for this project from the list below. You should do some research before selecting one of these problems. Ideally, we will get a nice distribution of projects across these problems.

  • Max 3-Sat
  • Max clique
  • Traveling Salesman Problem (TSP) with a complete graph
  • Longest Path
  • Min Vertex Cover
  • Min Graph Coloring

I have created example input and output files for each problem. After I receive and review all the group preferences, I will assign each group a single NP-complete problem to work on.

Role Selection and Tasks

Each of the tasks for the project are outlined below. One person in the group will be responsible for Parts A and B, another will perform Parts C and D and another will perform Parts E and F.

Part A - Exact Solution Code

Develop Python code that provides the optimal solution to the problem you are studying.
You must develop test cases of varying sizes and illustrate how the runtime (wall clock) of your program varies for different size input. Your test cases must include at least one instance that causes your program to run for more than 20 minutes.

Make sure you cite in the comments of your program ALL external sources used in the development of your code.

Submission

Create a folder named exact_solution with the GIT repository. Place your code in this folder including a README.txt file that provides instructions and the exact commands to run your code. Within the exact_solution folder, create a test_cases subfolder that contains all test cases. Create a shell script named run_test_cases.sh that executes your program on all test cases. Clearly label (with comments) which test case causes your program to run for more than 20 minutes.
An example set of shell scripts for performing this work is provided (thanks to Martin Nester for developing these very nice scripts).

Include in the write up:

  • why the problem is important (some applications that use this problem)

Part B - Problem Presentation

Perform a 5 minute presentation on your group’s work for Part A. This presentation must include:

  • a description of the problem being solved. Show the difference between the decision and optimization version of the problem. An example of inputs and outputs is a good idea. (1.5 mins)
  • an explanation of the certifier process being polynomial. Including pseudo-code here is a good idea but make sure its clear that the certificate can be verified in polynomial time.
  • a reduction from a known/accepted NP-hard problem (with examples graphically shown). It must be clear that the reduction takes place in a polynomial number of steps with respect to the problem’s input size.
  • an analytical (big \(\mathcal{O}\)) runtime analysis of your coded solution and showcase the code that is the dominant term (the code that drives the highest order term).
  • a wallclock (empirical) runtime analysis of your code on your test cases. This analysis must show input size on the X axis and runtime on the Y axis.

Submission

Create a folder named presentations with the GIT repository. This presentation should use slides in either PowerPoint, Keynote, or Google slides (other slides formats can be requested, but are subject to instructor approval). The slides (or a file with a link to the google slides) should be placed in the git repository. Name this presentation problem_presentation with whatever file extension is appropriate.

Part C - Develop and Approximation Solution and Code It

Develop Python code that provides a reasonable approximation of the solution. Your approximation must run in polynomial time and process large problems (n > 1000) quickly. Some popular strategies are:

  • Make greedy local choices, breaking ties randomly
  • utilize randomness coupled with an anytime algorithm

Your program must execute without arguments (so that it works on gradescope), but you can incorporate optional command line arguments to augment the function of your program (for example, output wall clock timings or override the runtime for an anytime approach).

Submission

Create a folder named approx_solution with the GIT repository. Place your code in this folder and create a README file that provides instructions on how to run your program. Within the approx_solution folder, create a test_cases subfolder that contains all test cases. You must create a file named run_test_cases.sh that successfully runs your program on several test cases you created. One of these test cases must be one where your approximation solution does not achieve the optimal answer, place this test case in run_test_cases.sh and run_nonopt_cases.sh. The run_nonopt_cases.sh file only needs one example. Document notes on your solution in a file named approximation_notes.pdf and at a minimum these notes must addresses the items listed in the rubric below.

Part D - Approximation Presentation

Perform a 5 minute presentation on your group’s work for Part C.

This presentation must include:

  • pseudocode for the approximation and a discussion of the strategy used (greedy, random, etc.)
  • analytical run time analysis (the worse case \(\mathcal{O}\) time) of your approximation algorithm
  • plots that illustrate the run-time (wall clock) performance of your exact solution versus the approximation solution on your test cases
  • plots that compare the result/solution of your exact solution versus the approximation on your test cases. This is where you show the value achieved by both and see how well your approximation did versus the optimal solution.

For the data that is used to create run times on your test cases, you should provide these in a script named compute_approx_wallclock.sh (this should reside in the test_cases folder). Running this script should run your test cases that you used to gather your data for the plots. Recall that UNIX has a time command that may proof useful in collecting this information.

Submission

Within the folder presentation add a presentation/slides file constructed in PowerPoint, Keynote, or Google slides (other slides formats can be requested, but are subject to instructor approval). The slides (or a file with a link to the google slides) should be placed in the git repository. Name this presentation approximation_presentation with whatever file extension is appropriate.

Part E - Augment the Study

For this portion of the project, you will need to select one of the optional components for each of the problems. These will be available in the next day, but this person’s role will be to help augment the approximation solution.

Part F - Present

This will be to present on your work done in Part E.

Rubrics

Common rubric (13 points)

Item Points
Team Formation 3
Student Participation during in-class work days 6
Peer review submission 4

Exact solution rubrics (18 points).

Item Points
Gradescope tests 8
Small test cases are documented and explained to show correctness 4
Large test case (more than 20 mins to run) is discussed 3
run_test_cases.sh script works and executes all test cases 3

Exact solution presentation rubrics (18 or 19 points).

Item Points
Problem Description (what, why, applicaitons). Illustrate 8
Show proof of NP-Completeness (decision, NP certificate, and reduction). Include a small sample of the reduction 4
Longest Path Only (show why taking negative weights on all edges and using Bellman-Ford does not work) 1
Sketch Exact Solution 3
Discuss test case generation 3

Approximation code (18 points).

Discussion points should be in approximation_notes.pdf within the approx_solution folder in the github repo.

Item Points
Explain strategy (greedy, anytime, etc.). 3
Discuss the analytical runtime analysis of your approximation. Recall it must run in polynomial time 6
Program performs “reasonably well” on test cases on Gradescope 6
run_test_cases.sh showscases at least 3 test cases (1 with nonoptimal results) 3

Approximation Presentation (18 points).

Item Points
Explain strategy (greedy, anytime, etc.) and how the code works 3
Discuss problem instance where approximation does not achieve the optimal value. Use illustrations. 3
Provide wallclock runtime results comparing the exact and approx solutions on one plot. 6
Discuss a lower bound for your solution and show the delta to this lower bound (from the value you computed) for large problem instances you created for run_examples.sh 6

1.1 - NP Complete Project Problem Specification

This document outlines the input and output specifications for each NP-complete problem. For each problem, sample input and output are provided.

Weighted Graph Problems (TSP, Longest Path)

For traveling sales person, the names of the two programs are:

  • cs412_tsp_exact.py
  • cs412_tsp_approx.py

For the longest path problem, the names of the two programs are:

  • cs412_longestpath_exact.py
  • cs412_longestpath_approx.py

The input is a weighted graph specified by a line containing the number of vertices n and the number of edges m followed by m lines containing the edges given in u v w format, representing an edge between u and v of weight w. TSP graphs are undirected and edges will be listed only once and the graph will be a complete graph. LP graphs are undirected and edges will be listed once (but it is not necessarily a complete graph).

The output contains two lines: the length of the path on one line (as an integer) followed by a list of vertices for the path/cycle on the second line.

Example TSP input:

3 3
a b 3.0
b c 4.2
a c 5.4

Example TSP output (you can round the answer to the nearest .0000)

12.6000
a b c a

Example LP input:

3 3
a b 3
b c 4
a c 5

Example LP output:

9
a c b

Max Clique, Vertex Cover, Min Graph Coloring

For max clique, the program file names are:

  • cs412_maxclique_exact.py
  • cs412_maxclique_approx.py

For min graph coloring, the program file names are:

  • cs412_mingraphcoloring_exact.py
  • cs412_mingraphcoloring_approx.py

For the minimum vertex cover problem, the program file names are:

  • cs412_minvertexcover_exact.py
  • cs412_minvertexcover_approx.py

For Max Clique and Vertex Cover, your input is an undirected graph. The first line will list the number of edges m. The remaining m lines are a list of vertice pairs

  • u v* where an edge exists between u and v. The vertices are inferred from the edge list.

Sample Max Clique and Vertex Cover input:

2
a b
b c

This is a graph with three vertices, a, b, and c and two edges {a, b} and {b, c}.

For max clique and vertex cover, the output should be the structure (clique or vertex cover) specified by the vertices on a single line, space separated.

Sample Max Clique output:

a b

Sample min Vertex Cover output:

b

For min color, you should specify the number of colors on the first line, followed by the vertex color assignments in the format vertex name color. Your “color name” data should just be integers 0 through k-1 (where k is the min number of colors).

Sample Min Color Output

2
a 0
b 1
c 0

Max 3SAT

The program file names for max 3SAT are:

  • cs412_max3sat_exact.py
  • cs412_max3sat_approx.py

The first line of input specifies n the number of variables and m the number of clauses. The n variables are numbered 1 through n. The clauses are encoded as three indices with a minus sign (-) denoting negation (this is why they are numbered from 1 to n, since -0 is trouble).

So the clause \( (x_1 \lor \overline{x_2} \lor x_3) \land (\overline{x_1} \lor x_2 \lor \overline{x_3})\)

Would be encoded as:

3 2
1 -2 3
-1 2 -3

The first line of the output is the number of clauses satisfied. This is followed by n lines where each line details the variable’s assignment of true or false and is written as the variable number followed by a T or F.

Sample Output:

2
1 T
2 T
3 F

This output corresponds to the claim that if we set \(x_1\) to true, \(x_2\) to true, and \(x_3\) to false, we satisfy two clauses.

Min Set Cover

The program file names are:

  • cs412_minsetcover_exact.py
  • cs412_minsetcover_approx.py

The first line of the input will be the number of elements n followed by the number of subsets m. Your universe / elements are the numbers \(1, \ldots, n\) (count from 1). The next m lines will be each subset.

Sample Input

5 3
1 2 3
2 4
3 4 5

The input above is for the instance of the problem U={1, 2, 3, 4, 5}, S={{1, 2, 3}, {2, 4}, {3, 4, 5}}.

The output should be the indices of the subsets (indexed from 0) that are used to cover the universe (with a minimum number of subsets):

Sample Output

0 2

Meaning that the smallest cover is {1, 2, 3}, {3, 4, 5}.

1.2 - Project Options

Project Options

MAX 3-SAT

Solve Problem using Independent Set Approximation

For this option, you must develop code that transforms the 3-SAT input into input for the independent set problem (in a polynomial number of steps). Many examples of how to do this are in publication, you just need to implement one of them and discuss how it works.

You can then use the instructor provided approximation algorithm for independent set and compare it to how your team’s approximation method (part D) performs.

Enumerate Clauses and Showcase Results (in development)

For all values of $$3 \ge n \ge 5$$, enumerate all possible input clauses and then show the ratio of unsatisfiable sets versus satisfiable sets.

Max Clique

TSP

Advanced Approximation Strategies

Augment the approximation strategy to:

  • Many approximations reach a local min and can’t improve from there. Augment the approximation code to detect this local min and restart the approximation. Track each approximation and show the variance encountered by your program as well as the quality of the best solution as a function of time. Plot these results and discuss.

Longest Path

Test Cases

Generate at least 1000 interesting test cases and perform a comprehensive comparison of how the optimal solution/code differs from the approximation codes results.

Min Vertex Cover

Perform a reduction to Independent Set

For this option, you must develop code that transforms the Min Vertex Cover input into input for the independent set problem (in a polynomial number of steps). Many examples of how to do this are in publication, you just need to implement one of them and discuss how it works.

You can then use the instructor provided approximation algorithm for independent set and compare it to how your team’s approximation method (part D) performs.