CS 474 GP2: Schema Design
James Madison University, Spring 2018 Semester

Electronic submission due Feb 16th at 11:59 PM.
Each group creates a schema directory on GitHub.

Instructions

The main goals of this assignment are twofold: create the initial design of your project database schema, and import as much data into your group database as possible.

  1. Determine what data will be necessary for your application. You should brainstorm as a group what relations and attributes you will need before moving on to Step 2.

  2. Write CREATE TABLE statements for group database. Pick suitable data types for each attribute; see the PostgreSQL docs for a list of available options. Determine which attributes will be NOT NULL. For each table, write a brief one-sentence description in a COMMENT statement.

  3. Develop SQL queries to \copy the data from CSV or other files. After debugging your queries, paste them into a shell script to perform the actual import on the database server (see example below).

  4. Identify PRIMARY KEY and FOREIGN KEY constraints for each relation. Write ALTER TABLE statements to enforce these constraints after you import the data. Determine the "drop order" of your tables based on foreign key dependencies, and write DROP TABLE statements for future use (i.e., when making changes to your design).
  5. Write an SQL script to count the number of rows in each of your tables. After importing the data in the previous step, check to make sure the counts are what you expect them to be. (If a table has too many or two few rows, then there was a problem with the import.)

  6. Create a database diagram (e.g., using Dia or other software) that summarizes your tables and any constraints. Write the number of rows in the "comment" property of each table, and be sure to check the "comment visible" option. Your diagram should fit on a single page.

  7. Finally, write a README.md file that documents the steps needed to re-create your database from scratch. It will be useful both during the semester and after the course ends.

Beginning with this assignment, we will use GitHub to manage and submit your group project files. Remember to commit and push your changes as you work on the project. Each member of your team must make at least one commit, and all commits should have a brief comment. When I browse your submission history, it should show evidence of incremantal work (i.e., not just one big commit at the end by a single team member).

Getting Started

The following files are provided as a template for your solution. You don't need to use all of these files -- just modify the ones you actually plan to use in your project. One team member should save them into your schema directory, run the command git add on each file, then git commit -m "added GP2 template files", and finally git push. The rest of the team should be able to see/edit the files after running git pull.

Step File Contents
1 college.csv Example optional NCES data set
2 create.sql DROP/CREATE table statements
3 copy.sh Script to COPY and \copy data
4 alter.sql Add primary and foreign key constraints
5 stats.sql Analyze the tables and count the rows
6 schema.dia Database diagram with row counts
7 README.md Steps for creating your database

Do not store large files in your repository. Instead, provide a link to where you downloaded them in your README.md file. But it's okay to store smaller (less than 2MB) data sets such as Excel spreadsheets, PDF files, and CSV data edited by hand.