CS 374 GP2: Schema Design
James Madison University, Spring 2022

Draft submission due Feb 24th in class.
Each group commits a schema directory on GitHub.

Instructions

The main goals of this assignment are twofold: create the initial design of your project database schema, and import as much data into your group database as possible.

  1. Determine what data will be necessary for your application. You should brainstorm (as a group) what tables and columns you will need before moving on to Step 2.

  2. Write CREATE TABLE statements for your database. Pick suitable data types for each attribute; see the PostgreSQL docs for a list of available options. Determine which attributes will be NOT NULL. For each table, write a brief one-sentence description in a COMMENT statement.

  3. Develop SQL queries to \copy the data from CSV files and/or other sources. After debugging your queries, paste them into a shell script to automate importing the data on the server (see example below).

  4. Identify PRIMARY KEY and FOREIGN KEY constraints for each relation. Write ALTER TABLE statements to enforce these constraints after you import the data. Determine the "drop order" of your tables based on foreign keys, and write DROP TABLE statements for future use (i.e., when making changes to your design).
  5. Think about which columns you will need to index (other than primary key columns, which are indexed automatically). Write CREATE INDEX statements, and run them after creating constraints.

  6. Write an SQL script to count the number of rows in each of your tables. After importing the data in the previous step, check to make sure the counts are what you expect them to be. (If a table has too many or two few rows, then there was a problem with the import.)

  7. Create a relational database diagram using dbdiagram.io that summarizes your tables and any constraints. Maintain the source code of your diagram in your repository, along with a pdf export. Whenever you change your design later in the semester, you must update your diagram files (dbml and pdf).

  8. Finally, write a README.md file that documents the steps needed to re-create your database from scratch. It should be useful both during the semester and after the course ends. (Note this README.md file is separate from the main one at the top level of the repository.)

Beginning with this assignment, we will use GitHub to manage and submit your group project files. Remember to commit and push your changes as you work on the project.

Each member of your group must make at least one commit, and all commits should have a brief comment. When I browse your submission history, it should show evidence of incremantal work (i.e., not just one big commit at the end by a single group member).

Getting Started

The instructions above are listed in the order that your scripts will need to run. However, you might want to start with the database diagram (Step 7). Notice that dbdiagram.io can generate some of the SQL code for you!

The following files are provided as a template for your solution. Consider assigning a different file to each group member. (You don't need to use all of these files—in particular, the csv file is only an example.) Each person should git add their file(s), git commit -m "write a comment here", and git push. The rest of the group should be able to see/edit changes after running git pull.

Step File Contents
1 college.csv Example optional NCES data set
2 create.sql DROP/CREATE table statements
3 copy.sh Script to COPY and \copy data
4 alter.sql Primary and foreign key constraints
5 index.sql Create indexes to speed up queries
6 stats.sql Analyze the tables and count the rows
7 diagram.dbml Diagram source (and diagram.pdf)
8 README.md Steps for creating your database

Do not store large files in your repository. Instead, provide a link to where you downloaded them in your README.md file. It's okay to store smaller (i.e., less than 5MB) data sets such as Excel spreadsheets, PDF files, and CSV data edited by hand.