James Madison University, Spring 2022
Draft submission due Feb 24th in class.
Each group commits a schema
directory on GitHub.
Instructions
The main goals of this assignment are twofold: create the initial design of your project database schema, and import as much data into your group database as possible.
Determine what data will be necessary for your application. You should brainstorm (as a group) what tables and columns you will need before moving on to Step 2.
Write
CREATE TABLE
statements for your database. Pick suitable data types for each attribute; see the PostgreSQL docs for a list of available options. Determine which attributes will beNOT NULL
. For each table, write a brief one-sentence description in aCOMMENT
statement.Develop SQL queries to
\copy
the data from CSV files and/or other sources. After debugging your queries, paste them into a shell script to automate importing the data on the server (see example below).- Identify
PRIMARY KEY
andFOREIGN KEY
constraints for each relation. WriteALTER TABLE
statements to enforce these constraints after you import the data. Determine the "drop order" of your tables based on foreign keys, and writeDROP TABLE
statements for future use (i.e., when making changes to your design). Think about which columns you will need to index (other than primary key columns, which are indexed automatically). Write
CREATE INDEX
statements, and run them after creating constraints.Write an SQL script to count the number of rows in each of your tables. After importing the data in the previous step, check to make sure the counts are what you expect them to be. (If a table has too many or two few rows, then there was a problem with the import.)
Create a relational database diagram using dbdiagram.io that summarizes your tables and any constraints. Maintain the source code of your diagram in your repository, along with a pdf export. Whenever you change your design later in the semester, you must update your diagram files (dbml and pdf).
Finally, write a README.md file that documents the steps needed to re-create your database from scratch. It should be useful both during the semester and after the course ends. (Note this README.md file is separate from the main one at the top level of the repository.)
Beginning with this assignment, we will use GitHub to manage and submit your group project files. Remember to commit and push your changes as you work on the project.
Each member of your group must make at least one commit, and all commits should have a brief comment. When I browse your submission history, it should show evidence of incremantal work (i.e., not just one big commit at the end by a single group member).
Getting Started
The instructions above are listed in the order that your scripts will need to run. However, you might want to start with the database diagram (Step 7). Notice that dbdiagram.io can generate some of the SQL code for you!
The following files are provided as a template for your solution. Consider assigning a different file to each group member. (You don't need to use all of these files—in particular, the csv file is only an example.) Each person should git add
their file(s), git commit -m "write a comment here"
, and git push
. The rest of the group should be able to see/edit changes after running git pull
.
Step | File | Contents |
---|---|---|
1 | college.csv | Example optional NCES data set |
2 | create.sql | DROP/CREATE table statements |
3 | copy.sh | Script to COPY and \copy data |
4 | alter.sql | Primary and foreign key constraints |
5 | index.sql | Create indexes to speed up queries |
6 | stats.sql | Analyze the tables and count the rows |
7 | diagram.dbml | Diagram source (and diagram.pdf) |
8 | README.md | Steps for creating your database |
Do not store large files in your repository. Instead, provide a link to where you downloaded them in your README.md file. It's okay to store smaller (i.e., less than 5MB) data sets such as Excel spreadsheets, PDF files, and CSV data edited by hand.