Scripts for Project Database
Image source:
freepik.com
Overview¶
This in-class assignment has three main goals:
- Create the database you designed in GP3.
- Put real and/or fake data into every table.
- Write queries that showcase your project.
After this assignment, you will start building a web application that uses the database. The data and queries that you develop this week will be the focus of the web application.
Example¶
A model solution for this assignment is available in the database folder of the
profs repository.
Please notice, in particular:
- The
schema.dbmlandschema.pdffiles summarize the database design. - The
datasubfolder contains CSV files of real data gathered by hand (see data/README.md). - The
pgsqlsubfolder contains SQL and Python scripts for creating the database.
The provided build.sh script automates the entire process of creating the database, importing real data, generating fake data, creating views (queries), etc.
This script can be run at any time to completely rebuild the database from scratch.
Instructions¶
Step 1: Create the Database¶
-
Copy the
pgsqlfolder from theprofsrepository into your own repository. These files are provided as a starting point, and you will end up replacing them with your own work. -
From dbdiagram.io, export your database design to PostgreSQL as an SQL script:

-
Cut and paste the contents of the SQL script into three files (see provided examples):
create.sqlshould contain theCREATE TABLEstatements.comment.sqlshould contain theCOMMENTstatements.alter.sqlshould contain theFOREIGN KEYstatements.
-
Replace the contents of
drop.sqlwith your project's table names. Think carefully about the order you need toDROPthe tables. If table A references table B, then table A must be dropped before Table B.Tip
Now would be a good time to commit and push your files. Don't wait until you finish everything to make a commit.
-
The next step is to get
build.shworking on your computer:-
Edit the script to use to your group's database by finding and replacing
profswith your group's name. -
Comment out lines 17–18, 20–21, and 26–27. This will prevent
load.sql,generate.py, andviews.sqlfrom running. (You will write those scripts later on.)
-
-
Read the comment at the top of
build.sh:- Create a
pgpassfile with your group's password (not just your student number). - Read the documentation, and ask the instructor if you need help with this step.
- Create a
-
Run the
build.shscript from the command line (Ex:./build.sh). The script will likely not succeed on the first try, or even the tenth try! Read the error messages, and make changes to your code as needed.- You might have to make the file executable first (Ex:
chmod +x build.sh). - Make sure you have an SSH tunnel running; the scripts connect to
localhost.
- You might have to make the file executable first (Ex:
-
Make sure that every team member is able to run
build.shon their computer. If someone’s environment is not set up, they will be unable to fully contribute to the project.Tip
Use all lowercase letters for table and column names (Ex:
event_year, noteventYear). If you use any uppercase letters, you will have to wrap names in double quotes, which is pain.Important
Whenever you change the database design, be sure to update
schema.dbmlandschema.png. You will likely need to change the design half a dozen times during this assignment. For example, you might need to change column names or data types.
Step 2: Load Real/Fake Data¶
-
Create a
datafolder in your repository under yourdatabasefolder (as shown in the profs repository). Create aREADME.mdfile in thedatafolder for taking notes about data sources. See the the example README.md in the profs repository. -
As a team, identify sources of real data your project can use. Download the data and store relevant files in your
data/folder. You may create CSV files by hand, using a spreadsheet app. Or you can write Python programs to extract data you need. Store any Python programs you write in yourdata/folder. Keep notes in theREADME.mdfile so you can redo these steps, if needed.Large Files
Do not store large files (Ex: > 10 MB) in your repository. If you are using a large data set, store the link in your
README.mdfile. Also add the name of any large files to your.gitignorefile. That way, another team member won't accidentally commit the large files. (You can download and store large files in your project folder; just don't commit them.) -
Edit
load.sqlto import your CSV files and/orINSERTother data into your database. Uncomment lines 17–18 ofbuild.shso that theload.sqlscript will run. Then runbuild.shto make sure everything works.Tip
Discuss as a team how you will divide up the labor for this step of the assignment. For example, two team members might focus on real data, and two might focus on fake data. Coordinate changes to the project files so that you don't step on each other's toes. Make incremental commits to the repository as you work.
-
Rewrite the provided
generate.pyto create fake data for any table that does not have real data. Uncomment lines 20–21 ofbuild.shso that thegenerate.pyscript will run. Then runbuild.shto make sure everything works.
Step 3: Write Queries (Views)¶
-
Each team member should write at least one substantial query for your project. See the provided
views.sqlfile for examples; your queries should be at least this complex. -
Replace the provided
views.sqlwith queries for your project. For each view, write a correspondingDROP VIEWstatement in yourdrop.sqlscript. Uncomment lines 26–27 ofbuild.shso that theviews.sqlscript will run. Then runbuild.shto make sure everything works. -
The
queries.sqlfile from GP3 will likely be outdated now. Move this file and any other unneeded files to anarchivefolder in the root of your repository. See the profs repository for an example.
Submission¶
Your scripts will be reviewed by the instructor.
Make sure your scripts are complete and run without any errors.
In the end, you should be able to SELECT * from each of your views and see results.