GP3: Data and Queries
Due: Wednesday, Oct 30th
Image source:
freepik.com
Overview¶
This assignment has three main goals:
- Create the database you designed in GP2.
- Put real and/or fake data into every table.
- Write queries that showcase your project.
After this assignment, you will start building a web application that uses the database. The data and queries that you develop this week will be the focus of the web application.
Example¶
A model solution for this assignment is available in the database
folder of the
profs repository.
Please notice, in particular:
- The
schema.dbml
andschema.png
files have been revised significantly (see commit history). - The
data
subfolder contains CSV files of real data gathered by hand (see data/README.md). - The
mysql
andpgsql
subfolders contain SQL and Python scripts for creating the database.- Note: You don't need both of these subfolders in your project; they are given as examples.
- Outdated files from GP2 that are no longer needed have been moved into the
archive
folder.
The provided build.sh
script automates the entire process of creating the database, importing real data, generating fake data, creating views (queries), etc.
This script can be run at any time to completely rebuild your database.
Instructions¶
Part 1: Create the Database¶
-
Decide whether your team will use MySQL or PostgreSQL for your project. Either system is fine, and each has pros and cons. You can change your mind later on if you are unhappy with your decision.
-
Copy the
mysql
orpgsql
folder from theprofs
repository into your own repository. These files are provided as a starting point, and you will end up replacing them with your own work. -
From dbdiagram.io, export your database design to either MySQL or PostgreSQL as an SQL script:
-
Cut and paste the contents of the SQL script into three files (see provided examples):
create.sql
should contain theCREATE TABLE
statements.comment.sql
should contain theCOMMENT
statements.alter.sql
should contain theFOREIGN KEY
statements.
-
Replace the contents of
drop.sql
with your project's table names. Think carefully about the order you need toDROP
the tables. If table A references table B, then table A must be dropped before Table B.Tip
Now would be a good time to commit and push your files. Don't wait until you finish everything to make a commit.
-
The next step is to get
build.sh
working on your computer:-
Edit the script to use to your group's database by finding and replacing
profs
with your group's name. -
Comment out lines 17–18, 20–21, and 26–27. This will prevent
load.sql
,generate.py
, andviews.sql
from running. (You will write those scripts later on.)
-
-
Read the comment at the top of
build.sh
:- If you are using MySQL, run
mysql_config_editor
to create the login path. - If you are using PostgreSQL, create a
pgpass
file with your group's password. - Read the documentation, and ask the instructor if you need help with this step.
- If you are using MySQL, run
-
Run the
build.sh
script from the command line (Ex:./build.sh
). The script will likely not succeed on the first try, or even the tenth try! Read the error messages, and make changes to your code as needed.- You might have to make the file executable first (Ex:
chmod +x build.sh
). - Make sure you have an SSH tunnel running; the scripts connect to
localhost
.
Important
Whenever you change the database design, be sure to update
schema.dbml
andschema.png
. You will likely need to change the design half a dozen times during this assignment. For example, you might need to change column names or data types. - You might have to make the file executable first (Ex:
-
Make sure that every team member is able to run
build.sh
on their computer. If someone’s environment is not set up, they will be unable to fully contribute to the project.
Part 2: Load Real/Fake Data¶
-
Create a
data
folder in your repository under yourdatabase
folder (as shown in the profs repository). Create aREADME.md
file in thedata
folder for taking notes about data sources. See the the example README.md in the profs repository. -
As a team, identify sources of real data your project can use. Download the data and store relevant files in your
data/
folder. You may create CSV files by hand, using a spreadsheet app. Or you can write Python programs to extract data you need. Store any Python programs you write in yourdata/
folder. Keep notes in theREADME.md
file so you can redo these steps, if needed.Large Files
Do not store large files (Ex: > 10 MB) in your repository. If you are using a large data set, store the link in your
README.md
file. Also add the name of any large files to your.gitignore
file. That way, another team member won't accidentally commit the large files. (You can download and store large files in your project folder; just don't commit them.) -
Edit
load.sql
to import your CSV files and/orINSERT
other data into your database. Uncomment lines 17–18 ofbuild.sh
so that theload.sql
script will run. Then runbuild.sh
to make sure everything works.Tip
Discuss as a team how you will divide up the labor for this part of the assignment. For example, two team members might focus on real data, and two might focus on fake data. Coordinate changes to the project files so that you don't step on each other's toes. Make incremental commits to the repository as you work.
-
Rewrite the provided
generate.py
to create fake data for any table that does not have real data. Uncomment lines 20–21 ofbuild.sh
so that thegenerate.py
script will run. Then runbuild.sh
to make sure everything works.
Part 3: Write Queries (Views)¶
-
Each team member should write at least one substantial query for your project. See the provided
views.sql
file for examples; your queries should be at least this complex. -
Replace the provided
views.sql
with queries for your project. For each view, write a correspondingDROP VIEW
statement in yourdrop.sql
script. Uncomment lines 26–27 ofbuild.sh
so that theviews.sql
script will run. Then runbuild.sh
to make sure everything works. -
The
queries.sql
file from GP2 will likely be outdated now. Move this file and any other unneeded files to anarchive
folder in the root of your repository. See the profs repository for an example.
Submission¶
Your scripts will be reviewed by the instructor and another team during class.
Make sure your scripts are complete and run without any errors.
In the end, you should be able to SELECT *
from each of your views and see results.
After you complete this assignment, you will evaluate your team's performance. Part of the performance review will involve looking at the commit history. We expect each team member to make multiple commits during the week.