Skip to content

GP3: Data and Queries

Due: Wednesday, Oct 30th

Stock exchange data concept
Image source: freepik.com

Overview

This assignment has three main goals:

  • Create the database you designed in GP2.
  • Put real and/or fake data into every table.
  • Write queries that showcase your project.

After this assignment, you will start building a web application that uses the database. The data and queries that you develop this week will be the focus of the web application.

Example

A model solution for this assignment is available in the database folder of the profs repository.

Please notice, in particular:

  • The schema.dbml and schema.png files have been revised significantly (see commit history).
  • The data subfolder contains CSV files of real data gathered by hand (see data/README.md).
  • The mysql and pgsql subfolders contain SQL and Python scripts for creating the database.
    • Note: You don't need both of these subfolders in your project; they are given as examples.
  • Outdated files from GP2 that are no longer needed have been moved into the archive folder.

The provided build.sh script automates the entire process of creating the database, importing real data, generating fake data, creating views (queries), etc. This script can be run at any time to completely rebuild your database.

Instructions

Part 1: Create the Database

  1. Decide whether your team will use MySQL or PostgreSQL for your project. Either system is fine, and each has pros and cons. You can change your mind later on if you are unhappy with your decision.

  2. Copy the mysql or pgsql folder from the profs repository into your own repository. These files are provided as a starting point, and you will end up replacing them with your own work.

  3. From dbdiagram.io, export your database design to either MySQL or PostgreSQL as an SQL script:
    Screenshot of dbdiagram.io menu

  4. Cut and paste the contents of the SQL script into three files (see provided examples):

    • create.sql should contain the CREATE TABLE statements.
    • comment.sql should contain the COMMENT statements.
    • alter.sql should contain the FOREIGN KEY statements.
  5. Replace the contents of drop.sql with your project's table names. Think carefully about the order you need to DROP the tables. If table A references table B, then table A must be dropped before Table B.

    Tip

    Now would be a good time to commit and push your files. Don't wait until you finish everything to make a commit.

  6. The next step is to get build.sh working on your computer:

    • Edit the script to use to your group's database by finding and replacing profs with your group's name.

    • Comment out lines 17–18, 20–21, and 26–27. This will prevent load.sql, generate.py, and views.sql from running. (You will write those scripts later on.)

  7. Read the comment at the top of build.sh:

    • If you are using MySQL, run mysql_config_editor to create the login path.
    • If you are using PostgreSQL, create a pgpass file with your group's password.
    • Read the documentation, and ask the instructor if you need help with this step.
  8. Run the build.sh script from the command line (Ex: ./build.sh). The script will likely not succeed on the first try, or even the tenth try! Read the error messages, and make changes to your code as needed.

    • You might have to make the file executable first (Ex: chmod +x build.sh).
    • Make sure you have an SSH tunnel running; the scripts connect to localhost.

    Important

    Whenever you change the database design, be sure to update schema.dbml and schema.png. You will likely need to change the design half a dozen times during this assignment. For example, you might need to change column names or data types.

  9. Make sure that every team member is able to run build.sh on their computer. If someone’s environment is not set up, they will be unable to fully contribute to the project.

Part 2: Load Real/Fake Data

  1. Create a data folder in your repository under your database folder (as shown in the profs repository). Create a README.md file in the data folder for taking notes about data sources. See the the example README.md in the profs repository.

  2. As a team, identify sources of real data your project can use. Download the data and store relevant files in your data/ folder. You may create CSV files by hand, using a spreadsheet app. Or you can write Python programs to extract data you need. Store any Python programs you write in your data/ folder. Keep notes in the README.md file so you can redo these steps, if needed.

    Large Files

    Do not store large files (Ex: > 10 MB) in your repository. If you are using a large data set, store the link in your README.md file. Also add the name of any large files to your .gitignore file. That way, another team member won't accidentally commit the large files. (You can download and store large files in your project folder; just don't commit them.)

  3. Edit load.sql to import your CSV files and/or INSERT other data into your database. Uncomment lines 17–18 of build.sh so that the load.sql script will run. Then run build.sh to make sure everything works.

    Tip

    Discuss as a team how you will divide up the labor for this part of the assignment. For example, two team members might focus on real data, and two might focus on fake data. Coordinate changes to the project files so that you don't step on each other's toes. Make incremental commits to the repository as you work.

  4. Rewrite the provided generate.py to create fake data for any table that does not have real data. Uncomment lines 20–21 of build.sh so that the generate.py script will run. Then run build.sh to make sure everything works.

Part 3: Write Queries (Views)

  1. Each team member should write at least one substantial query for your project. See the provided views.sql file for examples; your queries should be at least this complex.

  2. Replace the provided views.sql with queries for your project. For each view, write a corresponding DROP VIEW statement in your drop.sql script. Uncomment lines 26–27 of build.sh so that the views.sql script will run. Then run build.sh to make sure everything works.

  3. The queries.sql file from GP2 will likely be outdated now. Move this file and any other unneeded files to an archive folder in the root of your repository. See the profs repository for an example.

Submission

Your scripts will be reviewed by the instructor and another team during class. Make sure your scripts are complete and run without any errors. In the end, you should be able to SELECT * from each of your views and see results.

After you complete this assignment, you will evaluate your team's performance. Part of the performance review will involve looking at the commit history. We expect each team member to make multiple commits during the week.