Skip to content

Scripts for Project Database

Stock exchange data concept
Image source: freepik.com

Overview

This in-class assignment has three main goals:

  • Create the database you designed in GP3.
  • Put real and/or fake data into every table.
  • Write queries that showcase your project.

After this assignment, you will start building a web application that uses the database. The data and queries that you develop this week will be the focus of the web application.

Example

A model solution for this assignment is available in the database folder of the profs repository.

Please notice, in particular:

  • The schema.dbml and schema.pdf files summarize the database design.
  • The data subfolder contains CSV files of real data gathered by hand (see data/README.md).
  • The pgsql subfolder contains SQL and Python scripts for creating the database.

The provided build.sh script automates the entire process of creating the database, importing real data, generating fake data, creating views (queries), etc. This script can be run at any time to completely rebuild the database from scratch.

Instructions

Step 1: Create the Database

  1. Copy the pgsql folder from the profs repository into your own repository. These files are provided as a starting point, and you will end up replacing them with your own work.

  2. From dbdiagram.io, export your database design to PostgreSQL as an SQL script:
    Screenshot of dbdiagram.io menu

  3. Cut and paste the contents of the SQL script into three files (see provided examples):

    • create.sql should contain the CREATE TABLE statements.
    • comment.sql should contain the COMMENT statements.
    • alter.sql should contain the FOREIGN KEY statements.
  4. Replace the contents of drop.sql with your project's table names. Think carefully about the order you need to DROP the tables. If table A references table B, then table A must be dropped before Table B.

    Tip

    Now would be a good time to commit and push your files. Don't wait until you finish everything to make a commit.

  5. The next step is to get build.sh working on your computer:

    • Edit the script to use to your group's database by finding and replacing profs with your group's name.

    • Comment out lines 17–18, 20–21, and 26–27. This will prevent load.sql, generate.py, and views.sql from running. (You will write those scripts later on.)

  6. Read the comment at the top of build.sh:

    • Create a pgpass file with your group's password (not just your student number).
    • Read the documentation, and ask the instructor if you need help with this step.
  7. Run the build.sh script from the command line (Ex: ./build.sh). The script will likely not succeed on the first try, or even the tenth try! Read the error messages, and make changes to your code as needed.

    • You might have to make the file executable first (Ex: chmod +x build.sh).
    • Make sure you have an SSH tunnel running; the scripts connect to localhost.
  8. Make sure that every team member is able to run build.sh on their computer. If someone’s environment is not set up, they will be unable to fully contribute to the project.

    Tip

    Use all lowercase letters for table and column names (Ex: event_year, not eventYear). If you use any uppercase letters, you will have to wrap names in double quotes, which is pain.

    Important

    Whenever you change the database design, be sure to update schema.dbml and schema.png. You will likely need to change the design half a dozen times during this assignment. For example, you might need to change column names or data types.

Step 2: Load Real/Fake Data

  1. Create a data folder in your repository under your database folder (as shown in the profs repository). Create a README.md file in the data folder for taking notes about data sources. See the the example README.md in the profs repository.

  2. As a team, identify sources of real data your project can use. Download the data and store relevant files in your data/ folder. You may create CSV files by hand, using a spreadsheet app. Or you can write Python programs to extract data you need. Store any Python programs you write in your data/ folder. Keep notes in the README.md file so you can redo these steps, if needed.

    Large Files

    Do not store large files (Ex: > 10 MB) in your repository. If you are using a large data set, store the link in your README.md file. Also add the name of any large files to your .gitignore file. That way, another team member won't accidentally commit the large files. (You can download and store large files in your project folder; just don't commit them.)

  3. Edit load.sql to import your CSV files and/or INSERT other data into your database. Uncomment lines 17–18 of build.sh so that the load.sql script will run. Then run build.sh to make sure everything works.

    Tip

    Discuss as a team how you will divide up the labor for this step of the assignment. For example, two team members might focus on real data, and two might focus on fake data. Coordinate changes to the project files so that you don't step on each other's toes. Make incremental commits to the repository as you work.

  4. Rewrite the provided generate.py to create fake data for any table that does not have real data. Uncomment lines 20–21 of build.sh so that the generate.py script will run. Then run build.sh to make sure everything works.

Step 3: Write Queries (Views)

  1. Each team member should write at least one substantial query for your project. See the provided views.sql file for examples; your queries should be at least this complex.

  2. Replace the provided views.sql with queries for your project. For each view, write a corresponding DROP VIEW statement in your drop.sql script. Uncomment lines 26–27 of build.sh so that the views.sql script will run. Then run build.sh to make sure everything works.

  3. The queries.sql file from GP3 will likely be outdated now. Move this file and any other unneeded files to an archive folder in the root of your repository. See the profs repository for an example.

Submission

Your scripts will be reviewed by the instructor. Make sure your scripts are complete and run without any errors. In the end, you should be able to SELECT * from each of your views and see results.