
HW4: Models and Data

Due: Tuesday, Sep 21st

SQLAlchemy Faker

In the EER to Relational Mapping lab, you created a DBML model for the Conference Review System. The database diagram below is a further iteration of that design that features additional tables, column constraints, and custom data types. In this assignment, you will create this database on the PostgreSQL server and use SQLAlchemy and Faker to load the database with test data.

DB diagram for Conference Review

Objectives

  • Write SQL scripts for creating and dropping database tables/types.
  • Use sqlacodegen to generate Python model classes for a database.
  • Write a Python script that uses the model classes to generate data.

Part 1: Write Create Script

  1. Create a hw4 folder for this assignment – you will have multiple files in the end.
  2. Open a new diagram in dbdiagram.io and paste the contents of crs.dbml, the source code for the above diagram.
  3. Export the diagram to PostgreSQL and rename the resulting file as create.sql. Move the file to your hw4 folder.
  4. Run pgAdmin and open a Query Tool in the sec1 (12:45 with Dr. Rizvi) or sec2 (2:20 with Dr. Mayfield) database.
  5. Open the create.sql file in pgAdmin and run the code. There should not be any errors, unless you missed a step.
  6. Go to your schema in the sec1/sec2 database. Right click Tables and select Refresh. You should see 12 tables.

Part 2: Write Drop Script

  1. Open a terminal in your hw4 folder, and run this command. The output will be a list of the CREATE statements.
    grep CREATE create.sql
    
  2. Create a new file named drop.sql and write a DROP statement for each CREATE statement using this format:
    DROP TABLE IF EXISTS person;
    
  3. Open the drop.sql file in pgAdmin and try running the script. If the code runs without errors, good for you! 😄
    • You will likely get constraint errors if your DROP statements are in the same order as the original CREATE statements.
  • Reorder the DROP statements so that tables with FOREIGN KEY constraints are dropped before the tables they reference.
    • Hint #1: Search for all ">" operators in the DBML source code to see which tables reference other tables.
    • Hint #2: The last line of your drop.sql script should begin with the words DROP TYPE.
  4. Make sure you are able to run your drop.sql and create.sql scripts, in that order, without getting errors.

Part 3: SQLAlchemy Models

  1. Determine the URL for your database:
    • Example: postgresql+psycopg://user:password@data.cs.jmu.edu/sec1 (or sec2).
    • Replace user:password with your username and database password (student number).
    • If you are connecting via an SSH tunnel, use localhost instead of data.cs.jmu.edu.
  2. Run sqlacodegen on the URL and redirect the output to a file named models.py.
    • See the instructions on Slide 13 of the Introduction to SQLAlchemy lecture.
    • For simplicity on this assignment, do not modify the generated models.py file.
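    • For example, the command has roughly this shape (substitute the URL you built in step 1):
      sqlacodegen postgresql+psycopg://user:password@data.cs.jmu.edu/sec1 > models.py
    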

Note

Notice that most of the tables in models.py are defined as classes, but two are defined as Table objects. Many-to-many association tables that consist of only foreign keys (and no data attributes) are not useful as objects but are still needed when calling relationship(). So sqlacodegen generates a raw Table object for these instead of defining a class. The good news is, you don't need to generate data for those tables directly.
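
For example, a row in paper_author gets created when you append to the matching relationship collection on one of the classes. A rough sketch (the attribute and column names below are guesses; check the names sqlacodegen actually generated in models.py):

    # Sketch only -- attribute/column names are assumptions; see models.py.
    paper = Paper(title=fake.sentence())           # 'title' is a guessed column name
    paper.person.append(random.choice(persons))    # guessed relationship name; adds a paper_author row
    session.add(paper)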

Cool!

Notice also that Reviewer extends Person, because of the 1-1 relationship in the relational database design. This was also the intended design in the original EER model.

Part 4: Generate Fake Data

  1. Create a new file named fakedata.py. Write a docstring at the top, followed by your first and last name:
    """Generate fake data for the Conference Review System."""
    
    __author__ = "Your Name"
    
  2. Run this command to see what classes are defined in models.py. Import these classes in fakedata.py.
    grep class models.py
    
  3. Write a program to generate data using each of the 10 classes. In the end, data should exist in all 12 tables.
    • Be creative but realistic. The data values you choose should be reasonable for a Conference Review System.
    • You will need to call create_engine() and construct a Session object that connects to the actual database.
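    • As a starting point, here is a minimal connection sketch (the Person column names are assumptions for illustration; check the attribute names in your generated models.py):
    
      from faker import Faker
      from sqlalchemy import create_engine
      from sqlalchemy.orm import Session
      
      from models import Person
      
      # Use the URL you determined in Part 3
      DB_URL = "postgresql+psycopg://user:password@data.cs.jmu.edu/sec1"
      
      fake = Faker()
      engine = create_engine(DB_URL)
      
      with Session(engine) as session:
          # 'name' and 'email' are assumed column names, not necessarily the real ones
          person = Person(name=fake.name(), email=fake.email())
          session.add(person)
          session.commit()
      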
  4. The following table summarizes how much data you should generate for each of the tables.

    Table                Rows
    person               10
    reviewer             5
    affiliation          4
    person_affiliation   8
    conference           1
    paper                3
    paper_author         8
    review               3
    history              5
    topic                20
    expertise            10
    paper_topic          9

    Warning

    Because Reviewer extends Person, adding a Reviewer object will insert into both the reviewer table and person table. So to get 10 person rows, you need to create 5 regular persons and 5 reviewers.
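
    For example (another sketch; the column names are assumptions):

    # One Reviewer object produces a row in BOTH person and reviewer
    reviewer = Reviewer(name=fake.name(), email=fake.email())
    session.add(reviewer)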

    Tip

    For convenience, you may create lists for some of the objects you generate. That way, you can refer back to those objects later when generating related objects. For example:

    # Add all Person and Reviewer objects to this list
    persons: list[Person] = []
    
    # Make use of the list while generating paper_author
    author = random.choice(persons)
    

  5. Develop your code incrementally and test as you go. For convenience, you should drop and recreate all tables at the start of your main() function. That way, you won't have to deal with existing data left over from previous runs.

    def main():
        # Reset the database
        engine = create_engine(DB_URL)
        Base.metadata.drop_all(engine)
        Base.metadata.create_all(engine)
    

    • Isn't it nice that you can just call drop_all() without having to specify the drop order, like you had to in drop.sql?
  6. Make sure your final program runs without errors and populates the entire database. In the end:

    • The autograder will look at your schema (in sec1 or sec2) to verify that all tables have been created and have the correct number of rows.
    • Your instructor will also review your code to see what approach you took to generate the data.
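    • Optionally, you can sanity-check a row count from Python before the autograder does. A small sketch in SQLAlchemy 2.0 style:
    
      from sqlalchemy import func, select
      
      with Session(engine) as session:
          count = session.scalar(select(func.count()).select_from(Person))
          print("person rows:", count)
      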

Submission

Submit the following files via Gradescope:

  • drop.sql
  • fakedata.py