HW4: Models and Data
Due: Tuesday, Sep 21st

In the EER to Relational Mapping lab, you created a DBML model for the Conference Review System. The database diagram below is a further iteration of that design that features additional tables, column constraints, and custom data types. In this assignment, you will create this database on the PostgreSQL server and use SQLAlchemy and Faker to load the database with test data.
Objectives¶
- Write SQL scripts for creating and dropping database tables/types.
- Use
sqlacodegento generate Python model classes for a database. - Write a Python script that uses the model classes to generate data.
Part 1: Write Create Script¶
- Create a
hw4folder for this assignment – you will have multiple files in the end. - Open a new diagram in dbdiagram.io and paste the contents of crs.dbml, the source code for the above diagram.
- Export the diagram to PostgreSQL and rename the resulting file as
create.sql. Move the file to yourhw4folder. - Run pgAdmin and open a Query Tool in the
sec1(12:45 with Dr. Rizvi) orsec2(2:20 with Dr. Mayfield) database. - Open the
create.sqlfile in pgAdmin and run the code. There should not be any errors, unless you missed a step. - Go to your schema in the
sec1/sec2database. Right click Tables and select Refresh. You should see 12 tables.
Part 2: Write Drop Script¶
- Open a terminal in your
hw4folder, and run this command. The output will be a list of theCREATEstatements.grep CREATE create.sql - Create a new file named
drop.sqland write aDROPstatement for eachCREATEstatement using this format:DROP TABLE IF EXISTS person; - Open the
drop.sqlfile in pgAdmin and try running the script. If the code runs without errors, good for you!- You will likely get constraint errors if your
DROPstatements are in the same order as the originalCREATEstatements. - Reorder the drop statements so that tables with
FOREIGN KEYconstraints are dropped before the tables they reference. - Hint #1: Search for all "
>" operators in the DBML source code to see which tables reference other tables. - Hint #2: The last line of your
drop.sqlscript should begin with the wordsDROP TYPE.
- You will likely get constraint errors if your
- Make sure you are able to run your
drop.sqlandcreate.sqlscripts, in that order, without getting errors.
Part 3: SQLAlchemy Models¶
- Determine the URL for your database:
- Example
postgresql+psycopg://user:password@data.cs.jmu.edu/sec1(orsec2). - Replace
user:passwordwith your username and database password (student number). - If you connecting via SSH tunnel, use
localhostinstead ofdata.cs.jmu.edu.
- Example
- Run
sqlacodegenon the URL and redirect the output to a file namedmodels.py.- See the instructions on Slide 13 of the Introduction to SQLAlchemy lecture.
- For simplicity on this assignment, do not modify the generated
models.pyfile.
Note
Notice that most of the tables in models.py are defined as classes, but two are defined as Table objects.
Many-to-many association tables that consist of only foreign keys (and no data attributes) are not useful as objects but are still needed when calling relationship().
So sqlacodegen generates a raw Table object for these instead of defining a class.
The good news is, you don't need to generate data for those tables directly.
Cool!
Notice also that Reviewer extends Person, because of the 1-1 relationship in the relational database design.
This was also the intended design in the original EER model.
Part 4: Generate Fake Data¶
- Create a new file named
fakedata.py. Write a docstring at the top, followed by your first and last name:"""Generate fake data for the Conference Review System.""" __author__ = "Your Name" - Run this command to see what classes are defined in
models.py. Import these classes infakedata.py.grep class models.py - Write a program to generate data using each of the 10 classes.
In the end, data should exist in all 12 tables.
- Be creative but realistic. The data values you choose should be reasonable for a Conference Review System.
- You will need to call
create_engine()and construct aSessionobject that connects to the actual database.
-
The following table summarizes how much data you should generate for each of the tables.
Table Rows person 10 rows reviewer 5 rows affiliation 4 rows person_affiliation 8 rows conference 1 row paper 3 rows paper_author 8 rows review 3 rows history 5 rows topic 20 rows expertise 10 rows paper_topic 9 rows Warning
Because
ReviewerextendsPerson, adding aReviewerobject will insert into both thereviewertable andpersontable. So to get 10personrows, you need to create 5 regular persons and 5 reviewers.Tip
For convenience, you may create lists for some of the objects you generate. That way, you can refer back to those objects later when generating related objects. For example:
# Add all Person and Reviewer objects to this list persons: list[Person] = [] # Make use of the list while generating paper_author author = random.choice(persons) -
Develop your code incrementally and test as you go. For convenience, you should drop and recreate all tables at the start of your
main()function. That way, you won't have to deal with existing data left over from previous runs.def main(): # Reset the database engine = create_engine(DB_URL) Base.metadata.drop_all(engine) Base.metadata.create_all(engine)- Isn't it nice that you can just call
drop_all()without having to specify the drop order? (like you did indrop.sql)
- Isn't it nice that you can just call
-
Make sure your final program runs without errors and populates the entire database. In the end:
- The autograder will look your schema (in
sec1orsec2) to verify that all tables have been created and have the correct number of rows. - Your instructor will also review your code to see what approach you took to generate the data.
- The autograder will look your schema (in
Submission¶
Submit the following files via Gradescope:
drop.sqlfakedata.py