Skip to content

GP3: Create Sample Data

Due: Monday, April 8th

Mockaroo logo
Image source: mockaroo.com

Instructions

In this assignment, you will create sample data for your database. Here are two ways you can do this:

  • Mockaroo – a website that lets you generate all kinds of data in all kinds of formats.
  • Python faker module – a Python module that helps you generate all kinds of fake data in various formats.

The structure of each group's database varies, so the data requirements are given broadly, not table by table.

Requirements

  • At least 20 hotels.
  • At least 5 room types per hotel.
  • At least 100 individual rooms per hotel.
    • At least 3 of each room type; the distribution of the other room types does not matter.
  • At least 3 seasons per hotel.
  • At least 200 guests.
  • At least 3 guest categories (e.g., VIP, government, military, etc.) that give a discount.
  • At least 10 employees per hotel.
  • Depending on the type of your extra service, at least 1 per hotel, and more if the type of service needs more.
  • However your database handles price, be sure to have some room type prices that vary by the day of the week.
  • Let us call April 1, 2024 the "current date", so that you can maintain the idea of past and future in your data as the real current date changes.
  • Create these specific conditions in at least one hotel:
    • Create 100 reservations, with half in the past and half in the future, according to the "current date".
    • There should be at least 10 reviews for this hotel that correspond to reservations that happened in the "past".
    • Assure that 25 of the reservations are for multiple rooms of the same type, for multiple rooms of different room types, and both. If your database stores one room per reservation, in order to meet this requirement, you must be sure that there are multiple reservations for the same guest in the same hotel for the same dates.
    • Create the conditions such that 20 rooms are "currently" occupied, which means that the reservations started a few days before the current date and end on the current date or a few days afterwards. Use generated occupant data which matches each occupied room's capacity.
    • Make sure that all the reservations from the "past" have appropriately matching occupant data.
    • In reservations that involve multiple rooms, make sure only one of them is occupied by the guest and others are occupied by generated occupants.
    • Make sure that half of the occupied rooms from the current time into the past have used your group-specific service.
    • Create the current and past billing records for all current/past guests, according to how your model stores that information.
    • Make sure half of the current/past guests belong to a discount category.

Submission

Create a new folder in your GitHub repository called data and put all the files you created and used for loading data into that folder.

Your files may include the following, or may be primarily Python files, or some combination. The only required file is the README, which must explain the contents and use of the other files:

  • Create a csv file for the bulk data for each of your tables, fulfilling the requirements above.
  • Create a single script called load.sql that loads all of the tables.
    • Use the \copy command of psql to load CSV files, for example:
      \copy hotel FROM hotel.csv WITH CSV HEADER
      
    • Alternatively, use \copy … from stdin and include the data directly in the load.sql script, for example:
      \copy hotel from stdin;
      1	Days Inn
      2	Marriott
      3	Hilton
      \.
      
  • For the specific conditions in the single hotel, you may choose to solve this by loading bulk data from csv files or by writing INSERT statements. Put the code to load the data for this one hotel into a separate file called specific.sql. Be sure that this file is well-commented.
  • Create a README.md that explains how you created all the data (using Mockaroo or faker) and lists all the csv files with a brief explanation of their purpose. The README should have sufficient explanation so that anyone (i.e., your future self) could generate the data again.
    • If you used Mockaroo to generate the data, record the parameters you used. Screenshots are acceptable.
    • If you used faker to generate the data, include the Python scripts in a subdirectory called faker.
  • At the time this assignment is due, your group database should be populated with data and the constraints should be turned on (run alter.sql).