CS 374: Database Systems
James Madison University, Spring 2022

Jan 20: Overview of Database Systems

Lesson Outline

30 min Lecture Slides
* What is a database?
* Why use a DBMS?
* History, features
* SQL vs Python
* Course logistics
10 min Discord Tutorial
* Overview of text channels / how to use voice channels
* User settings, notification settings, privacy settings
30 min SQLite Lab
* See instructions below -- nothing to turn in today
5 min Wrap-Up
* Explaination of what to do before the next class

Before Saturday

  1. Pre-Survey: Please submit this "quiz" if you haven't already!

  2. 1Pic1Par: Formally introduce yourself to the rest of the class.

  3. Textbook: Read PDBM 1.1--1.5 (14 pages).
    * Before that, read the last page of the Preface.

  4. Homework: Start on HW1 due this Sunday.
    * Answer questions while you read the book!


SQLite Lab

Welcome to the first day of class! Today you'll learn about databases and SQL. Working with large amounts of data is fun—and it's easier than you might think!

Part 1: Internet Movie Database

You might already be familiar with IMDb. If not, check it out:

IMDb Header Page

Discussion Questions:

Part 2: Install DB Browser

To get started, you'll need to install DB Browser, a software tool for working with SQLite database files. Click the link for your operating system to download and run the installer.

Please ask if you need help getting DB Browser installed! Here is what it looks like when running:

DB Browser Screenshot

Part 3: Download IMDb SQLite

Next you'll need some data to play with. Download one (or more) of the example files:

Open the file with DB Browser. Click the "Browse Data" tab to explore the data:

Browse Data

You should see the following five tables. Notice how many rows there are for each table!

Table Description Small Medium Large
title movies, shorts, TV series, video games, etc. 37,342 484,420 1,172,725
aka "also known as" – in other regions/languages 56,762 780,347 2,041,268
rating IMDb ratings (voted by users) for some titles 3,746 105,856 346,681
castcrew which people/characters are in which titles 217,965 2,786,868 6,967,649
person actors, composers, producers, writers, etc. 173,176 1,382,189 2,704,809

Database Structure

Here is how the data is organized. Click the diagram below to download a PDF version. You'll need this diagram during the next part.

IMDb Diagram

By the way, this data originally came from https://www.imdb.com/interfaces/.

Part 4: Learn SQL by Example

Now click the "Execute SQL" button in DB Browser. It's time to start writing some code!

Execute SQL

SQL stands for Structured Query Language. A "query" is a specific question you would like to ask. Here is what a query looks like:

SELECT columns
FROM tables
WHERE conditions

Example #1: Find a movie or person

A recent movie that I liked was Moana. You can search for it like this:

SELECT *
FROM title
WHERE title = 'Moana'

You can also search for people, like Dwayne Johnson, in a similar way:

SELECT *
FROM person
WHERE name = 'Dwayne Johnson'

Exercises:

Example #2: Join multiple tables

To query multiple tables, you need to "join" them together. Notice the lines in the database diagram (from Part 3). They show how the tables are related.

SELECT *
FROM title
  JOIN aka USING (tid)
WHERE title = 'Moana'
  AND type = 'movie'

Here is how you can see the people related to a movie:

SELECT *
FROM title
  JOIN castcrew USING (tid)
  JOIN person USING (pid)
WHERE title = 'Moana'
  AND type = 'movie'

Exercises:

Example #3: Sort and limit the results

Sometimes you want to see only part of the results. Rather than SELECT * (everything), you can say which columns you want:

SELECT aka, region, language

You can also sort the results using ORDER BY. In the aka table, the region column refers to different areas of the world.

ORDER BY region

Finally, you can LIMIT the number of results to display at a time. This is helpful when there are many results, and you only want to see a few of them.

LIMIT 10

Putting the pieces together, your query might look like this:

SELECT aka, region, language
FROM title
  JOIN aka USING (tid)
WHERE title = 'Moana'
  AND type = 'movie'
ORDER BY region
LIMIT 10

Exercises:

Example #4: Counting and grouping

Lines beginning with two dashes are comments.

-- how many titles in the database?
SELECT count(*)
FROM title

SQL is very useful for analyzing data. For example:

-- how many titles for each type?
SELECT type, count(*)
FROM title
GROUP BY type

Notice the difference between the previous two queries. The first one counts all rows in a table. The second one puts the rows into groups, based on type, and then counts how many are in each group.

Exercises:

Going Further

There are many tutorials online for learning more about SQL. Here are a couple good ones: