SQLite Lab
Welcome to the first day of class! Today you'll learn about databases and SQL. Working with large amounts of data is fun—and it's easier than you might think!
Part 1: Internet Movie Database
You might already be familiar with IMDb. If not, check it out:
Discussion Questions:
- What are your favorite movies / TV shows?
- Who are your favorite actors / actresses?
- What are your favorite characters?
- What are your favorite genres?
Part 2: Install DB Browser
To get started, you'll need to install DB Browser, a software tool for working with SQLite database files. Click the link for your operating system to download and run the installer.
Please ask if you need help getting DB Browser installed! Here is what it looks like when running:
Part 3: Download IMDb SQLite
Next you'll need some data to play with. Download one (or more) of the example files:
Open the file with DB Browser. Click the "Browse Data" tab to explore the data:
You should see the following five tables. Notice how many rows there are for each table!
Table |
Description |
Small |
Medium |
Large |
title |
movies, shorts, TV series, video games, etc. |
37,342 |
484,420 |
1,172,725 |
aka |
"also known as" – in other regions/languages |
56,762 |
780,347 |
2,041,268 |
rating |
IMDb ratings (voted by users) for some titles |
3,746 |
105,856 |
346,681 |
castcrew |
which people/characters are in which titles |
217,965 |
2,786,868 |
6,967,649 |
person |
actors, composers, producers, writers, etc. |
173,176 |
1,382,189 |
2,704,809 |
Database Structure
Here is how the data is organized. Click the diagram below to download a PDF version. You'll need this diagram during the next part.
By the way, this data originally came from https://www.imdb.com/interfaces/.
Part 4: Learn SQL by Example
Now click the "Execute SQL" button in DB Browser. It's time to start writing some code!
SQL stands for Structured Query Language. A "query" is a specific question you would like to ask. Here is what a query looks like:
SELECT columns
FROM tables
WHERE conditions
Example #1: Find a movie or person
A recent movie that I liked was Moana. You can search for it like this:
SELECT *
FROM title
WHERE title = 'Moana'
You can also search for people, like Dwayne Johnson, in a similar way:
SELECT *
FROM person
WHERE name = 'Dwayne Johnson'
Exercises:
- Look up other movies by title, by year, or by type.
- Look up other people by name, by birth, or by profession.
Example #2: Join multiple tables
To query multiple tables, you need to "join" them together. Notice the lines in the database diagram (from Part 3). They show how the tables are related.
SELECT *
FROM title
JOIN aka USING (tid)
WHERE title = 'Moana'
AND type = 'movie'
Here is how you can see the people related to a movie:
SELECT *
FROM title
JOIN castcrew USING (tid)
JOIN person USING (pid)
WHERE title = 'Moana'
AND type = 'movie'
Exercises:
- Find a movie you like in a language other than English.
- Find the movies with your favorite actors, directors, etc.
Example #3: Sort and limit the results
Sometimes you want to see only part of the results. Rather than SELECT * (everything), you can say which columns you want:
SELECT aka, region, language
You can also sort the results using ORDER BY. In the aka table, the region column refers to different areas of the world.
ORDER BY region
Finally, you can LIMIT the number of results to display at a time. This is helpful when there are many results, and you only want to see a few of them.
LIMIT 10
Putting the pieces together, your query might look like this:
SELECT aka, region, language
FROM title
JOIN aka USING (tid)
WHERE title = 'Moana'
AND type = 'movie'
ORDER BY region
LIMIT 10
Exercises:
- Add an ORDER BY to the previous queries you wrote.
- Add a LIMIT to the previous queries you wrote.
Example #4: Counting and grouping
Lines beginning with two dashes are comments.
-- how many titles in the database?
SELECT count(*)
FROM title
SQL is very useful for analyzing data. For example:
-- how many titles for each type?
SELECT type, count(*)
FROM title
GROUP BY type
Notice the difference between the previous two queries. The first one counts all rows in a table. The second one puts the rows into groups, based on type, and then counts how many are in each group.
Exercises:
- How many aka rows are there for Moana?
- How many titles are there for Dwayne Johnson?
- How many aka rows are there for each movie?
- How many titles are there for each actor/actress?
Going Further
There are many tutorials online for learning more about SQL. Here are a couple good ones: