SQLite (DB Browser)
Today you'll learn about SQLite, the most used database engine in the world. Working with large amounts of data is fun—and it's easier than you might think!
Part 1: Internet Movie Database¶
You might already be familiar with IMDb. If not, check out imdb.com:
Discussion
- What are your favorite movies / TV shows?
- Who are your favorite actors / actresses?
- What are your favorite characters?
- What are your favorite genres?
Part 2: Install DB Browser¶
To get started, you'll need to install DB Browser, a software tool for working with SQLite database files. Click the link for your operating system to download and run the installer.
Please ask if you need help getting DB Browser installed! Here is what the software looks like when running:
Part 3: Download IMDb SQLite¶
Next you'll need some data to play with. Download one (or more) of the example files:
- Small example: titles since 2020 (39 MB)
- Medium example: titles since 2015 (413 MB)
- Large example: titles since 2000 (963 MB)
Open the file with DB Browser. Click the "Browse Data" tab to explore the data:
You should see the following five tables. Notice how many rows there are for each table!
Table | Description | Small | Medium | Large |
---|---|---|---|---|
title |
movies, shorts, TV series, video games, etc. | 37,342 | 484,420 | 1,172,725 |
aka |
"also known as" – in other regions/languages | 56,762 | 780,347 | 2,041,268 |
rating |
IMDb ratings (voted by users) for some titles | 3,746 | 105,856 | 346,681 |
castcrew |
which people/characters are in which titles | 217,965 | 2,786,868 | 6,967,649 |
person |
actors, composers, producers, writers, etc. | 173,176 | 1,382,189 | 2,704,809 |
Database Diagram¶
Here is how the data is organized.
The *
symbol means many and indicates a foreign key.
Click the diagram to open a PDF version in a new tab.
You'll need this diagram during the next part of the lab.
Acknowledgments
The original version of this data came from IMDb Non-Commercial Datasets. The diagram was drawn using dbdiagram.io.
Part 4: Learn SQL by Example¶
Now click the "Execute SQL" button in DB Browser. It's time to start writing some code!
SQL stands for Structured Query Language. A query is a specific question you would like to ask. Here is what a query looks like:
SELECT columns
FROM tables
WHERE conditions
Example #1: Find a movie or person¶
A fun movie from about 10 years ago was Moana. You can search for Moana like this:
SELECT *
FROM title
WHERE title = 'Moana'
You can also search for people, like Dwayne Johnson, in a similar way:
SELECT *
FROM person
WHERE name = 'Dwayne Johnson'
Exercise
- Look up other movies by title, by year, or by type.
- Look up other people by name, by birth, or by profession.
Example #2: Join multiple tables¶
To query multiple tables, you need to join them together. Notice the lines in the database diagram (from Part 3). They show how the tables are related.
SELECT *
FROM title
JOIN aka USING (tid)
WHERE title = 'Moana'
AND type = 'movie'
Here is how you can see the people related to a movie:
SELECT *
FROM title
JOIN castcrew USING (tid)
JOIN person USING (pid)
WHERE title = 'Moana'
AND type = 'movie'
Exercise
- Find a movie you like in a language other than English.
- Find the movies with your favorite actors, directors, etc.
Example #3: Sort and limit the results¶
Sometimes you want to see only part of the results.
Rather than SELECT *
(everything), you can say which columns you want:
SELECT aka, region, language
You can also sort the results using ORDER BY
.
In the aka
table, the region
column refers to different areas of the world.
ORDER BY region
Finally, you can LIMIT
the number of results to display at a time.
This is helpful when there are many results, and you only want to see a few of them.
LIMIT 10
Putting the pieces together, your query might look like this:
SELECT aka, region, language
FROM title
JOIN aka USING (tid)
WHERE title = 'Moana'
AND type = 'movie'
ORDER BY region
LIMIT 10
Exercise
- Add an
ORDER BY
to the previous queries you wrote. - Add a
LIMIT
to the previous queries you wrote.
Example #4: Counting and grouping¶
Note: Lines beginning with two dashes are comments.
-- how many titles in the database?
SELECT count(*)
FROM title
SQL is very useful for analyzing data. For example:
-- how many titles for each type?
SELECT type, count(*)
FROM title
GROUP BY type
Notice the difference between the previous two queries. The first query counts all rows in a table. The second query puts the rows into groups, based on type, and then counts how many are in each group.
Exercise
- How many aka rows are there for Moana?
- How many titles are there for Dwayne Johnson?
- How many aka rows are there for each movie?
- How many titles are there for each actor/actress?
Part 5: Querying CSV Files¶
SQLite is useful for running queries on other data. Many data files are in CSV format. CSV stands for Comma Separated Values. DB Browser can import and export CSV files.
-
Visit the CSV Datasets page of the CORGIS project and download a dataset of your choice.
-
In DB Browser, go to File → New In-Memory Database. Click Cancel when the Edit table definition popup appears.
-
Go to File → Import → Table from CSV file… and open the file you downloaded. DB Browser will show a preview of the data and automatically create a table for the data.
-
Click the Browse Data tab to verify the data imported correctly. Click the Execute SQL tab and write several queries for your data.
Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.
Part 6: Your Browser History¶
Many desktop applications and mobile apps use SQLite to store their data. For example, most web browsers use SQLite to store browsing history, cookies, bookmarks, etc. Let's take a look at what data your web browser has been collecting about you!
-
Find the profile directory for your web browser. Common locations (relative to your home directory) include:
- Chrome
- Linux:
.config/google-chrome/Default/
- macOS:
Library/Application Support/Google/Chrome/Default/
- Windows:
AppData\Local\Google\Chrome\User Data\Default\
- Linux:
- Edge
- Linux:
.config/microsoft-edge/Default/
- macOS:
Library/Application Support/Microsoft Edge/Default/
- Windows:
AppData\Local\Microsoft\Edge\User Data\Default\
- Linux:
- Firefox
- Linux:
.mozilla/firefox/{random_directory_name}/
- macOS:
Library/Application Support/Firefox/Profiles/{random_directory_name}/
- Windows:
AppData\Roaming\Mozilla\Firefox\Profiles\{random_directory_name}\
- Linux:
- Safari
- macOS:
Library/Safari/
- macOS:
- Chrome
-
Open one of the SQLite database files and examine the data.
- You might need to close your web browser (application) if the database file is locked.
- Use the Database Structure and Browse Data tabs to figure out what data is available.
-
Write some queries over the data, for example:
- Which websites do you visit most often?
- How many unique websites do you visit?
- Which websites store the most cookies?
- What is the average cookie size in bytes?
Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.
Going Further¶
There are many tutorials online for learning more about SQL. Here are a couple good ones:
See also Chapter 2: Chapter 2. Relational Databases and Chapter 3: Chapter 3. Complex Queries of the textbook. We will study SQL in more detail during the next several weeks of the course.