SQLite (DB Browser)
Today you'll learn about SQLite, the most used database engine in the world. Working with large amounts of data is fun—and it's easier than you might think!
Part 1: Internet Movie Database¶
You might already be familiar with IMDb. If not, check out imdb.com:
Discussion
- What are your favorite movies / TV shows?
- Who are your favorite actors / actresses?
- What are your favorite characters?
- What are your favorite genres?
Part 2: Install DB Browser¶
To get started, you'll need to install DB Browser, a software tool for working with SQLite database files. Click the link for your operating system to download and run the installer.
Please ask if you need help getting DB Browser installed! Here is what the software looks like when running:
Part 3: Download IMDb SQLite¶
Next you'll need some data to play with. Download one (or more) of the example files:
- Small example: titles since 2024 (73 MB)
- Medium example: titles since 2019 (640 MB)
- Large example: titles since 2014 (1.2 GB)
Open the file with DB Browser. Click the "Browse Data" tab to explore the data:
You should see the following five tables. Notice how many rows there are for each table!
Table | Description | Small | Medium | Large |
---|---|---|---|---|
title |
movies, shorts, TV series, video games, etc. | 45,934 | 504,383 | 998,476 |
aka |
"also known as" – in other regions/languages | 143,166 | 1,617,115 | 3,013,709 |
rating |
IMDb ratings (voted by users) for some titles | 8,589 | 122,182 | 248,896 |
castcrew |
which people/characters are in which titles | 444,978 | 4,495,725 | 8,841,546 |
person |
actors, composers, producers, writers, etc. | 276,950 | 1,736,379 | 2,883,868 |
Database Diagram¶
Here is how the data is organized.
The *
symbol means many and indicates a foreign key.
Click the diagram to open a PDF version in a new tab.
You'll need this diagram during the next part of the lab.
Acknowledgments
The original version of this data came from IMDb Non-Commercial Datasets. The diagram was drawn using dbdiagram.io.
Part 4: Learn SQL by Example¶
Now click the "Execute SQL" button in DB Browser. It's time to start writing some code!
SQL stands for Structured Query Language. A query is a specific question you would like to ask. Here is what a query looks like:
SELECT columns
FROM tables
WHERE conditions
Example #1: Find a movie or person¶
A fun movie from about 10 years ago was Moana. You can search for Moana like this:
SELECT *
FROM title
WHERE title = 'Moana'
You can also search for people, like Dwayne Johnson, in a similar way:
SELECT *
FROM person
WHERE name = 'Dwayne Johnson'
Exercise
- Look up other movies by title, by year, or by type.
- Look up other people by name, by birth, or by profession.
Example #2: Join multiple tables¶
To query multiple tables, you need to join them together. Notice the lines in the database diagram (from Part 3). They show how the tables are related.
SELECT *
FROM title
JOIN aka USING (tid)
WHERE title = 'Moana'
AND type = 'movie'
Here is how you can see the people related to a movie:
SELECT *
FROM title
JOIN castcrew USING (tid)
JOIN person USING (pid)
WHERE title = 'Moana'
AND type = 'movie'
Exercise
- Find a movie you like in a language other than English.
- Find the movies with your favorite actors, directors, etc.
Example #3: Sort and limit the results¶
Sometimes you want to see only part of the results.
Rather than SELECT *
(everything), you can say which columns you want:
SELECT aka, region, language
You can also sort the results using ORDER BY
.
In the aka
table, the region
column refers to different areas of the world.
ORDER BY region
Finally, you can LIMIT
the number of results to display at a time.
This is helpful when there are many results, and you only want to see a few of them.
LIMIT 10
Putting the pieces together, your query might look like this:
SELECT aka, region, language
FROM title
JOIN aka USING (tid)
WHERE title = 'Moana'
AND type = 'movie'
ORDER BY region
LIMIT 10
Exercise
- Add an
ORDER BY
to the previous queries you wrote. - Add a
LIMIT
to the previous queries you wrote.
Example #4: Counting and grouping¶
Note: Lines beginning with two dashes are comments.
-- how many titles in the database?
SELECT count(*)
FROM title
SQL is very useful for analyzing data. For example:
-- how many titles for each type?
SELECT type, count(*)
FROM title
GROUP BY type
Notice the difference between the previous two queries. The first query counts all rows in a table. The second query puts the rows into groups, based on type, and then counts how many are in each group.
Exercise
- How many aka rows are there for Moana?
- How many titles are there for Dwayne Johnson?
- How many aka rows are there for each movie?
- How many titles are there for each actor/actress?
Part 5: Insert, Update, Delete¶
You can also use SQL to make changes in a database.
The three basic operations are INSERT
, UPDATE
, and DELETE
.
Example #1: Add a title¶
An INSERT
statement adds a new row at the end of a table:
INSERT INTO table
VALUES (value1, value2, ...)
For example, to add a new movie that (for some reason?) is not in IMDb:
INSERT INTO title
VALUES (374, 'movie', 'JMU Beat Virginia Tech', 'JMU Beat Virginia Tech',
0, 2010, NULL, 90, 'Action,Documentary')
Note: For now, you will need to make up tid
values (like 374
).
You can see what values are available using a query like this:
SELECT min(tid), max(tid)
FROM title
Exercise
- Add yourself to the database. Choose any profession you would like!
- Add yourself to an existing title of your choice (via
castcrew
table). - Add a rating for any title that doesn't yet have a rating in the database.
Example #2: Edit a title¶
An UPDATE
statement applies one or more assignments on a set of rows:
UPDATE table
SET column1 = expression1,
column2 = expression2,
...
WHERE conditions
For example, to edit the movie inserted in the previous section:
UPDATE title
SET title = 'JMU Beat Virginia Tech Again',
year = 2024
WHERE tid = 374
Note: The WHERE
clause is important!
If an UPDATE
statement has no WHERE
clause, every row will be updated.
Exercise
- Edit the title (or at least change the spelling) of a famous movie of your choice.
- Add 1 to the number of votes for all ratings with an average score of 5 or higher.
- Update your own person record with titles that you would like to be known for.
Example #3: Remove a title¶
A DELETE
statement removes a set of rows from the database:
DELETE FROM table
WHERE conditions
For example, to delete all titles from the year 2020:
DELETE FROM title
WHERE year = 2020
Note: The WHERE
clause is important!
If a DELETE
statement has no WHERE
clause, every row will be deleted.
Exercise
- Remove an actor/actress that you don't like from the database.
- Remove all ratings with under 100 votes and an average over 5.
- Remove titles from the year 2020 with a runtime under 10 minutes.
Part 6: Querying CSV Files¶
SQLite is useful for running queries on other data. Many data files are in CSV format. CSV stands for Comma Separated Values. DB Browser can import and export CSV files.
-
Visit the CSV Datasets page of the CORGIS project and download a dataset of your choice.
-
In DB Browser, go to File → New In-Memory Database. Click Cancel when the Edit table definition popup appears.
-
Go to File → Import → Table from CSV file… and open the file you downloaded. DB Browser will show a preview of the data and automatically create a table for the data.
-
Click the Browse Data tab to verify the data imported correctly. Click the Execute SQL tab and write several queries for your data.
Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.
Part 7: Your Browser History¶
Many desktop applications and mobile apps use SQLite to store their data. For example, most web browsers use SQLite to store browsing history, cookies, bookmarks, etc. Let's take a look at what data your web browser has been collecting about you!
-
Find the profile directory for your web browser. Common locations (relative to your home directory) include:
- Chrome
- Linux:
.config/google-chrome/Default/
- macOS:
Library/Application Support/Google/Chrome/Default/
- Windows:
AppData\Local\Google\Chrome\User Data\Default\
- Linux:
- Edge
- Linux:
.config/microsoft-edge/Default/
- macOS:
Library/Application Support/Microsoft Edge/Default/
- Windows:
AppData\Local\Microsoft\Edge\User Data\Default\
- Linux:
- Firefox
- Linux:
.mozilla/firefox/{random_directory_name}/
- macOS:
Library/Application Support/Firefox/Profiles/{random_directory_name}/
- Windows:
AppData\Roaming\Mozilla\Firefox\Profiles\{random_directory_name}\
- Linux:
- Safari
- macOS:
Library/Safari/
- macOS:
- Chrome
-
Open one of the SQLite database files and examine the data.
- You might need to close your web browser (application) if the database file is locked.
- Use the Database Structure and Browse Data tabs to figure out what data is available.
-
Write some queries over the data, for example:
- Which websites do you visit most often?
- How many unique websites do you visit?
- Which websites store the most cookies?
- What is the average cookie size in bytes?
Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.
Going Further¶
There are many tutorials online for learning more about SQL. Here are a couple good ones:
See also Chapter 2. Relational Databases and Chapter 5. Complex Queries of the textbook. We will study SQL in more detail during the next several weeks of the course.