Skip to content

SQLite (DB Browser)

Today you'll learn about SQLite, the most used database engine in the world. Working with large amounts of data is fun—and it's easier than you might think!

Part 1: Internet Movie Database

You might already be familiar with IMDb. If not, check out imdb.com:

IMDb picture collage

Discussion

  • What are your favorite movies / TV shows?
  • Who are your favorite actors / actresses?
  • What are your favorite characters?
  • What are your favorite genres?

Part 2: Install DB Browser

To get started, you'll need to install DB Browser, a software tool for working with SQLite database files. Click the link for your operating system to download and run the installer.

Please ask if you need help getting DB Browser installed! Here is what the software looks like when running:

DB Browser Screenshot

Part 3: Download IMDb SQLite

Next you'll need some data to play with. Download one (or more) of the example files:

Open the file with DB Browser. Click the "Browse Data" tab to explore the data:

DB Browser with the Browse Data tab circled

You should see the following five tables. Notice how many rows there are for each table!

Table Description Small Medium Large
title movies, shorts, TV series, video games, etc. 37,342 484,420 1,172,725
aka "also known as" – in other regions/languages 56,762 780,347 2,041,268
rating IMDb ratings (voted by users) for some titles 3,746 105,856 346,681
castcrew which people/characters are in which titles 217,965 2,786,868 6,967,649
person actors, composers, producers, writers, etc. 173,176 1,382,189 2,704,809

Database Diagram

Here is how the data is organized. The * symbol means many and indicates a foreign key. Click the diagram to open a PDF version in a new tab. You'll need this diagram during the next part of the lab.

IMDb Diagram

Acknowledgments

The original version of this data came from IMDb Non-Commercial Datasets. The diagram was drawn using dbdiagram.io.

Part 4: Learn SQL by Example

Now click the "Execute SQL" button in DB Browser. It's time to start writing some code!

DB Browser with the Execute SQL tab circled

SQL stands for Structured Query Language. A query is a specific question you would like to ask. Here is what a query looks like:

SELECT columns
FROM tables
WHERE conditions

Example #1: Find a movie or person

A fun movie from about 10 years ago was Moana. You can search for Moana like this:

SELECT *
FROM title
WHERE title = 'Moana'

You can also search for people, like Dwayne Johnson, in a similar way:

SELECT *
FROM person
WHERE name = 'Dwayne Johnson'

Exercise

  • Look up other movies by title, by year, or by type.
  • Look up other people by name, by birth, or by profession.

Example #2: Join multiple tables

To query multiple tables, you need to join them together. Notice the lines in the database diagram (from Part 3). They show how the tables are related.

SELECT *
FROM title
  JOIN aka USING (tid)
WHERE title = 'Moana'
  AND type = 'movie'

Here is how you can see the people related to a movie:

SELECT *
FROM title
  JOIN castcrew USING (tid)
  JOIN person USING (pid)
WHERE title = 'Moana'
  AND type = 'movie'

Exercise

  • Find a movie you like in a language other than English.
  • Find the movies with your favorite actors, directors, etc.

Example #3: Sort and limit the results

Sometimes you want to see only part of the results. Rather than SELECT * (everything), you can say which columns you want:

SELECT aka, region, language

You can also sort the results using ORDER BY. In the aka table, the region column refers to different areas of the world.

ORDER BY region

Finally, you can LIMIT the number of results to display at a time. This is helpful when there are many results, and you only want to see a few of them.

LIMIT 10

Putting the pieces together, your query might look like this:

SELECT aka, region, language
FROM title
  JOIN aka USING (tid)
WHERE title = 'Moana'
  AND type = 'movie'
ORDER BY region
LIMIT 10

Exercise

  • Add an ORDER BY to the previous queries you wrote.
  • Add a LIMIT to the previous queries you wrote.

Example #4: Counting and grouping

Note: Lines beginning with two dashes are comments.

-- how many titles in the database?
SELECT count(*)
FROM title

SQL is very useful for analyzing data. For example:

-- how many titles for each type?
SELECT type, count(*)
FROM title
GROUP BY type

Notice the difference between the previous two queries. The first query counts all rows in a table. The second query puts the rows into groups, based on type, and then counts how many are in each group.

Exercise

  • How many aka rows are there for Moana?
  • How many titles are there for Dwayne Johnson?
  • How many aka rows are there for each movie?
  • How many titles are there for each actor/actress?

Part 5: Querying CSV Files

SQLite is useful for running queries on other data. Many data files are in CSV format. CSV stands for Comma Separated Values. DB Browser can import and export CSV files.

  1. Visit the CSV Datasets page of the CORGIS project and download a dataset of your choice.

  2. In DB Browser, go to File → New In-Memory Database. Click Cancel when the Edit table definition popup appears.

  3. Go to File → Import → Table from CSV file… and open the file you downloaded. DB Browser will show a preview of the data and automatically create a table for the data.

  4. Click the Browse Data tab to verify the data imported correctly. Click the Execute SQL tab and write several queries for your data.

Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.

Part 6: Your Browser History

Many desktop applications and mobile apps use SQLite to store their data. For example, most web browsers use SQLite to store browsing history, cookies, bookmarks, etc. Let's take a look at what data your web browser has been collecting about you!

  1. Find the profile directory for your web browser. Common locations (relative to your home directory) include:

    • Chrome
      • Linux: .config/google-chrome/Default/
      • macOS: Library/Application Support/Google/Chrome/Default/
      • Windows: AppData\Local\Google\Chrome\User Data\Default\
    • Edge
      • Linux: .config/microsoft-edge/Default/
      • macOS: Library/Application Support/Microsoft Edge/Default/
      • Windows: AppData\Local\Microsoft\Edge\User Data\Default\
    • Firefox
      • Linux: .mozilla/firefox/{random_directory_name}/
      • macOS: Library/Application Support/Firefox/Profiles/{random_directory_name}/
      • Windows: AppData\Roaming\Mozilla\Firefox\Profiles\{random_directory_name}\
    • Safari
      • macOS: Library/Safari/
  2. Open one of the SQLite database files and examine the data.

    • You might need to close your web browser (application) if the database file is locked.
    • Use the Database Structure and Browse Data tabs to figure out what data is available.
  3. Write some queries over the data, for example:

    • Which websites do you visit most often?
    • How many unique websites do you visit?
    • Which websites store the most cookies?
    • What is the average cookie size in bytes?

Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.

Going Further

There are many tutorials online for learning more about SQL. Here are a couple good ones:

See also Chapter 2: Chapter 2. Relational Databases and Chapter 3: Chapter 3. Complex Queries of the textbook. We will study SQL in more detail during the next several weeks of the course.