Skip to content

SQLite (DB Browser)

Today you'll learn about SQLite, the most used database engine in the world. Working with large amounts of data is fun—and it's easier than you might think!

Part 1: Internet Movie Database

You might already be familiar with IMDb. If not, check out imdb.com:

IMDb picture collage

Discussion

  • What are your favorite movies / TV shows?
  • Who are your favorite actors / actresses?
  • What are your favorite characters?
  • What are your favorite genres?

Part 2: Install DB Browser

To get started, you'll need to install DB Browser, a software tool for working with SQLite database files. Click the link for your operating system to download and run the installer.

Please ask if you need help getting DB Browser installed! Here is what the software looks like when running:

DB Browser Screenshot

Part 3: Download IMDb SQLite

Next you'll need some data to play with. Download one (or more) of the example files:

Open the file with DB Browser. Click the "Browse Data" tab to explore the data:

DB Browser with the Browse Data tab circled

You should see the following five tables. Notice how many rows there are for each table!

Table Description Small Medium Large
title movies, shorts, TV series, video games, etc. 45,934 504,383 998,476
aka "also known as" – in other regions/languages 143,166 1,617,115 3,013,709
rating IMDb ratings (voted by users) for some titles 8,589 122,182 248,896
castcrew which people/characters are in which titles 444,978 4,495,725 8,841,546
person actors, composers, producers, writers, etc. 276,950 1,736,379 2,883,868

Database Diagram

Here is how the data is organized. The * symbol means many and indicates a foreign key. Click the diagram to open a PDF version in a new tab. You'll need this diagram during the next part of the lab.

IMDb Diagram

Acknowledgments

The original version of this data came from IMDb Non-Commercial Datasets. The diagram was drawn using dbdiagram.io.

Part 4: Learn SQL by Example

Now click the "Execute SQL" button in DB Browser. It's time to start writing some code!

DB Browser with the Execute SQL tab circled

SQL stands for Structured Query Language. A query is a specific question you would like to ask. Here is what a query looks like:

SELECT columns
FROM tables
WHERE conditions

Example #1: Find a movie or person

A fun movie from about 10 years ago was Moana. You can search for Moana like this:

SELECT *
FROM title
WHERE title = 'Moana'

You can also search for people, like Dwayne Johnson, in a similar way:

SELECT *
FROM person
WHERE name = 'Dwayne Johnson'

Exercise

  • Look up other movies by title, by year, or by type.
  • Look up other people by name, by birth, or by profession.

Example #2: Join multiple tables

To query multiple tables, you need to join them together. Notice the lines in the database diagram (from Part 3). They show how the tables are related.

SELECT *
FROM title
  JOIN aka USING (tid)
WHERE title = 'Moana'
  AND type = 'movie'

Here is how you can see the people related to a movie:

SELECT *
FROM title
  JOIN castcrew USING (tid)
  JOIN person USING (pid)
WHERE title = 'Moana'
  AND type = 'movie'

Exercise

  • Find a movie you like in a language other than English.
  • Find the movies with your favorite actors, directors, etc.

Example #3: Sort and limit the results

Sometimes you want to see only part of the results. Rather than SELECT * (everything), you can say which columns you want:

SELECT aka, region, language

You can also sort the results using ORDER BY. In the aka table, the region column refers to different areas of the world.

ORDER BY region

Finally, you can LIMIT the number of results to display at a time. This is helpful when there are many results, and you only want to see a few of them.

LIMIT 10

Putting the pieces together, your query might look like this:

SELECT aka, region, language
FROM title
  JOIN aka USING (tid)
WHERE title = 'Moana'
  AND type = 'movie'
ORDER BY region
LIMIT 10

Exercise

  • Add an ORDER BY to the previous queries you wrote.
  • Add a LIMIT to the previous queries you wrote.

Example #4: Counting and grouping

Note: Lines beginning with two dashes are comments.

-- how many titles in the database?
SELECT count(*)
FROM title

SQL is very useful for analyzing data. For example:

-- how many titles for each type?
SELECT type, count(*)
FROM title
GROUP BY type

Notice the difference between the previous two queries. The first query counts all rows in a table. The second query puts the rows into groups, based on type, and then counts how many are in each group.

Exercise

  • How many aka rows are there for Moana?
  • How many titles are there for Dwayne Johnson?
  • How many aka rows are there for each movie?
  • How many titles are there for each actor/actress?

Part 5: Insert, Update, Delete

You can also use SQL to make changes in a database. The three basic operations are INSERT, UPDATE, and DELETE.

Example #1: Add a title

An INSERT statement adds a new row at the end of a table:

INSERT INTO table
VALUES (value1, value2, ...)

For example, to add a new movie that (for some reason?) is not in IMDb:

INSERT INTO title
VALUES (374, 'movie', 'JMU Beat Virginia Tech', 'JMU Beat Virginia Tech',
        0, 2010, NULL, 90, 'Action,Documentary')

Note: For now, you will need to make up tid values (like 374). You can see what values are available using a query like this:

SELECT min(tid), max(tid)
FROM title

Exercise

  • Add yourself to the database. Choose any profession you would like!
  • Add yourself to an existing title of your choice (via castcrew table).
  • Add a rating for any title that doesn't yet have a rating in the database.

Example #2: Edit a title

An UPDATE statement applies one or more assignments on a set of rows:

UPDATE table
SET column1 = expression1,
    column2 = expression2,
    ...
WHERE conditions

For example, to edit the movie inserted in the previous section:

UPDATE title
SET title = 'JMU Beat Virginia Tech Again',
    year = 2024
WHERE tid = 374

Note: The WHERE clause is important! If an UPDATE statement has no WHERE clause, every row will be updated.

Exercise

  • Edit the title (or at least change the spelling) of a famous movie of your choice.
  • Add 1 to the number of votes for all ratings with an average score of 5 or higher.
  • Update your own person record with titles that you would like to be known for.

Example #3: Remove a title

A DELETE statement removes a set of rows from the database:

DELETE FROM table
WHERE conditions

For example, to delete all titles from the year 2020:

DELETE FROM title
WHERE year = 2020

Note: The WHERE clause is important! If a DELETE statement has no WHERE clause, every row will be deleted.

Exercise

  • Remove an actor/actress that you don't like from the database.
  • Remove all ratings with under 100 votes and an average over 5.
  • Remove titles from the year 2020 with a runtime under 10 minutes.

Part 6: Querying CSV Files

SQLite is useful for running queries on other data. Many data files are in CSV format. CSV stands for Comma Separated Values. DB Browser can import and export CSV files.

  1. Visit the CSV Datasets page of the CORGIS project and download a dataset of your choice.

  2. In DB Browser, go to File → New In-Memory Database. Click Cancel when the Edit table definition popup appears.

  3. Go to File → Import → Table from CSV file… and open the file you downloaded. DB Browser will show a preview of the data and automatically create a table for the data.

  4. Click the Browse Data tab to verify the data imported correctly. Click the Execute SQL tab and write several queries for your data.

Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.

Part 7: Your Browser History

Many desktop applications and mobile apps use SQLite to store their data. For example, most web browsers use SQLite to store browsing history, cookies, bookmarks, etc. Let's take a look at what data your web browser has been collecting about you!

  1. Find the profile directory for your web browser. Common locations (relative to your home directory) include:

    • Chrome
      • Linux: .config/google-chrome/Default/
      • macOS: Library/Application Support/Google/Chrome/Default/
      • Windows: AppData\Local\Google\Chrome\User Data\Default\
    • Edge
      • Linux: .config/microsoft-edge/Default/
      • macOS: Library/Application Support/Microsoft Edge/Default/
      • Windows: AppData\Local\Microsoft\Edge\User Data\Default\
    • Firefox
      • Linux: .mozilla/firefox/{random_directory_name}/
      • macOS: Library/Application Support/Firefox/Profiles/{random_directory_name}/
      • Windows: AppData\Roaming\Mozilla\Firefox\Profiles\{random_directory_name}\
    • Safari
      • macOS: Library/Safari/
  2. Open one of the SQLite database files and examine the data.

    • You might need to close your web browser (application) if the database file is locked.
    • Use the Database Structure and Browse Data tabs to figure out what data is available.
  3. Write some queries over the data, for example:

    • Which websites do you visit most often?
    • How many unique websites do you visit?
    • Which websites store the most cookies?
    • What is the average cookie size in bytes?

Be prepared to discuss the queries with your teammates and the instructor. As time permits, we will discuss interesting examples as a whole class.

Going Further

There are many tutorials online for learning more about SQL. Here are a couple good ones:

See also Chapter 2. Relational Databases and Chapter 5. Complex Queries of the textbook. We will study SQL in more detail during the next several weeks of the course.