Introduction to SQLAlchemy

CS 374, Fall 2025

What Is an ORM?

  • ORM stands for Object-Relational Mapper.

    • Bridges the gap between Python objects and database tables.
    • Converts between Python data types and SQL data types.
  • Allows developers to:

    • Query the database using Python syntax.
    • Keep code more portable and maintainable.
    • Avoid writing SQL for common operations.

Why Use an ORM?

Without ORM:

cursor.execute("SELECT * FROM customer WHERE name LIKE '%Smith%';")
  • Manual string concatenation
  • Schema mismatches

With ORM:

session.execute(select(Customer).where(Customer.name.like('%Smith%')))
  • Type-safe and composable
  • Automatically generates SQL

What Is SQLAlchemy?

  • Python’s leading ORM for relational databases.

  • Provides two layers:

    • Core: SQL Expression Language (explicit SQL control)
    • ORM: Object-relational abstraction (Pythonic classes)
  • Works with PostgreSQL, SQLite, MySQL, and others.

  • Enables cleaner, maintainable, and testable code.

Getting Started

  • Installation:

    pip install SQLAlchemy
    
  • Version check:

    >>> import sqlalchemy
    >>> sqlalchemy.__version__
    

Exercise 1: ORM Quick Start

https://docs.sqlalchemy.org/en/20/orm/quickstart.html


Create a SQLite database with User and Address tables

Example Model

class User(Base):
    __tablename__ = "user_account"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    fullname: Mapped[Optional[str]]

    addresses: Mapped[List["Address"]] = relationship(
        back_populates="user", cascade="all, delete-orphan"
    )

    def __repr__(self) -> str:
        return f"User(id={self.id!r}, name={self.name!r}, fullname={self.fullname!r})"
  • Class variables are mapped to database columns
  • Type hints (Ex: Mapped[int]) are used extensively
  • Relationships (Ex: addresses) are attributes too!

Example Engine

  • An Engine is a factory that can create new database connections.
  • Holds onto connections inside of a connection pool for fast reuse.
  • echo=True indicates that SQL will be logged to standard out.

Connect to SQLite (in-memory DB):

from sqlalchemy import create_engine
engine = create_engine("sqlite://", echo=True)

Connect to PostgreSQL:

# dialect+driver://username:password@host:port/database_name
DB_URL = "postgresql+psycopg://demo:demo@data.cs.jmu.edu/tpch"
engine = create_engine(DB_URL)

Object-Oriented

sandy = User(
    name="sandy",
    fullname="Sandy Cheeks",
    addresses=[
        Address(email_address="sandy@sqlalchemy.org"),
        Address(email_address="sandy@squirrelpower.org"),
    ],
)
  • Corresponding INSERT statements are automatically generated.

Queries in Python

stmt = (
    select(Address)
    .join(Address.user)
    .where(User.name == "sandy")
    .where(Address.email_address == "sandy@sqlalchemy.org")
)
sandy_address = session.scalars(stmt).one()
  • Method calls are chained together to formulate the query.

Updates in Python

stmt = select(User).where(User.name == "patrick")
patrick = session.scalars(stmt).one()

patrick.addresses.append(Address(email_address="patrickstar@sqlalchemy.org"))
sandy_address.email_address = "sandy_cheeks@sqlalchemy.org"

session.commit()
  • Corresponding UPDATE statements are automatically generated.

Exercise 2: Models for TPC-H

https://github.com/agronholm/sqlacodegen


Using sqlacodegen to reverse engineer an existing database.

What is sqlacodegen?

  • Automatic model code generator for SQLAlchemy

    • We'll use this as a starting point for your project!
  • Installation:

    pip install sqlacodegen
    
  • Usage example:

    sqlacodegen postgresql+psycopg://demo:demo@localhost/tpch > models.py
    

Manual Cleanup

  • Modify capitalization

    • Ex: LineItem and PartSupp
  • Rename attributes to be singular/plural

    • Ex: nations instead of nation
  • Reorder the class definitions

    • Not required, but more readable

Example Query

with Session(engine) as session:
    stmt = (
        select(
            Customer.c_name,
            func.count(distinct(LineItem.l_partkey)).label("num_parts")
        )
        .join(Order, Customer.c_custkey == Order.o_custkey)
        .join(LineItem, Order.o_orderkey == LineItem.l_orderkey)
        .group_by(Customer.c_name)
        .having(func.count(distinct(LineItem.l_partkey)) > 10)
        .order_by("num_parts")
        .limit(10)
    )

    # Execute the query and iterate the results
    for name, num_parts in session.execute(stmt):
        print(f"{name:<20} ordered {num_parts} distinct parts")

Further Reading

Core vs ORM

Feature SQLAlchemy Core SQLAlchemy ORM
Approach SQL Expression Language, programmatic SQL Object-Relational Mapping, object-centric
Control High-level control over generated SQL Higher-level abstraction, less direct SQL control
Performance Generally better for bulk/complex operations Overhead for object tracking, potentially slower for bulk
Complexity Can be more verbose for simple CRUD Simplifies CRUD, can be complex for advanced mapping
Use Cases Raw SQL needs, complex queries, bulk operations Object-oriented applications, relationship management, simplified CRUD