2015-01-15 Class 2 Intro (Ch. 1)

administrivia
- textbook (& errata)
- access to server
- Python: ANTLR, LEPL, yapps

compilers are *translators*
- source -> target
- basically a really complicated function
- target can be a high-level language, low-level/assembly language, or machine code

compiler topics widely relevant in PL, SE, arch, theory, security
- Mike Hicks @ UMD - LockSmith, safe C, language-enforced security policies
- Jeff Foster @ UMD - CQual, type systems/qualifiers, proof generators
- Nate Rosenblum @ UWisc - compiler provenance, forensics, PASTE'10 paper
- LLVM (generalized compiler framework)

compiled vs. interpreted languages
- everything is a translator! it's functions all the way down...

compilers are a sequence of functional phases
- front end (analysis) vs. back end (synthesis)
  - allows for more interoperability
- symbol table and other orthogonal info used by many phases
- common phases: (fig. 1.7)
  - lexical analysis (scanning)
  - syntax analysis (parsing)
  - semantic analysis (checking)
  - code generator (intermediate or target)
  - optimizer (machine dependent or independent)

our Decaf compiler
- source file -> scanner -> token stream
- token stream -> parser -> abstract syntax tree (AST)
- AST -> analyzer -> clean AST
- clean AST -> code gen -> 3-address code
- 3-addr code -> optimizer -> x86 assembly
- x86 assembly -> [[GNU as]] -> x86 executable

compiler construction tools
- we will use lexer and parser generators

models / data structures in compilers
- regular expressions
- finite automata (state machines)
- context-free languages
- push-down automata (parsers)
- trees
- graphs, matrices, linear programs

generating "optimal" code is undecidable
- use decisions and heuristics to pick which parts are worth tackling
- basically an optimization problem

key tension: abstraction vs. speed
- more abstraction: higher read/writeability, cleaner complexity
- less abstraction: closer to hardware, faster code
- compilers must balance this tension

high-performance computing
- explicit instruction-level parallelism (SIMD; e.g., SSE)
- implicit instruction-level parallelism (hidden in hardware)
- explicit program-level parallelism (OpenMP, threads, MPI, etc.)
- implicit program-level parallelism (HARD!)
- register allocation
- memory layout and caching (interacts with all levels)

RISC vs. CISC
- x86 won (at least for now)

compiler error detection
- type checking
- bounds checking
- memory management
- security properties
- sound (reported errors are real - no false positives)
- complete (finds all real errors - no false negatives)
- no "real" compiler is either

PL history
- 1st gen: machine codes, macro instructions
- 2nd gen: assembly languages
- 3rd gen: "low-level" and "high-level" languages
- 4th gen: domain-specific languages

PL categories
- imperative / procedural
- functional / declarative
- object-oriented
- scripting
- domain-specific

PL basics
- static (compile time) vs. dynamic (run time)
- scope: static (lexical) vs. dynamic (runtime or stack-based)
  - also public vs. private vs. protected
  - dynamic scope is also used in OO programming
- environment (name -> location)
  - usually dynamic (static bindings are usually called "globals")
- state (location -> value)
  - usually dynamic (static bindings are usually called "constants")
- name vs. identifier (string of characters) vs. variable (location)
- subprograms:
  - procedure (inlined code) vs. function (input -> output) vs. method (OO)
  - first-class functions, lambdas, and closures
  - blocks (group of declarations and statements)
  - declaration (types and signatures) vs. definition (values and code)
- parameter passing
  - formal (code/definition) vs. actual (passed during call)
  - call-by-value vs. call-by-reference vs. call-by-name
  - aliasing (two names refer to the same location)
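The "source file -> scanner -> token stream" step above can be sketched with regular expressions. This is a minimal illustration, not the actual Decaf scanner; the token classes (NUM, ID, OP, ...) are made up for the example.

```python
import re

# A minimal regex-based scanner sketch: each token class is a regular
# expression, tried at the current position, and the source string
# becomes a flat token stream.
TOKEN_SPEC = [
    ("NUM",  r"\d+"),
    ("ID",   r"[A-Za-z_]\w*"),
    ("OP",   r"[+\-*/=]"),
    ("LPAR", r"\("),
    ("RPAR", r"\)"),
    ("SKIP", r"\s+"),            # whitespace: matched but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Turn a source string into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character at {pos}: {source[pos]!r}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("x = 3 + y1"))
# [('ID', 'x'), ('OP', '='), ('NUM', '3'), ('OP', '+'), ('ID', 'y1')]
```

Note that ordering matters: NUM is tried before ID, so a leading digit can never start an identifier. Lexer generators like ANTLR automate exactly this kind of specification.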
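The notes pair regular expressions with finite automata: every regex corresponds to a state machine. A hand-rolled sketch of a DFA recognizing identifiers (state numbering and character classes are invented for the example):

```python
# DFA for identifiers ([A-Za-z_][A-Za-z0-9_]*).
# States: 0 = start, 1 = accepting; a missing transition means reject.
def classify(ch):
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"):  1,
}

def is_identifier(s):
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return False
    return state == 1

assert is_identifier("y1") and not is_identifier("1y")
```

Generated scanners compile all token regexes into one combined automaton like this, which is why scanning runs in a single left-to-right pass.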
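The symbol table mentioned above ("orthogonal info used by many phases") is often a chain of per-scope dictionaries. A toy version, with the class name and the use of type strings as the stored info being assumptions for illustration:

```python
# A symbol-table sketch: one dictionary per lexical scope, each scope
# linked to its enclosing parent; lookup walks outward through scopes.
class SymbolTable:
    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}            # name -> info (here: a type string)

    def declare(self, name, info):
        self.bindings[name] = info

    def lookup(self, name):
        scope = self
        while scope is not None:      # search this scope, then enclosing ones
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        return None

glob = SymbolTable()
glob.declare("x", "int")
inner = SymbolTable(parent=glob)
inner.declare("y", "bool")
assert inner.lookup("y") == "bool"
assert inner.lookup("x") == "int"    # found in the enclosing scope
assert glob.lookup("y") is None      # inner names invisible outside
```

The same structure directly models static (lexical) scope from the "PL basics" list: lookup follows the nesting of the program text, not the runtime call stack.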
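The "clean AST -> code gen -> 3-address code" step can be sketched by walking an expression tree and emitting one instruction per operator. The tuple-based AST shape and temporary-naming scheme here are assumptions, not Decaf's actual representation:

```python
# Toy AST -> 3-address code: each interior node (op, lhs, rhs) gets a
# fresh temporary; leaves are constants or variable names.
counter = 0

def fresh_temp():
    global counter
    counter += 1
    return f"t{counter}"

def gen(ast, code):
    """Emit 3-address instructions for ast; return the name holding its value."""
    if isinstance(ast, (int, str)):
        return str(ast)              # leaf: constant or variable
    op, left, right = ast            # interior node
    l = gen(left, code)
    r = gen(right, code)
    t = fresh_temp()
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
gen(("+", ("*", "a", "b"), 4), code)
print("\n".join(code))
# t1 = a * b
# t2 = t1 + 4
```

The flat list of instructions is what the optimizer then rewrites before assembly is emitted, since "at most one operator per instruction" makes dataflow easy to analyze.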
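The static (lexical) vs. dynamic scope distinction from "PL basics" can be seen concretely in Python, which is lexically scoped: a closure captures the environment where it was defined, not the bindings live at the call site.

```python
count = 100   # under *dynamic* scope, bump() would see this binding instead

def make_counter():
    count = 0                  # binding in make_counter's (lexical) environment
    def bump():
        nonlocal count         # refers to the lexically enclosing 'count'
        count += 1
        return count
    return bump                # a first-class function plus its closure

c = make_counter()
assert (c(), c(), c()) == (1, 2, 3)   # uses make_counter's count, not the global
assert count == 100                   # the global binding is untouched
```

This also illustrates "first-class functions, lambdas, and closures" from the subprograms list: `bump` outlives the call to `make_counter` yet still sees its environment.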
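Aliasing from the parameter-passing list can be demonstrated in Python, whose convention (sometimes called "call by object reference") passes references by value, so a formal and an actual parameter can name the same location:

```python
def append_item(xs):     # formal parameter: xs
    xs.append(99)        # mutates the shared object -> visible to the caller
    xs = [0]             # rebinding the *name* does NOT affect the caller

nums = [1, 2]
append_item(nums)        # actual parameter: nums
assert nums == [1, 2, 99]

a = nums                 # a and nums now alias the same location
a.append(5)
assert nums == [1, 2, 99, 5]
```

The two assertions separate the two ideas in the notes: the environment maps names to locations (rebinding `xs` only changes that map inside the callee), while the state maps locations to values (the `append` mutation is shared through the alias).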