2015-03-26  Code Generation (Ch. 8)

intermediate code gen: AST -> IR
optimizer: IR -> more efficient IR
code gen: IR -> target code

code generator requirements:
- must preserve semantics
- should produce efficient code
- should run efficiently

generating optimal code is undecidable
- unlike front-end transformations (e.g., lexing & parsing)
- must use heuristics and approximation algorithms
- this is why most research since the 1960s has been on the back end

code gen tasks:
- instruction selection
- register allocation/assignment
- instruction ordering

input to code gen
- "relatively low-level IR"
- values of IR names can be easily manipulated by the target architecture
- assume all type/error checking has been completed

output of code gen
- stack-based code (e.g., Forth VM, Java VM)
- RISC code (e.g., Alpha, MIPS, Power, ARM)
- CISC code (e.g., PDP-11, VAX, Intel 8080, x86, x86-64)
- often in subprogram modules (e.g., object files)

instruction selection
- map IR to target instructions
- difficulty is directly related to the uniformity and completeness of the target instruction set
- needs instruction cost information for optimal selection (often unavailable)

register allocation/assignment
- allocation: selecting which variables to store in registers
- assignment: selecting which register to use for each variable
- the general optimization problem is NP-complete
- register sizes (x86: al, ah, ax, eax, rax)

simple target model
- load/store ("mov" on x86)
- computation ("add", "sub", etc. on x86)
- unconditional jumps ("jmp" on x86)
- conditional jumps ("test" and "je/jz/jg/jge/etc." on x86)
- addressing modes:
  - immediates (IMMxx on x86)
  - registers (REG on x86)
  - named address (MEM on x86)
  - indexed address (SIB on x86)

EXAMPLE: x = y - z
    mov y, %rcx
    mov z, %rax
    sub %rax, %rcx
    mov %rcx, x

addresses:
- code (program labels; size determined by compiler)
- static (data & bss; size determined by compiler)
- stack (managed by code emitted by compiler; size unknown while compiling)
- heap (may be managed by runtime system; size unknown while compiling)

fundamental concept: basic block
- single entry point
- multiple exit points
- no branching inside block
- "leader": first instruction in a block
- blocks and edges form a flow graph
  - predecessors and successors
- partitioned from TAC by finding leaders (Fig. 8.7 and 8.8)
  - the first instruction is a leader
  - any jump target is a leader
  - any instruction following a jump is a leader

fundamental concept: liveness analysis
- a "live" variable may be used in the future
- to calculate liveness, scan and propagate info backwards (Alg. 8.7)
  - start with only outputs marked as live
  - at each instruction, mark the written variable as not live, then mark the read variables as live
  - example: x = y + z
    mark x not live
    mark y, z live

example (live set before each instruction; live-out = {}):
    {b, c, d}  a = b + c
    {a, c, d}  b = a - d
    {a, b, c}  c = b + c
    {a, b}     d = a - b
    {}

example (live set after each instruction; live-out = {}):
    a = b + c  {b, c, d}
    b = b - d  {b, c, d}
    c = c + d  {b, c}
    e = b + c  {}

basic block optimization
- eliminate common subexpressions
  - "value-number" method from 6.1.1 and 6.1.2
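The value-number method can be sketched roughly as follows (a minimal sketch, not the book's code: the TAC tuple layout and the "copy" pseudo-op are assumptions, and it ignores commutativity and redefined variables):

```python
# Minimal local value numbering over one basic block.
# block: list of (dest, op, arg1, arg2) three-address instructions.
# A repeated (op, operands) expression reuses the variable that first
# computed it instead of recomputing.
def value_number(block):
    table = {}    # (op, value keys of operands) -> variable holding it
    number = {}   # variable -> key of the value it currently holds
    out = []
    for dest, op, a, b in block:
        key = (op, number.get(a, a), number.get(b, b))
        if key in table:
            out.append((dest, "copy", table[key], None))  # reuse prior result
        else:
            table[key] = dest
            out.append((dest, op, a, b))
        number[dest] = key
    return out

block = [("a", "+", "b", "c"),
         ("d", "+", "b", "c")]   # same expression as a
print(value_number(block))
# → [('a', '+', 'b', 'c'), ('d', 'copy', 'a', None)]
```

A real implementation would canonicalize operand order for commutative ops and invalidate table entries when a variable is overwritten (the "kill" problem noted below for arrays and pointers).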
  - Ex. 8.10 and 8.11:
        before:            after:
        a = b + c          a = b + c
        e = c + d + b      e = a + d
- eliminate dead code
  - if computed variables are not live, there is no need to compute them
- algebraic identities
  - apply identities to enable the use of "cheaper" operations
- reorder non-dependent statements and operands

caveat: aliasing
- array accesses (8.5.5)
  - a[i] and a[j] could be aliases to the same array location
  - thus, can't apply standard common subexpression methods
  - array assignments kill other nodes depending on the array
- pointer assignments (8.5.6)
  - x = *p is a potential use of EVERY VARIABLE in the program!
  - must kill all other nodes to be safe
  - can do some alias analysis to kill fewer nodes, but this can be complex
  - intra-procedural vs. inter-procedural

Exercise 8.5.1 and 8.5.3

Simple code generator (8.6)

instructions
- LD r1, mem       r1 = mem
- ST mem, r1       mem = r1
- OP r1, r2, r3    r1 = r2 OP r3

register data structures
- register descriptor
  - NAMES(R): names associated with register R's value
- address descriptor
  - LOC(x): locations storing the current value of variable x
  - could be registers, stack locations, or heap addresses

for TAC: x = y OP z
- call getReg(insn) to select registers Rx, Ry, Rz
- if y ∉ NAMES(Ry)
  - emit "LD Ry, &y" where &y is an address in LOC(y)
- if z ∉ NAMES(Rz)
  - emit "LD Rz, &z" where &z is an address in LOC(z)
- emit "OP Rx, Ry, Rz"

for TAC: x = y
- call getReg(insn) to get a single register R
- if y ∉ NAMES(R), emit "LD R, &y"
- add x to NAMES(R)
- set LOC(x) = { R }

at the end of a basic block
- emit "ST x, Rx" for each live-on-exit var x where x ∈ NAMES(Rx) and &x ∉ LOC(x)

rules for updating NAMES and LOC:
- LD R, x
  - set NAMES(R) = { x }
  - add R to LOC(x)
- ST x, R
  - add &x to LOC(x)
- OP Rx, Ry, Rz  (for x = y OP z)
  - ∀v: remove Rx from LOC(v)
  - set NAMES(Rx) = { x }
  - set LOC(x) = { Rx }   // doesn't include &x !!!
getReg(I)
- choose registers for y and z
  - if y ∈ NAMES(R), Ry = R
  - if y is not in a register and some register R is free, Ry = R
  - otherwise, see if we can safely "steal" a register
  - if we cannot, choose a register to "spill"
- choose a register for x
  - similar, but slightly more lenient (can use Ry or Rz if it is safe)
  - also, if I is a copy instruction, Rx = Ry

Ex. 8.16 (Fig. 8.16)

x86 notes
- both LD and ST are implemented using the "mov" instruction
- operations may use one memory operand

peephole optimization
- small, sliding window used for pattern matching
- local transformations:
  - eliminating redundant loads and stores
  - eliminating dead code (e.g., debugging code)
  - eliminating chained jumps
  - algebraic simplification
  - use of machine-specific idioms (e.g., auto-inc/dec, fused add/multiply)

register allocation (between basic blocks)
- usage counts
- graph coloring

better instruction selection: tree-based translation
optimal expression code: Ershov (Sethi-Ullman) numbers
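The Ershov (Sethi-Ullman) number of an expression tree is the minimum number of registers needed to evaluate it without spills: a leaf gets 1, and an interior node gets max of its children's labels if they differ, or the common label plus 1 if they are equal. A minimal sketch (the tuple tree representation is an assumption):

```python
# Ershov number of an expression tree.
# A leaf is a string; an interior node is an (op, left, right) tuple.
def ershov(t):
    if isinstance(t, str):
        return 1                         # leaf: one register
    _, left, right = t
    l, r = ershov(left), ershov(right)
    return max(l, r) if l != r else l + 1

# (a - b) + e * (c + d): both subtrees of "+" label 2, so the root is 3,
# meaning 3 registers suffice for a spill-free evaluation
t = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))
print(ershov(t))  # → 3
```

Evaluating the higher-numbered subtree first is what achieves this bound, since its result then sits in one register while the other subtree uses the rest.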