2015-03-26  Code Generation (Ch. 8)

intermediate code gen: AST -> IR
optimizer: IR -> more efficient IR
code gen: IR -> target code

code generator requirements:
- must preserve semantics
- should produce efficient code
- should run efficiently

generating optimal code is undecidable
- unlike front-end transformations (e.g., lexing & parsing)
- must use heuristics and approximation algorithms
- this is why most research since the 1960s has been on the back end

code gen tasks:
- instruction selection
- register allocation/assignment
- instruction ordering

input to code gen
- "relatively low-level IR"
- values of IR names can be easily manipulated by the target architecture
- assume all type/error checking has been completed

output of code gen
- stack-based code (e.g., Forth VM, Java VM)
- RISC code (e.g., Alpha, MIPS, Power, ARM)
- CISC code (e.g., PDP-11, VAX, Intel 8080, x86, x86-64)
- often in subprogram modules (e.g., object files)

instruction selection
- map IR to target instructions
- difficulty is directly related to the uniformity and completeness of the target instruction set
- needs instruction cost information for optimal selection (often unavailable)

register allocation/assignment
- allocation: selecting which variables to store in registers
- assignment: selecting which register to use for each variable
- the general optimization problem is NP-complete
- register sizes (x86: al, ah, ax, eax, rax)

simple target model
- load/store ("mov" on x86)
- computation ("add", "sub", etc. on x86)
- unconditional jumps ("jmp" on x86)
- conditional jumps ("test" and "je/jz/jg/jge/etc." on x86)
- addressing modes:
  - immediates (IMMxx on x86)
  - registers (REG on x86)
  - named address (MEM on x86)
  - indexed address (SIB on x86)

EXAMPLE: x = y - z
    mov y, %rcx
    mov z, %rax
    sub %rax, %rcx
    mov %rcx, x

addresses:
- code (program labels; size determined by compiler)
- static (data & bss; size determined by compiler)
- stack (managed by code emitted by compiler; size unknown while compiling)
- heap (may be managed by runtime system; size unknown while compiling)

fundamental concept: basic block
- single entry point
- multiple exit points
- no branching inside block
- "leader": first instruction in a block
- blocks and edges form a flow graph
  - predecessors and successors
- partitioned from TAC by finding leaders (Fig. 8.7 and 8.8)
  - the first instruction is a leader
  - any jump target is a leader
  - any instruction following a jump is a leader

fundamental concept: liveness analysis
- a "live" variable may be used in the future
- to calculate liveness, scan and propagate info backwards (Alg. 8.7)
  - start with only outputs marked as live
  - at each instruction, mark the written variable as not live, then mark the read variables as live
  - example: x = y + z
    mark x not live
    mark y, z live

example (live set before each instruction; live-out = {}):
    {b, c, d}  a = b + c
    {a, c, d}  b = a - d
    {a, b, c}  c = b + c
    {a, b}     d = a - b
    {}

example (live set after each instruction; live-out = {}):
    a = b + c  {b, c, d}
    b = b - d  {b, c, d}
    c = c + d  {b, c}
    e = b + c  {}

basic block optimization
- eliminate common subexpressions
  - "value-number" method from 6.1.1 and 6.1.2
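The value-number method can be sketched roughly as follows (a minimal sketch, not the book's code: the TAC tuple layout and the "copy" pseudo-op are assumptions, and it ignores commutativity and redefined variables):

```python
# Minimal local value numbering over one basic block.
# block: list of (dest, op, arg1, arg2) three-address instructions.
# A repeated (op, operands) expression reuses the variable that first
# computed it instead of recomputing.
def value_number(block):
    table = {}    # (op, value keys of operands) -> variable holding it
    number = {}   # variable -> key of the value it currently holds
    out = []
    for dest, op, a, b in block:
        key = (op, number.get(a, a), number.get(b, b))
        if key in table:
            out.append((dest, "copy", table[key], None))  # reuse prior result
        else:
            table[key] = dest
            out.append((dest, op, a, b))
        number[dest] = key
    return out

block = [("a", "+", "b", "c"),
         ("d", "+", "b", "c")]   # same expression as a
print(value_number(block))
# → [('a', '+', 'b', 'c'), ('d', 'copy', 'a', None)]
```

A real implementation would canonicalize operand order for commutative ops and invalidate table entries when a variable is overwritten (the "kill" problem noted below for arrays and pointers).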
  - Ex. 8.10 and 8.11:
        before:            after:
        a = b + c          a = b + c
        e = c + d + b      e = a + d
- eliminate dead code
  - if computed variables are not live, there is no need to compute them
- algebraic identities
  - apply identities to enable the use of "cheaper" operations
- reorder non-dependent statements and operands

caveat: aliasing
- array accesses (8.5.5)
  - a[i] and a[j] could be aliases to the same array location
  - thus, can't apply standard common subexpression methods
  - array assignments kill other nodes depending on the array
- pointer assignments (8.5.6)
  - x = *p is a potential use of EVERY VARIABLE in the program!
  - must kill all other nodes to be safe
  - can do some alias analysis to kill fewer nodes, but this can be complex
  - intra-procedural vs. inter-procedural

Exercise 8.5.1 and 8.5.3

Simple code generator (8.6)

instructions
- LD r1, mem       r1 = mem
- ST mem, r1       mem = r1
- OP r1, r2, r3    r1 = r2 OP r3

register data structures
- register descriptor
  - NAMES(R): names associated with register R's value
- address descriptor
  - LOC(x): locations storing the current value of variable x
  - could be registers, stack locations, or heap addresses

for TAC: x = y OP z
- call getReg(insn) to select registers Rx, Ry, Rz
- if y ∉ NAMES(Ry)
  - emit "LD Ry, &y" where &y is an address in LOC(y)
- if z ∉ NAMES(Rz)
  - emit "LD Rz, &z" where &z is an address in LOC(z)
- emit "OP Rx, Ry, Rz"

for TAC: x = y
- call getReg(insn) to get a single register R
- if y ∉ NAMES(R), emit "LD R, &y"
- add x to NAMES(R)
- set LOC(x) = { R }

at the end of a basic block
- emit "ST x, Rx" for each live-on-exit var x where x ∈ NAMES(Rx) and &x ∉ LOC(x)

rules for updating NAMES and LOC:
- LD R, x
  - set NAMES(R) = { x }
  - add R to LOC(x)
- ST x, R
  - add &x to LOC(x)
- OP Rx, Ry, Rz  (for x = y OP z)
  - ∀v: remove Rx from LOC(v)
  - set NAMES(Rx) = { x }
  - set LOC(x) = { Rx }   // doesn't include &x !!!
getReg(I)
- choose registers for y and z
  - if y ∈ NAMES(R), Ry = R
  - if y is not in a register and some register R is free, Ry = R
  - otherwise, see if we can safely "steal" a register
  - if we cannot, choose a register to "spill"
- choose a register for x
  - similar, but slightly more lenient (can use Ry or Rz if it is safe)
  - also, if I is a copy instruction, Rx = Ry

Ex. 8.16 (Fig. 8.16)

x86 notes
- both LD and ST are implemented using the "mov" instruction
- operations may use one memory operand

peephole optimization
- small, sliding window used for pattern matching
- local transformations:
  - eliminating redundant loads and stores
  - eliminating dead code (e.g., debugging code)
  - eliminating chained jumps
  - algebraic simplification
  - use of machine-specific idioms (e.g., auto-inc/dec, fused add/multiply)

register allocation (between basic blocks)
- usage counts
- graph coloring

better instruction selection: tree-based translation
optimal expression code: Ershov (Sethi-Ullman) numbers
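The Ershov (Sethi-Ullman) number of an expression tree is the minimum number of registers needed to evaluate it without spills: a leaf gets 1, and an interior node gets max of its children's labels if they differ, or the common label plus 1 if they are equal. A minimal sketch (the tuple tree representation is an assumption):

```python
# Ershov number of an expression tree.
# A leaf is a string; an interior node is an (op, left, right) tuple.
def ershov(t):
    if isinstance(t, str):
        return 1                         # leaf: one register
    _, left, right = t
    l, r = ershov(left), ershov(right)
    return max(l, r) if l != r else l + 1

# (a - b) + e * (c + d): both subtrees of "+" label 2, so the root is 3,
# meaning 3 registers suffice for a spill-free evaluation
t = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))
print(ershov(t))  # → 3
```

Evaluating the higher-numbered subtree first is what achieves this bound, since its result then sits in one register while the other subtree uses the rest.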