2015-05-16 List Scheduling (10.1-10.3) focus of this section: instruction-level parallelism hardware support for instruction pipelining: Fig. 10.1 - many instructions can be executing in parallel - branches are problematic: which instruction(s) to execute next? - branch prediction heuristics attempt to predict this compiler goal: arrange instructions to take advantage of instruction pipelining and parallel execution WARNING: it is possible for the compiler to *hinder* this process, so care should be taken to avoid over-optimizing or interfering with hardware parallelization context - non-numeric programs: many data-dependent branches research focus: relaxing constraints to enable parallelism - numeric programs: many operations on large data structures research focus: scheduling maximum # of instructions in parallel tradeoff between register allocation and instruction scheduling - fewer registers => more sequential code (less parallelism) - Example: Fig. 10.2, 10.3, 10.4 - care should be taken when ordering these operations during compilation - may wish to expose options to the user for code-specific optimization data dependencies - "true dependence": read after write x = _; _ = x - "antidependence": write after read _ = x; x = _ - "output dependence": write after write x = _; x = _ - (last two can potentially be eliminated) - pointers, aliasing, and arrays can all complicate this! - Example: Exercise 10.2.1 complication of dependencies between basic blocks: control dependence - can sometimes execute control-dependent code speculatively (Ex. 10.4) list scheduling - machine model: resources (CPU, memory, etc.) and instructions - build data-dependence graph for instructions - edges denote data dependencies, and their weights indicate the minimum number of clock cycles of delay between start times - calculate topological order - height-based (minimize critical path) - resource-based (maximize resource efficiency) - source-based (minimize reordering) - we will use all three (use lower orderings to break ties) - build schedule for each instruction n: find earliest start time s for n, given predecessor delays while (too few resources available at s) s = s + 1 schedule n at s mark n's resources as allocated at step s - Exercise 10.3.2 for Fig. 10.10(a) (repeat with two MEM resources)