2015-04-21 Global Code Scheduling and Motion (Ch. 10.4) local list scheduling doesn't by itself save that many cycles - need a more "global" view (between basic blocks) global scheduling/optimization - move instructions between blocks to enable more parallelism - ensure all original instructions are executed with original semantics - (possibly) insert new instructions if necessary to preserve semantics - ensure that extra instructions do not have unwanted side effects domination - block A "dominates" block B if every path from entry to B goes through A - block B "postdominates" block A if every path from A to exit goes through B - block A is "control equivalent" to block B if A dominates B and B postdominates A Ex. 10.9 code motion - upward vs. downward - control equivalence: no problems - in some cases, simply insert copies - in other cases, it is illegal or expensive - must update data dependencies basic global scheduler (Fig 10.15) - schedule blocks and any control-equivalent blocks or dominated successors loop unrolling - avoids some loop condition and jump overhead - adds more opportunities for parallel scheduling - complete unrolling is impossible unless the # of iterations is known loop tiling/blocking (Fig. 11.5 and 11.8) - improves cache behavior and data locality general term: loop nest optimization dynamic schedulers or just-in-time (JIT) compilation - can take advantage of known information at runtime - similar to ALL* parsing Amdahl's Law: speedup is limited by how much of the code is parallelizable speedup = 1 / ((1-f) + (f/p)) f = parallelizable % of code p = number of processors scaling: weak vs. strong scaling