2015-04-23 Interprocedural Analysis (Ch. 12) local and regional ("global") optimizations are intraprocedural interprocedural analysis works across function boundaries - must be conservative; err on the sound side at the expense of completeness or precision one useful interprocedural optimization: code inlining - replacing a function call by the function body itself - may require modifications call graph: similar to control flow graph (function calls instead of branches) - nodes represent functions and call sites - edges represent a "may call" relationship - complicated by function pointers and virtual functions (in OOP) Example: Fig. 12.1 and 12.2 - 12.2(a): naive type-based analysis; fun1 and fun2 both match type of "pf" or: "pf" is set to fun1 and fun2, but never to main - 12.2(b): more precise, data-aware interprocedural analysis main issue for interprocedural analysis: context sensitivity context-insensitive analysis - create "super" control flow graph from regular CFG - edge from each call site to called function - edge from returns back to call sites - ignores relationship between input/output values - very imprecise - example in Fig. 12.3 and 12.4 context-sensitive analysis - incorporate calling context - basically a form of program modeling - solutions: - call strings (examples in Fig. 12.5 and 12.6) -- expensive! - can k-limit the length of strings (k=0 is context-insensitive analysis) - need to handle recursion and recursive cycles - even without recursion, can be exponential wrt. # of functions - function clones (examples in Fig. 12.7 and Fig. 12.6) - create a copy of the function for each call site "of interest" - or inline the clones - summaries (or sets of logical predicates) - backwards analysis to characterize function behavior - forwards analysis to propagate caller information - clone AFTER analysis based on the results Examples in Fig. 12.1-9: constant propagation applications of interprocedural analysis - virtual method invocation - inline most commonly-executed virtual methods and include runtime checks to divert to general case if necessary - pointer alias analysis - improve accuracy of dataflow analyses (even intraprocedural ones) - parallelization (at the program level) - concurrency issues - e.g., Locksmith from UMD (Pratikakis, TOPLAS'11) - SQL injection / taint analysis - tag data as "safe" or "unsafe" - generic version: type qualifiers (Foster, PLDI'99) - buffer overflow detection - prove that array writes are safe, or insert runtime checks recent development: link time optimization (LTO) in GCC - idea: leave object files in IR and apply whole-program optimization passes - effort started in 2005; initial implementation in 2009 (gcc 4.5.0) - interprocedural optimization (IPO) efforts in parallel starting in 2003 - scalable whole program optimizer (WHOPR) for multicore compilation - feedback-directed optimization (FDO) - LTO components: - new "middle-end" extension for saving IR to disk - save IR in extra ELF sections - new front-end for reading the IR - linker plugin for calling back to the new front-end - IPO components: - call graph data structures - interprocedural dataflow analysis - optimization pass manager - various interprocedural optimization passes - compile-time optimizations: - early passes - early inlining (very small functions) - constant propagation - inter-procedural scalar replacement (promote reference params to value) - alias analysis - full redundancy elimination via global value numbering - dead code elimination - tail recursion elimination - exception handling (remove empty handlers and regions that don't throw) - function attribute discovery (nothrow, noreturn, const, pure) - function splitting (head and tail, for partial inlining) - OpenMP offloading - send selected functions to a separate compiler for acceleration - link-time optimizations: - whole program visibility - remove unreachable functions and variables - write-only variable elimination - execution profile propagation (if provided) - optimize for speed: "hot" and "normal" - only optimize loops for speed: "execute-once" - optimize for size: "unlikely" - devirtualization - turn indirect calls into direct calls for polymorphic types - interprocedural constant propagation - constructor and destructor merging - function inlining (can make better global decisions than early inliner) - limited by "badness" score (code size growth, growth/benefit) - always inline "execute-once" functions if possible - function attribute propagation - function reordering (emit in likely-call order) - fewer pages need to be loaded in a typical execution - work still to be done - merging of C++ header functions - extension of IR to facilitate optimizations - reduce memory requirements (4GB for Firefox and kernel, 10GB for Chromium) - emitting useful debug info in the presence of high optimization