2015-03-24 Runtime Environments (Ch. 7) x86 address space: ---------- | code | | static | (data & bss) | stack | | ... | | heap | ---------- "control links" - base pointers "access links" - normally resolved by compiler stack management - on x86: "cdecl" calling conventions - callee must preserve bx, bp, sp, si - must save/restore if used - callee may use ax, cx, dx, flags, st0-7, mm0-7, xmm0-15 - parameters may be passed in di, si, dx, cx, r8-15, etc. - return value saved in ax - each function builds a "stack frame" starting at base pointer (bp) - stack contents: -------------------------- | ... | (high address) | local vars for main | | params to foo | | return address in main | -------------------------- | saved bp for main | | local vars for foo | | params to bar | | return address in foo | -------------------------- | saved bp for foo | <- bp | local vars for bar | | ... | <- sp - prologue: push %ebp ; save old base pointer mov %esp, %ebp ; save top of stack as base pointer sub X, %esp ; reserve X bytes for local vars - within function: +OFFSET(%ebp) ; function parameter -OFFSET(%ebp) ; local variable - epilogue: leave ; mov %ebp, %esp ; pop %ebp ret ; pop stack and jump to popped address - function calling: call - x86_64 "red zone" (128 bytes reserved below sp) - optimization: do not explicitly build frame (no sp manipulation) - callstack demo on wraith - objdump and objdump-func (custom script) heap management - desired properties - space efficiency - exploitation of locality (time and space) - low overhead - allocation (malloc/new) - first-fit vs. best-fit vs. next-fit - coalescing free space (defragmentation) - manual deallocation (free/delete) - dangling pointers - memory leaks - automatic deallocation ("garbage collection") - criteria: overhead, pause time, space usage, locality impact - basic problem: finding reachable structures - root set: static and stack pointers - follow pointers through heap structures - basic solutions - reference counting - catch the transition to unreachable - has trouble with cyclic data structures - mark and sweep (tracing) - occasionally pause and detect reachable - high overhead and undesirable "pause the world" semantics - incremental collection: interleave computation and collection - partial collection: collect only a subset of memory on each run - generational collection: collect newer objects more often - collection can be parallelized to a certain extent