2015-03-24

Runtime Environments (Ch. 7)

x86 address space:
    ----------
    |  code  |
    | static |  (data & bss)
    | stack  |
    |   ...  |
    |  heap  |
    ----------

"control links" - base pointers
"access links" - normally resolved by compiler

stack management

  - on x86: "cdecl" calling conventions
  - callee must preserve bx, bp, sp, si
    - must save/restore if used
  - callee may use ax, cx, dx, flags, st0-7, mm0-7, xmm0-15
  - parameters may be passed in di, si, dx, cx, r8-15, etc.
  - return value saved in ax
  - each function builds a "stack frame" starting at base pointer (bp)

  - stack contents:
      --------------------------
      |          ...           |   (high address)
      |   local vars for main  |
      |      params to foo     |
      | return address in main |
      --------------------------
      |    saved bp for main   |
      |   local vars for foo   |
      |      params to bar     |
      | return address in foo  |
      --------------------------
      |    saved bp for foo    |   <- bp
      |    local vars for bar  |
      |          ...           |   <- sp

  - prologue:
      push %ebp             ; save old base pointer
      mov  %esp, %ebp       ; save top of stack as base pointer
      sub  X, %esp          ; reserve X bytes for local vars

  - within function:
      +OFFSET(%ebp)         ; function parameter
      -OFFSET(%ebp)         ; local variable

  - epilogue:
      leave                 ; mov %ebp, %esp
                            ; pop %ebp
      ret                   ; pop stack and jump to popped address

  - function calling:
      <save ax, cx, dx if necessary>
      <push parameters onto stack or save in registers>
      call <fname>

  - x86_64 "red zone" (128 bytes reserved below sp)
    - optimization: do not explicitly build frame (no sp manipulation)

  - callstack demo on wraith
    - objdump and objdump-func (custom script)

heap management

  - desired properties
    - space efficiency
    - exploitation of locality (time and space)
    - low overhead

  - allocation (malloc/new)
    - first-fit vs. best-fit vs. next-fit
    - coalescing free space (defragmentation)

  - manual deallocation (free/delete)
    - dangling pointers
    - memory leaks

  - automatic deallocation ("garbage collection")
    - criteria: overhead, pause time, space usage, locality impact
    - basic problem: finding reachable structures
      - root set: static and stack pointers
      - follow pointers through heap structures
    - basic solutions
      - reference counting
        - catch the transition to unreachable
        - has trouble with cyclic data structures
      - mark and sweep (tracing)
        - occasionally pause and detect reachable
        - high overhead and undesirable "pause the world" semantics
        - incremental collection: interleave computation and collection
        - partial collection: collect only a subset of memory on each run
        - generational collection: collect newer objects more often
        - collection can be parallelized to a certain extent