Foundations

A dive into processors' microarchitecture

High-level abstraction

  • Highest level of abstraction: instructions -> CPU -> modified internal state
  • Instructions: defined by the ISA, written as assembly (asm)
  • State: registers, memory hierarchy, (external effects)

Microarchitecture

  • [big picture figure]

  • Roughly speaking:

    • frontend (instruction fetch, decoder, renamer, etc.)
    • backend (execution ports, execution units)
    • register file
    • caches
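  • Instruction --[frontend]--> macro-ops (Mops), then micro-ops (μops)

  • μop --[backend port]--> executed, then retired [side effects committed]

  • In the vast majority of cases, execution units are fully pipelined: a unit can accept a new μop every cycle, whatever the μop's latency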

  • Out-of-order CPUs:

    • Frontend in order up to some point
    • ROB (reorder buffer)
    • Backend out of order
    • ROB: execution window. ILP (instruction-level parallelism) is limited to this window.
  • Dependency handling

    • Dependencies stall the pipeline!
    • Renamer: helps up to a point. It removes false dependencies (WAR, WAW); true dependencies (RAW) remain.
  • Hardware counters

  • SIMD
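
To make the interplay between the ROB window, pipelining and dependencies concrete, here is a minimal sketch of an idealized out-of-order core. Everything in it is an assumption made for illustration: the function name `simulate`, the `(dest, srcs, latency)` instruction format, a dispatch width of 4, perfect renaming (only RAW dependencies matter), unlimited fully pipelined execution units, and unbounded retirement bandwidth. It is not a model of any real CPU.

```python
def simulate(instrs, rob_size, width=4):
    """Return the cycle at which the last instruction retires.

    instrs: list of (dest, srcs, latency) tuples, in program order.
    Hypothetical toy model: unlimited execution units, perfect renaming.
    """
    ready = {}      # register -> cycle at which its latest value is available
    dispatch = []   # cycle at which each instruction enters the ROB
    retire = []     # cycle at which each instruction leaves the ROB
    for i, (dest, srcs, latency) in enumerate(instrs):
        d = dispatch[-1] if dispatch else 0       # dispatch is in order
        if i >= width:                            # at most `width` per cycle
            d = max(d, dispatch[i - width] + 1)
        if i >= rob_size:                         # a ROB entry must be free
            d = max(d, retire[i - rob_size])
        # Execution starts once all source operands are ready (RAW deps).
        start = max([d] + [ready.get(s, 0) for s in srcs])
        done = start + latency
        ready[dest] = done
        dispatch.append(d)
        # Retirement is in order: wait for every older instruction.
        retire.append(max(done, retire[-1] if retire else 0))
    return retire[-1] if retire else 0


# 100 additions forming one dependency chain (each reads the previous result).
chain = [("r0", ["r0"], 4)] * 100
# 100 independent instructions with latency 4, then with latency 10.
independent = [(f"r{i}", [], 4) for i in range(100)]
slow = [(f"r{i}", [], 10) for i in range(100)]
```

Under these assumptions, `simulate(chain, rob_size=64)` returns 400 (one result every 4 cycles, the chain latency), while `simulate(independent, rob_size=64)` returns 28 (limited only by the dispatch width). Shrinking the window shows the ROB limit on ILP: `simulate(slow, rob_size=4)` returns 250, against 34 with `rob_size=64`.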

Prerequisites on code analyzers

  • Code analyzers: given a program that is assumed to be the body of a hot loop, derive performance metrics and any information that might help towards performance debugging.
  • Usually static (vs dynamic); focus on static.
  • Very close to the machine: assembly, assembled bytes
  • Examples with llvm-mca
  • The analyzed code resides in an object or executable file: ELF on Linux and most Unix-based platforms
  • Assembly: straight-line instructions, with (possibly conditional) jumps
  • Each instruction is identified by its address (program counter value)
  • Notion of basic block
    • Regions of interest: hottest basic blocks
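
The notion of basic block can be sketched with the classic "leaders" construction: a block starts at the first instruction, at every jump target, and right after every jump. The code below is a simplified illustration, not the method of any particular tool. The `(pc, mnemonic, target)` tuple format and the function name `basic_blocks` are assumptions, and calls, returns and indirect jumps are ignored.

```python
def basic_blocks(instrs):
    """Split straight-line code into basic blocks.

    instrs: list of (pc, mnemonic, target) sorted by pc, where `target` is
    the destination address of a (possibly conditional) jump, or None for
    any other instruction. Returns a list of blocks, each a list of pcs.
    """
    leaders = set()
    if instrs:
        leaders.add(instrs[0][0])                 # entry point
    for idx, (pc, mnemonic, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)                   # jump destination
            if idx + 1 < len(instrs):
                leaders.add(instrs[idx + 1][0])   # fall-through successor
    blocks = []
    for pc, mnemonic, target in instrs:
        if pc in leaders or not blocks:
            blocks.append([])                     # open a new block
        blocks[-1].append(pc)
    return blocks


# Hypothetical loop: the block starting at 0x03 is the loop body.
example = [
    (0x00, "mov", None),
    (0x03, "add", None),
    (0x06, "cmp", None),
    (0x09, "jne", 0x03),
    (0x0B, "ret", None),
]
```

On this example, `basic_blocks(example)` yields three blocks: `[0x00]`, `[0x03, 0x06, 0x09]` and `[0x0B]`. The middle one is the loop body, which is the kind of region a code analyzer would be given.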

State of the art

Throughput prediction:

  • Agner Fog
  • uops.info
  • IACA
  • llvm-mca
  • Ithemal
  • PMEvo
  • OSACA
  • uiCA

Maybe put this somewhere

Backend models:

  • To predict the throughput of a kernel, a precise model of the CPU backend is required
  • Could be obtained from the manufacturer: Arm Cortex-A72 Software Optimization Guide, Intel optimization manual, …
    • but this documentation is often incomplete, sometimes even wrong
  • Agner Fog
  • uops.info
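
As an illustration of what such a backend model enables, the sketch below computes a throughput lower bound from a port mapping alone. It assumes that each μop occupies one port for one cycle and ignores dependencies and the frontend. The bound is the maximum, over all subsets S of ports, of the number of μops that can only execute on ports in S, divided by |S|. The port names and the mapping are hypothetical and not taken from any real CPU.

```python
from fractions import Fraction
from itertools import combinations


def port_bound(uops, ports):
    """Minimal number of cycles per iteration imposed by port pressure.

    uops: one set of admissible ports per μop of the kernel.
    ports: list of all port names of the (hypothetical) backend.
    """
    best = Fraction(0)
    for k in range(1, len(ports) + 1):
        for subset in combinations(ports, k):
            s = set(subset)
            # μops that have no choice but to run on a port of this subset
            load = sum(1 for u in uops if set(u) <= s)
            best = max(best, Fraction(load, k))
    return best


# Hypothetical port mapping, for illustration only.
PORTS = ["p0", "p1", "p2", "p3"]
MAPPING = {"add": {"p0", "p1"}, "mul": {"p1"}, "load": {"p2", "p3"}}

kernel = ["add", "add", "mul", "mul", "load"]
bound = port_bound([MAPPING[m] for m in kernel], PORTS)
```

Here `bound` is 2 cycles per iteration: the two `mul` μops saturate `p1` on their own, and the four `add`/`mul` μops share only `p0` and `p1`. Enumerating all subsets is exponential in the number of ports, which stays cheap for the handful of ports found on current cores.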