# Foundations
## A dive into processors' microarchitecture
### High-level abstraction
* Highest level of abstraction: the CPU executes instructions that modify its
  internal state (toy sketch below)
* Instructions: specified by the ISA, written in assembly
* State: registers, memory hierarchy, (external effects)
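
A toy sketch to make this concrete (the mini-ISA below is invented for
illustration and does not correspond to any real architecture): architectural
state is just a register file plus memory, and each instruction is a small
transformer of that state.

```python
# Toy architectural state: a register file plus a flat memory.
# The mini-ISA below is invented for illustration; it is not a real ISA.
regs = {"r0": 0, "r1": 0}
mem = [0] * 16

def execute(instr):
    """Apply a single instruction to the architectural state."""
    op, *args = instr
    if op == "addi":            # addi rd, rs, imm : rd <- rs + imm
        rd, rs, imm = args
        regs[rd] = regs[rs] + imm
    elif op == "store":         # store rs, addr   : mem[addr] <- rs
        rs, addr = args
        mem[addr] = regs[rs]
    elif op == "load":          # load rd, addr    : rd <- mem[addr]
        rd, addr = args
        regs[rd] = mem[addr]

program = [("addi", "r0", "r0", 5), ("store", "r0", 3), ("load", "r1", 3)]
for instr in program:
    execute(instr)
print(regs, mem[3])             # {'r0': 5, 'r1': 5} 5
```
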
### Microarchitecture
* [big picture figure]
* Roughly speaking:
  * frontend (decoder, renamer, …)
  * backend (execution ports, execution units)
  * register file
  * caches
* Instruction --[frontend]--> macro-ops (mops), micro-ops (µops)
* µop --[backend port]--> executed, then retired [side effects become visible]
* In the vast majority of cases, execution units are fully pipelined
* Out-of-order CPUs:
  * frontend is in order up to some point
  * ROB (reorder buffer)
  * backend is out of order
  * ROB: defines the execution window; ILP is limited to this window
* Dependency handling:
  * dependencies stall the pipeline!
  * renaming helps up to a point: it eliminates false (WAR/WAW) dependencies,
    not true (RAW) ones
  * (a minimal throughput-modeling sketch follows this list)
* Hardware counters
* SIMD
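
The sketch announced above: a minimal analytical model of steady-state
throughput under drastic simplifying assumptions (each µop pinned to a single
port, fully pipelined units, perfect renaming, unbounded window). The kernel,
port names and latencies are invented for illustration, not data about any
real CPU.

```python
# Minimal analytical sketch of steady-state cost (cycles per iteration) for a
# kernel like `acc += a[i] * b[i]`. Port names and latencies are invented for
# the example, not measurements of a real CPU. Simplifications: one fixed port
# per µop, fully pipelined units, perfect renaming, unbounded window.
from collections import defaultdict

# (name, port, latency, loop_carried)
kernel = [
    ("load_a",  "p2", 4, False),
    ("load_b",  "p3", 4, False),
    ("mul",     "p0", 3, False),
    ("add_acc", "p1", 4, True),   # accumulator update: depends on the previous iteration
]

def port_bound(uops):
    """Busiest port: a fully pipelined port can start at most one µop per cycle."""
    per_port = defaultdict(int)
    for _, port, _, _ in uops:
        per_port[port] += 1
    return max(per_port.values())

def dependency_bound(uops):
    """Latency accumulated along the loop-carried dependency chain each iteration."""
    return sum(lat for _, _, lat, carried in uops if carried)

estimate = max(port_bound(kernel), dependency_bound(kernel))
print("port bound:", port_bound(kernel))               # 1
print("dependency bound:", dependency_bound(kernel))   # 4
print("estimated cost:", estimate, "cycles/iteration") # 4
```

With these made-up numbers the loop-carried accumulation dominates; splitting
the reduction across several independent accumulators is the classic way to
bring the dependency bound back down toward the port-pressure bound.
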
## Prerequisites on code analyzers
* Code analyzers: given a program assumed to be the body of a hot loop, derive
  performance metrics and any information that may help with performance
  debugging
* Usually static (vs dynamic); focus on static.
* Very close to the machine: assembly, assembled bytes
* Examples with llvm-mca
* The analyzed code resides in an object or executable file: ELF on Linux and
  most Unix-based platforms
* Assembly: a linear sequence of instructions, with (possibly conditional)
  jumps
* Each instruction is identified by its program counter (its address)
* Notion of basic block (see the sketch after this list)
* Regions of interest: the hottest basic blocks
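
The basic-block sketch referred to above: a minimal splitter that marks
leaders (the first instruction, jump targets, and instructions following a
jump) and cuts the listing at each leader. The toy instruction tuples are
invented for the example; a real analyzer would work on decoded machine
instructions.

```python
# Toy basic-block splitter. The "assembly" below is invented for the example;
# a real implementation would operate on decoded instructions.
# Leaders: the first instruction, any jump target, and any instruction that
# follows a (possibly conditional) jump.
program = [
    ("0x00", "mov", None),      # (program counter, mnemonic, jump target)
    ("0x04", "cmp", None),
    ("0x08", "jle", "0x14"),    # conditional jump
    ("0x0c", "add", None),
    ("0x10", "jmp", "0x18"),    # unconditional jump
    ("0x14", "sub", None),
    ("0x18", "ret", None),
]

def basic_blocks(instrs):
    leaders = {instrs[0][0]}                     # first instruction
    for i, (pc, mnemonic, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)                  # jump target
            if i + 1 < len(instrs):
                leaders.add(instrs[i + 1][0])    # fall-through after a jump
    blocks, current = [], []
    for pc, mnemonic, target in instrs:
        if pc in leaders and current:
            blocks.append(current)
            current = []
        current.append(pc)
    blocks.append(current)
    return blocks

for block in basic_blocks(program):
    print(block)
# [['0x00', '0x04', '0x08'], ['0x0c', '0x10'], ['0x14'], ['0x18']]
```

A code analyzer is then typically run on one such block, e.g. the hottest one
identified by profiling.
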
## State of the art
Throughput prediction:
* Agner Fog
* Uops.info
* IACA
* llvm-mca
* Ithemal
* PMEvo
* OSACA
* UiCA
## Maybe put this somewhere
Backend models:
* To predict the throughput of a kernel, a precise model of the CPU backend is
  required (a sketch of its shape follows this list)
* It could be obtained from the manufacturer: ARM Cortex-A72 optimization
  guide, Intel optimization manual, …
* …but these documents are often incomplete, sometimes even wrong
* Agner Fog
* Uops.info
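
To fix ideas, a sketch of the shape such a backend model takes once assembled
from these sources; the two entries and their numbers are illustrative
(roughly Skylake-like) and should not be read as authoritative values from any
of the documents above.

```python
# Shape of a per-instruction backend model. Values are illustrative
# (roughly Skylake-like) and should not be treated as authoritative.
from dataclasses import dataclass

@dataclass
class InstrDesc:
    uops: int                 # number of µops the instruction decodes into
    ports: tuple[str, ...]    # ports on which those µops may execute
    latency: int              # result latency, in cycles
    rthroughput: float        # reciprocal throughput, in cycles per instruction

backend_model = {
    "add r64, r64":  InstrDesc(uops=1, ports=("p0", "p1", "p5", "p6"),
                               latency=1, rthroughput=0.25),
    "imul r64, r64": InstrDesc(uops=1, ports=("p1",),
                               latency=3, rthroughput=1.0),
}

print(backend_model["imul r64, r64"].latency)   # 3
```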