Foundations
A dive into processors' microarchitecture
High-level abstraction
- Highest level of abstraction: the CPU consumes instructions and modifies its internal state
- instructions => ISA, asm
- state: registers, memory hierarchy, (external effects)
Microarchitecture
-
[big picture figure]
-
Roughly speaking:
- frontend (decoder, renamer, etc.)
- backend (execution ports, execution units)
- register file
- caches
-
Instruction --[frontend]--> macro-ops (Mops), then micro-ops (muops)
-
muop --[backend port]--> retired [side effects]
-
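In the vast majority of cases, execution units are fully pipelined: a unit can accept a new muop every cycle, even if each muop takes several cycles to complete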
-
out of order CPUs:
- Frontend in order up to some point
- ROB (reorder buffer)
- backend out-of-order
- ROB: execution window. Instruction-level parallelism (ILP) is limited to this window.
-
Dependencies handling
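- Dependencies stall the pipeline: a muop cannot start before its operands are ready
- Renamer: helps up to a point (removes false WAR/WAW dependencies; true RAW dependencies remain)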
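
To make the cost of dependencies concrete, here is a minimal sketch under an idealized model (an assumption, not a description of any real CPU): a single fully pipelined execution unit that accepts one operation per cycle, each result being available `latency` cycles after issue. The function name and parameters are hypothetical. It models true (read-after-write) dependencies only, which renaming cannot remove.

```python
def cycles_to_finish(num_ops: int, latency: int, dependent: bool) -> int:
    """Cycle at which the last of `num_ops` identical operations completes.

    Idealized model: one fully pipelined unit, at most one issue per cycle,
    result available `latency` cycles after issue. If `dependent` is True,
    each operation reads the result of the previous one (a RAW chain).
    """
    issue_cycle = 0   # earliest cycle at which the unit can accept the next op
    ready = 0         # cycle at which the previous result becomes available
    finish = 0
    for _ in range(num_ops):
        if dependent:
            # The op must wait for its operand, not just for a free issue slot.
            issue_cycle = max(issue_cycle, ready)
        finish = issue_cycle + latency
        ready = finish
        issue_cycle += 1
    return finish


# 8 operations of latency 4:
independent = cycles_to_finish(8, 4, dependent=False)  # 8 + 4 - 1 = 11 cycles
chained = cycles_to_finish(8, 4, dependent=True)       # 8 * 4 = 32 cycles
```

With independent operations the unit is limited by its throughput (one result per cycle once the pipeline is full); with a dependency chain it is limited by latency, and pipelining brings no benefit.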
-
Hardware counters
-
SIMD
Prerequisites on code analyzers
- Code analyzers: given a program that is assumed to be the body of a hot loop, derive performance metrics and any information that might help towards performance debugging.
- Usually static (as opposed to dynamic); we focus on static analyzers.
- Very close to the machine: assembly, assembled bytes
- Examples with llvm-mca
- The analyzed code resides in an object or executable file: ELF on Linux and most Unix-based platforms
- Assembly: straight-line instructions, with (possibly conditional) jumps
- Instruction identified by its program counter
- Notion of basic block
- Regions of interest: hottest basic blocks
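
The notion of basic block can be sketched as follows. This is a simplified illustration, not how any particular analyzer works: the `Insn` type, the `basic_blocks` function and the example addresses are hypothetical, and only direct jumps are handled (no calls, returns or indirect jumps). A new block starts at the first instruction, at every jump target, and right after every jump.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    pc: int                            # program counter identifying the instruction
    text: str                          # assembly text, for display only
    jump_target: Optional[int] = None  # set for (possibly conditional) jumps


def basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """Split straight-line code into basic blocks using the leader rule."""
    if not insns:
        return []
    leaders = {insns[0].pc}
    for i, insn in enumerate(insns):
        if insn.jump_target is not None:
            leaders.add(insn.jump_target)      # the target starts a block
            if i + 1 < len(insns):
                leaders.add(insns[i + 1].pc)   # so does the fall-through
    blocks: List[List[Insn]] = []
    for insn in insns:
        if insn.pc in leaders:
            blocks.append([])
        blocks[-1].append(insn)
    return blocks


# Illustrative loop: the body at 0x7..0xd is the hot basic block.
EXAMPLE = [
    Insn(0x0, "mov rax, 0"),
    Insn(0x7, "add rax, rdi"),
    Insn(0xA, "dec rcx"),
    Insn(0xD, "jnz 0x7", jump_target=0x7),
    Insn(0xF, "ret"),
]
BLOCKS = basic_blocks(EXAMPLE)
```

Here `BLOCKS` holds three blocks: the initialization, the loop body (the region of interest a code analyzer would be given), and the exit.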
State of the art
Throughput prediction:
- Agner Fog
- uops.info
- IACA
- llvm-mca
- Ithemal
- PMEvo
- OSACA
- uiCA
Maybe put this somewhere
Backend models:
- To predict the throughput of a kernel, a precise model of the CPU backend is required
- Could be obtained from the manufacturer: ARM A72 optimization guide, Intel manual, …
- but such documentation is often incomplete, sometimes even wrong
- Agner Fog
- uops.info
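
To illustrate why a backend model matters for throughput prediction, here is a minimal sketch assuming a deliberately simple model: each muop occupies exactly one port for one cycle, and ports are the only bottleneck (no frontend, no dependencies). The port names and the kernel are made up, and `throughput_bound` is a hypothetical helper, not the algorithm of any tool listed above. For every subset of ports, the muops that can only run on ports of that subset must share them, which gives a lower bound on the cycles per iteration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def throughput_bound(uops: List[FrozenSet[str]], ports: Iterable[str]) -> float:
    """Lower bound on cycles per iteration under a port-only backend model.

    `uops` lists, for each muop of the kernel, the set of ports it may use.
    """
    ports = list(ports)
    best = 0.0
    for k in range(1, len(ports) + 1):
        for subset in combinations(ports, k):
            available = set(subset)
            # muops that have no choice but to execute inside this subset
            confined = sum(1 for allowed in uops if allowed <= available)
            best = max(best, confined / k)
    return best


# Hypothetical kernel: four muops restricted to p0/p1, one that can go anywhere.
KERNEL = [frozenset({"p0", "p1"})] * 4 + [frozenset({"p0", "p1", "p5"})]
BOUND = throughput_bound(KERNEL, ["p0", "p1", "p5"])  # 4 muops / 2 ports = 2.0
```

The enumeration is exponential in the number of ports, which is acceptable for a sketch since CPUs have few ports. An error in the port mapping (for example, a muop wrongly assumed to run on `p5`) directly changes the predicted bound, hence the need for an accurate model.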