# Foundations

## A dive into processors' microarchitecture

### High-level abstraction

* Highest level of abstraction: instructions -> CPU => modify internal state
    * instructions => ISA, asm
    * state: registers, memory hierarchy, (external effects)

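
The "instructions -> CPU => modify internal state" view can be sketched as a tiny interpreter. This is only an illustration: the three-opcode ISA (`li`, `add`, `store`), the `State` class and the register names are all made up for the example and do not correspond to any real architecture.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Architectural state of a hypothetical toy machine."""
    regs: dict = field(default_factory=dict)  # register name -> value
    mem: dict = field(default_factory=dict)   # address -> value


def step(state, instr):
    """Apply one instruction of the made-up ISA to the state, in place."""
    op, dst, a, b = instr
    if op == "li":       # load immediate: dst <- a
        state.regs[dst] = a
    elif op == "add":    # dst <- regs[a] + regs[b]
        state.regs[dst] = state.regs[a] + state.regs[b]
    elif op == "store":  # mem[regs[dst]] <- regs[a]
        state.mem[state.regs[dst]] = state.regs[a]
    else:
        raise ValueError(f"unknown opcode: {op}")
    return state


def run(program):
    """Execute instructions one after the other, starting from an empty state."""
    state = State()
    for instr in program:
        step(state, instr)
    return state


program = [
    ("li", "r1", 40, None),
    ("li", "r2", 2, None),
    ("add", "r3", "r1", "r2"),
    ("li", "r4", 0x100, None),
    ("store", "r4", "r3", None),
]
final = run(program)
print(final.regs["r3"], final.mem[0x100])
```

At this level nothing is said about timing: only the final registers and memory matter, which is exactly what the microarchitectural view refines.
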
### Microarchitecture

* [big picture figure]
* Roughly speaking:
    * frontend (decoder, renamer, other stuff)
    * backend (execution ports, execution units)
    * register file
    * caches
* Instruction --[frontend]--> Mop (macro-op), µops (micro-ops)
* µop --[backend port]--> retired [side effects]
* vast majority of cases: execution units are fully pipelined
* out-of-order CPUs:
    * frontend in order up to some point
    * ROB (reorder buffer)
    * backend out of order
    * ROB: execution window. ILP (instruction-level parallelism) is limited to
      this window.
* Dependency handling
    * Dependencies stall the pipeline!
    * Renamer: helps up to a point

* Hardware counters

* SIMD

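
To make the "Renamer: helps up to a point" bullet concrete, here is a minimal sketch of register renaming. It assumes an unlimited pool of physical registers named `p0`, `p1`, … (a simplification, and the names are hypothetical), unit latency, and a made-up instruction format `(dest, [sources])`. `depth` measures the longest chain of instructions that must stay ordered because of a register conflict detected by name.

```python
def rename(instrs):
    """Give every written register a fresh physical name.

    Assumes architectural register names never start with 'p'.
    """
    current = {}  # architectural register -> latest physical register
    renamed = []
    for counter, (dest, srcs) in enumerate(instrs):
        # Sources read the most recent physical copy; live-in registers
        # that were never written keep their architectural name.
        new_srcs = [current.get(s, s) for s in srcs]
        phys = f"p{counter}"
        current[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed


def depth(instrs):
    """Longest chain of instructions ordered by a RAW, WAR or WAW conflict."""
    level = []
    for j, (dest_j, srcs_j) in enumerate(instrs):
        lvl = 1
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            conflict = (
                dest_i in srcs_j      # read after write
                or dest_j in srcs_i   # write after read
                or dest_i == dest_j   # write after write
            )
            if conflict:
                lvl = max(lvl, level[i] + 1)
        level.append(lvl)
    return max(level, default=0)


code = [
    ("r1", ["r2", "r3"]),  # r1 = r2 + r3
    ("r4", ["r1"]),        # reads r1: true dependency
    ("r1", ["r5", "r6"]),  # reuses r1: false dependencies on the two above
    ("r7", ["r1"]),        # reads the new r1: true dependency
]
print(depth(code), depth(rename(code)))
```

Renaming removes the write-after-read and write-after-write conflicts caused by reusing `r1`, so the two halves of the snippet become independent. The true read-after-write dependencies remain, which is one reason the renamer only helps up to a point.
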
## Prerequisites on code analyzers

* Code analyzers: given a piece of code that is assumed to be the body of a hot
  loop, derive performance metrics and any other information that might help
  with performance debugging.
* Usually static (vs dynamic); focus on static.
* Very close to the machine: assembly, assembled bytes
    * Examples with llvm-mca
* The code resides in an object or executable file: ELF on Linux and most
  Unix-based platforms
* Assembly: straight-line instructions, with (possibly conditional) jumps
* Instruction identified by its program counter
* Notion of basic block
* Regions of interest: hottest basic blocks

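
The notion of basic block mentioned above can be illustrated with a minimal sketch that splits a toy instruction list at jump targets and after jumps. The tuple format `(address, mnemonic, jump_target_or_None)`, the addresses and the sample program are assumptions made for the example; calls, returns and indirect jumps are deliberately ignored.

```python
def basic_blocks(instrs):
    """Split (address, mnemonic, jump_target_or_None) tuples into basic blocks."""
    addresses = {addr for addr, _, _ in instrs}
    leaders = set()
    if instrs:
        leaders.add(instrs[0][0])  # the first instruction starts a block
    for idx, (_, _, target) in enumerate(instrs):
        if target is None:
            continue
        if target in addresses:
            leaders.add(target)  # a jump target starts a block
        if idx + 1 < len(instrs):
            leaders.add(instrs[idx + 1][0])  # so does the instruction after a jump

    blocks, current = [], []
    for instr in instrs:
        if instr[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(instr)
    if current:
        blocks.append(current)
    return blocks


program = [
    (0x00, "mov", None),
    (0x03, "xor", None),
    (0x05, "add", None),   # loop head, target of the jump at 0x0b
    (0x08, "cmp", None),
    (0x0B, "jne", 0x05),
    (0x0D, "ret", None),
]
for block in basic_blocks(program):
    print([hex(addr) for addr, _, _ in block])
```

Here the middle block (`add`, `cmp`, `jne`) is the loop body, i.e. the kind of region a code analyzer would be pointed at.
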
## State of the art

Throughput prediction:
* Agner Fog
* uops.info
* IACA
* llvm-mca
* Ithemal
* PMEvo
* OSACA
* uiCA

## Maybe put this somewhere

Backend models:
* To predict the throughput of a kernel, a precise model of the CPU backend is
  required
* Could be obtained from the manufacturer: ARM Cortex-A72 optimization guide,
  Intel manual, …
    * but this is often incomplete, sometimes even wrong
* Agner Fog
* uops.info
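
As an illustration of what a backend model gives you, the following sketch computes a lower bound on the cycles per iteration of a kernel from a port mapping. It assumes fully pipelined execution units, at most one µop per port per cycle, and no dependencies; the port names and the sample kernel are hypothetical and not taken from any real CPU. The bound itself is a simple counting argument: µops that can only execute on a subset of ports need at least (number of such µops) / (size of the subset) cycles.

```python
from itertools import combinations


def cycles_lower_bound(uops, ports):
    """Lower bound on cycles per iteration due to port contention.

    uops: list of sets, each holding the ports able to execute one µop.
    ports: list of all port names of the (hypothetical) backend.
    """
    best = 0.0
    for size in range(1, len(ports) + 1):
        for subset in combinations(ports, size):
            allowed = set(subset)
            # µops that cannot escape this subset of ports
            trapped = sum(1 for u in uops if u <= allowed)
            best = max(best, trapped / size)
    return best


ports = ["p0", "p1", "p5"]
kernel = [
    {"p0", "p1"},
    {"p0", "p1"},
    {"p0", "p1"},
    {"p0"},
    {"p5"},
]
print(cycles_lower_bound(kernel, ports))
```

In this example four µops compete for `p0` and `p1`, so the kernel cannot run faster than 2 cycles per iteration even though `p5` is mostly idle. The enumeration is exponential in the number of ports, which is acceptable for the handful of ports found on a single core.
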