diff --git a/plan/20_foundations.md b/plan/20_foundations.md
index a293a23..7958149 100644
--- a/plan/20_foundations.md
+++ b/plan/20_foundations.md
@@ -1,11 +1,52 @@
-# State of the Art
+# Foundations
 
-En vrac :
+## A dive into processors' microarchitecture
 
-Outils utilisés :
-* Pipedream
-* Valgrind
-* QEmu
+### High-level abstraction
+
+* Highest level of abstraction: instructions -> CPU => modify internal state
+* instructions => ISA, asm
+* state: registers, memory hierarchy, (external effects)
+
+### Microarchitecture
+
+* [big picture figure]
+* Roughly speaking:
+    * frontend (decoder, renamer, other stuff).
+    * backend (execution ports, execution units)
+    * register file
+    * caches
+* Instruction --[frontend]--> Mop, muop
+* muop --[backend port]--> retired [side effects]
+* vast majority of cases: execution units are fully pipelined
+    * Dependencies are breaking the pipeline!
+    * Renamer: helps up to a point
+* out of order CPUs:
+    * Frontend in order up to some point
+    * ROB
+    * backend out-of-order
+    * ROB: execution window. ILP limited to this window.
+
+* Hardware counters
+
+* SIMD
+
+## Prerequisites on code analyzers
+
+* Code analyzers: given a program that is assumed to be the
+  body of a hot loop, derive performance metrics and any information that might
+  help towards performance debugging.
+* Usually static (vs dynamic); focus on static.
+* Very close to the machine: assembly, assembled bytes
+* Examples with llvm-mca
+* Resides in an object or executable file: ELF on Linux and most Unix-based
+  platforms
+* Assembly: straight-line instructions, with (possibly conditional) jumps
+* Instruction identified by its program counter
+* Notion of basic block
+    * Regions of interest: hottest basic blocks
+
+## State of the art
 
 Throughput pred. :
 * Agner Fog
@@ -17,10 +58,7 @@ Throughput pred. :
 * OSACA
 * UiCA
 
-Benchmark suites:
-* Polybench
-* SPEC
-
+## Maybe put this somewhere
 Backend models:
 * To predict the throughput of a kernel, a precise model of the CPU backend is
   required
@@ -29,4 +67,3 @@ Backend models:
 * but this is often incomplete, sometimes even wrong
 * Agner Fog
 * Uops.info
-
diff --git a/plan/to_introduce_early.md b/plan/to_introduce_early.md
index 3d356d9..299f836 100644
--- a/plan/to_introduce_early.md
+++ b/plan/to_introduce_early.md
@@ -1,12 +1,10 @@
 # Stuff that must be introduced early (intro/foundations)
 
-* Static vs. dynamic
-* PC
-* ELF
+## Intro to CPUs
+
 * ISA
 * Assembly
 * SIMD
-* Basic block
 * μarch:
     * frontend
     * ports
@@ -18,6 +16,18 @@
     * ROB
     * L1-residence
 * HW counters
+
+## Foundations on code analyzers
+
+* Define Cycles(K): retired instructions
+* Define notion of bottleneck
+* Static vs. dynamic
+* PC
+* ELF
+* Basic block
+
+## State of the art
+
 * Tools:
     * IACA
     * llvm-mca
@@ -25,6 +35,3 @@
     * uops.info
     * UiCA
     * PMEvo
-
-* Define Cycles(K): retired instructions
-* Define notion of bottleneck