# Foundations

## A dive into processors' microarchitecture

### High-level abstraction

* Highest level of abstraction: instructions -> CPU => modify internal state
* instructions => ISA, asm
* state: registers, memory hierarchy, (external effects)

### Microarchitecture

* [big picture figure]
* Roughly speaking:
    * frontend (decoder, renamer, other stuff).
    * backend (execution ports, execution units)
    * register file
    * caches
* Instruction --[frontend]--> Mop, muop
* muop --[backend port]--> retired [side effects]
* vast majority of cases: execution units are fully pipelined
* out of order CPUs:
    * Frontend in order up to some point
    * ROB
    * backend out-of-order
    * ROB: execution window. ILP limited to this window.
* Dependencies handling
    * Dependencies are breaking the pipeline!
    * Renamer: helps up to a point
* Hardware counters
* SIMD

## Prerequisites on code analyzers

* Code analyzers: given a program that is assumed to be the body of a hot loop, derive performance metrics and any information that might help towards performance debugging.
* Usually static (vs dynamic); focus on static.
* Very close to the machine: assembly, assembled bytes
* Examples with llvm-mca
* Resides in an object or executable file: ELF on Linux and most Unix-based platforms
* Assembly: straight-line instructions, with (possibly conditional) jumps
* Instruction identified by its program counter
* Notion of basic block
* Regions of interest: hottest basic blocks

## State of the art

Throughput pred. :

* Agner Fog
* Uops.info
* IACA
* llvm-mca
* Ithemal
* PMEvo
* OSACA
* UiCA

## Maybe put this somewhere

Backend models:

* To predict the throughput of a kernel, a precise model of the CPU backend is required
* Could be obtained from the manufacturer: ARM A72 optimization guide, Intel manual, …
    * but this is often incomplete, sometimes even wrong
* Agner Fog
* Uops.info
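
To make the idea of a backend model concrete, here is a minimal sketch of how a port mapping could be turned into a throughput estimate for a loop body. It is an illustration only, not the method of any tool listed above. The port names, the muop kinds, the `PORT_MAP` table and the function `cycles_per_iteration` are all hypothetical. The sketch assumes fully pipelined execution units, one muop per port per cycle, no dependencies between muops, and a frontend that is never the bottleneck.

```python
from itertools import combinations

# Hypothetical port mapping: muop kind -> set of ports able to execute it.
# Names and sets are illustrative and do not describe any real CPU.
PORT_MAP = {
    "alu": frozenset({"p0", "p1", "p5"}),
    "load": frozenset({"p2", "p3"}),
    "store": frozenset({"p4"}),
}


def cycles_per_iteration(kernel, port_map):
    """Backend-only bound on steady-state cycles per loop iteration.

    kernel maps a muop kind to its number of occurrences per iteration.
    For every subset S of ports, the muops that can only run on ports
    of S need at least (their count) / |S| cycles; the bound is the
    maximum of this ratio over all subsets.
    """
    ports = sorted(set().union(*port_map.values()))
    best = 0.0
    for size in range(1, len(ports) + 1):
        for subset in combinations(ports, size):
            allowed = set(subset)
            # Count the muops whose every candidate port lies in the subset.
            load = sum(
                count
                for kind, count in kernel.items()
                if port_map[kind] <= allowed
            )
            best = max(best, load / size)
    return best


# Six ALU muops share three ports, so they dominate: 6 / 3 = 2 cycles.
print(cycles_per_iteration({"alu": 6, "load": 2, "store": 1}, PORT_MAP))  # 2.0
```

Enumerating every subset of ports is exponential in the number of ports, which is acceptable for the handful of ports in this toy table. Dependencies, the frontend and the ROB window are deliberately left out, so the result is a lower bound on cycles rather than a prediction.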