# Foundations

## A dive into processors' microarchitecture

### High-level abstraction

* Highest level of abstraction: instructions -> CPU -> modified internal state
* Instructions: defined by the ISA, written as assembly (asm)
* State: registers, memory hierarchy, (external effects)

### Microarchitecture

* [big picture figure]
* Roughly speaking:
    * frontend (decoder, renamer, etc.)
    * backend (execution ports, execution units)
    * register file
    * caches
* Instruction --[frontend]--> macro-ops (Mops), then micro-ops (μops)
* μop --[backend port]--> executed, then retired [side effects]
* In the vast majority of cases, execution units are fully pipelined
* Out-of-order CPUs:
    * Frontend: in order up to some point
    * Reorder buffer (ROB)
    * Backend: out of order
    * ROB: bounds the execution window; instruction-level parallelism (ILP) is limited to this window
* Dependency handling
    * Dependencies stall the pipeline!
    * Renamer: helps up to a point (removes false WAR/WAW dependencies, not true RAW ones)

* Hardware counters

* SIMD
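
The pipelining and dependency bullets above can be illustrated with a toy model. The sketch below assumes a single fully pipelined execution unit that accepts one micro-op per cycle, in program order, with a fixed latency. The function name `finish_cycle` and the latency of 4 cycles are hypothetical, chosen only for illustration; no real CPU is modeled.

```python
from typing import List, Optional


def finish_cycle(latency: int, deps: List[Optional[int]]) -> int:
    """Cycle at which the last micro-op completes on one pipelined unit.

    deps[i] is the index of the earlier micro-op that micro-op i depends on,
    or None if it is independent. Toy model: one unit, in-order issue,
    one issue slot per cycle, fixed latency.
    """
    done: List[int] = []   # done[i]: cycle at which the result of micro-op i is ready
    next_issue = 0         # earliest cycle at which the unit accepts a new micro-op
    for dep in deps:
        ready = 0 if dep is None else done[dep]
        issue = max(next_issue, ready)   # wait for both the unit and the operand
        done.append(issue + latency)
        next_issue = issue + 1           # fully pipelined: one new micro-op per cycle
    return max(done, default=0)


independent = [None] * 8                 # eight micro-ops with no dependencies
chain = [None] + list(range(7))          # each micro-op depends on the previous one

print(finish_cycle(4, independent))
print(finish_cycle(4, chain))
```

With a latency of 4 cycles, the eight independent micro-ops finish after 11 cycles in this model, whereas the chain of eight dependent ones needs 32: the pipeline is only kept full when consecutive micro-ops do not wait on each other.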

## Prerequisites on code analyzers

* Code analyzers: given a piece of code that is assumed to be the
  body of a hot loop, derive performance metrics and any information that might
  help with performance debugging.
* Usually static (vs. dynamic); we focus on static analyzers.
* Very close to the machine: they work on assembly or assembled bytes
* Examples with llvm-mca
* The analyzed code resides in an object or executable file: ELF on Linux and
  most Unix-like platforms
* Assembly: straight-line instructions, with (possibly conditional) jumps
* Each instruction is identified by its address (program counter value)
* Notion of basic block
    * Regions of interest: hottest basic blocks
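
The notion of basic block mentioned above can be sketched as follows. This is a minimal illustration using the classic "leaders" approach: a block starts at the first instruction, at every jump target, and right after every jump. The `Insn` record and the example program are hypothetical. Only direct jumps are handled; returns, calls and indirect jumps are ignored in this sketch.

```python
from typing import List, NamedTuple, Optional


class Insn(NamedTuple):
    addr: int                      # program counter of the instruction
    mnemonic: str
    target: Optional[int] = None   # destination address if this is a direct jump


def basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """Split instructions (sorted by address) into basic blocks."""
    if not insns:
        return []
    # A leader is the first instruction of a basic block.
    leaders = {insns[0].addr}
    for i, insn in enumerate(insns):
        if insn.target is not None:
            leaders.add(insn.target)              # the jump target starts a block
            if i + 1 < len(insns):
                leaders.add(insns[i + 1].addr)    # so does the fall-through
    blocks: List[List[Insn]] = []
    current: List[Insn] = []
    for insn in insns:
        if insn.addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(insn)
    blocks.append(current)
    return blocks


# Hypothetical program: a loop whose body spans addresses 0x03 to 0x09.
program = [
    Insn(0x00, "mov"),
    Insn(0x03, "add"),
    Insn(0x06, "cmp"),
    Insn(0x09, "jne", target=0x03),
    Insn(0x0B, "ret"),
]
blocks = basic_blocks(program)
for block in blocks:
    print([f"{i.addr:#04x} {i.mnemonic}" for i in block])
```

Here the middle block (`add`, `cmp`, `jne`) is the loop body, which is the kind of region a code analyzer would be pointed at.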

## State of the art

Throughput predictors:
* Agner Fog
* uops.info
* IACA
* llvm-mca
* Ithemal
* PMEvo
* OSACA
* uiCA

## Maybe put this somewhere
Backend models:
* To predict the throughput of a kernel, a precise model of the CPU backend is
  required
* Could be obtained from the manufacturer: Arm Cortex-A72 Software Optimization
  Guide, Intel manual, …
    * but such documentation is often incomplete, sometimes even wrong
* Agner Fog
* uops.info
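
To make concrete why a backend model is needed, here is a minimal sketch of a throughput bound derived from port pressure alone. It assumes a backend model given as a port mapping (each instruction decomposes into micro-ops, each micro-op can run on a set of ports) and assumes micro-ops can be balanced freely across their allowed ports. The mapping `PORT_MAPPING`, the port names and the function `cycles_per_iteration` are all hypothetical and do not describe any real CPU. Frontend limits, latencies and dependencies are ignored.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical port mapping: instruction -> list of micro-ops, each micro-op
# given as the set of ports able to execute it. Illustrative values only.
PORT_MAPPING: Dict[str, List[FrozenSet[str]]] = {
    "add": [frozenset({"p0", "p1"})],
    "mul": [frozenset({"p0"})],
    "load": [frozenset({"p2"})],
}


def cycles_per_iteration(kernel: List[str],
                         mapping: Dict[str, List[FrozenSet[str]]] = PORT_MAPPING) -> float:
    """Port-pressure lower bound on steady-state cycles per loop iteration.

    For every subset S of ports, the micro-ops that can only run on ports of S
    need at least (their count) / |S| cycles. The bound is the maximum over S.
    Exhaustive over subsets, so only suitable for a handful of ports.
    """
    uops = [ports for insn in kernel for ports in mapping[insn]]
    all_ports = sorted(set().union(*uops)) if uops else []
    bound = 0.0
    for size in range(1, len(all_ports) + 1):
        for subset in combinations(all_ports, size):
            allowed = set(subset)
            # Micro-ops confined to this subset of ports.
            load = sum(1 for ports in uops if ports <= allowed)
            bound = max(bound, load / size)
    return bound


print(cycles_per_iteration(["add", "add", "mul", "load"]))
```

For this kernel the two `add` and the `mul` compete for ports `p0` and `p1`, so three micro-ops share two ports and the bound is 1.5 cycles per iteration. An error in the port mapping changes the bound directly, which is why incomplete or wrong vendor documentation is a problem.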