Whether for massively distributed computation across multiple server racks, for constrained computation — such as in embedded environments or in edge computing — or to reduce the ecological footprint of a frequently-run program, many use cases make it worthwhile to deeply optimise a program. This optimisation is often limited to high-level concerns — choice of algorithms, parallel computing, etc. Yet it can be carried further, down to low-level optimisations, by inspecting the generated assembly with respect to the microarchitecture of the specific microprocessor used, in order to fine-tune the program for it.
Such a level of optimisation requires a very detailed understanding of both the software and hardware aspects involved, and is most often the realm of experts. Code analyzers, however, are tools that help lower the expertise threshold required to perform such optimisations, by automating away part of the work needed to understand the source of the performance problems encountered. The same tools are also useful to experts, as they help them be more efficient in their work.
In this manuscript, we study the main performance bottlenecks of a processor for which the state of the art does not perform consistently, and we contribute to the state of the art on each of them. We work on automating the derivation of a model of the processor's backend; we manually study the processor's frontend, hoping to set a milestone towards automating the derivation of such models; and we provide a tool to automatically extract a computation kernel's memory-carried dependencies. We also provide a systematic, automated and fully-tooled study of the prediction accuracy of various state-of-the-art code analyzers.