Introduction: first full version

This commit is contained in:
Théophile Bastian 2023-10-15 18:20:36 +02:00
parent 9dd4432342
commit b42e2c3d80
5 changed files with 137 additions and 4 deletions
manuscrit
10_introduction
60_staticdeps
99_conclusion
biblio
include


@@ -1,4 +1,5 @@
-\chapter{Introduction}\label{chap:intro}
+\chapter*{Introduction}\label{chap:intro}
+\addcontentsline{toc}{chapter}{Introduction}
Developing new features and fixing problems are often regarded as the major
parts of the development cycle of a program. However, performance optimization
@@ -59,7 +60,7 @@ CPUs. In many cases, transformations targeting a specific microarchitecture can
be very beneficial.
For instance, Uday Bondhugula showed that manually tuning a general matrix
multiplication, through many techniques and tools, could multiply its
-throughput by roughly 13.5 compared to \texttt{gcc~-O3}, or even 130 times
+throughput by roughly 13.5 compared to \texttt{gcc~-O3}, or even be 130 times
faster than \texttt{clang -O3}~\cite{dgemm_finetune}.
This kind of optimization, however, requires manual effort and deep expert
knowledge, both in optimization techniques and on the specific
@@ -95,3 +96,101 @@ counters sampling, may not always be precise and faithful. They can, however,
inspect at will their inner model state, and derive more advanced metrics or
hypotheses, for instance by predicting which resource might be overloaded and
slow the whole computation.
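As an illustration of this kind of model-based reasoning, the sketch below shows how an analyzer holding an explicit resource model can predict which resource is overloaded. The port names, instruction mix and even-split dispatch policy are assumptions made up for the example; this is not the model of any particular analyzer.

```python
# Toy illustration of resource-overload prediction: map each instruction to
# the execution ports that may serve it, spread its load evenly across them,
# and report the most pressured port as the predicted bottleneck.
from collections import defaultdict

def bottleneck(kernel, port_map):
    """kernel: list of opcodes; port_map: opcode -> list of usable ports."""
    pressure = defaultdict(float)
    for opcode in kernel:
        ports = port_map[opcode]
        for p in ports:  # optimistic even split across all usable ports
            pressure[p] += 1.0 / len(ports)
    worst = max(pressure, key=pressure.get)
    # Under this model, the kernel's throughput is bounded by the cycles
    # spent on the most loaded port.
    return worst, pressure[worst]

# Hypothetical port mapping and kernel, for illustration only.
port_map = {"add": ["p0", "p1"], "mul": ["p0"], "load": ["p2", "p3"]}
kernel = ["load", "mul", "add", "mul"]
print(bottleneck(kernel, port_map))  # -> ('p0', 2.5): p0 is saturated
```

Such a prediction is exactly what hardware-counter sampling cannot directly provide: the model attributes the slowdown to a specific resource instead of merely observing the resulting cycle count.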
\vspace{2em}

In this thesis, we explore the three major aspects that determine a code
analyzer's accuracy: a \emph{backend model}, a \emph{frontend model} and a
\emph{dependencies model}. We propose contributions to strengthen each of
them, as well as to automate the synthesis of the underlying models. We focus
on \emph{static} code analyzers, which derive metrics, including runtime
predictions, from assembly code or an assembled binary.

The \hyperref[chap:foundations]{first chapter} introduces the foundations
of this manuscript, describing the microarchitectural notions on which our
analyses are based, and exploring the current state of the art.

\autoref{chap:palmed} introduces \palmed{}, a benchmarks-based tool that
automatically synthesizes a model of a CPU's backend. Although the
theoretical core of \palmed{} is not my own work, I made major contributions
to other aspects of the tool. The chapter also presents the foundations and
methodologies that \palmed{} shares with the following parts.

In \autoref{chap:frontend}, we explore the frontend aspects of static code
analyzers. This chapter focuses on a manual study of the Cortex A72
processor and proposes a static model of its frontend. We finally reflect on
generalizing our manual approach into an automated frontend modelling tool,
akin to \palmed.

Chapter~\ref{chap:CesASMe} presents an extensive study of the strengths and
shortcomings of state-of-the-art code analyzers. To this end, we introduce a
fully-tooled approach in two parts: first, a benchmark-generation procedure,
yielding thousands of benchmarks relevant in the context of our approach;
then, a benchmarking harness that evaluates code analyzers on these
benchmarks. We find that most state-of-the-art code analyzers struggle to
correctly account for some types of data dependencies.

Building further on these findings, \autoref{chap:staticdeps} introduces
\staticdeps{}, an accurate heuristic-based tool that statically extracts data
dependencies from an assembly computation kernel. We extend \uica{}, a
state-of-the-art code analyzer, with \staticdeps{} predictions, and evaluate
the resulting gain in accuracy.

\bigskip{}

Throughout this manuscript, we explore notions that cut across the hardware
blocks around which the chapters are organized.

\medskip{}

Most of our approaches work towards \emph{automated,
microarchitecture-independent} tooling. While fine-grained, accurate code
analysis is directly concerned with the underlying hardware and its specific
implementation, we strive to write tooling that depends as little as possible
on vendor-specific interfaces. In practice, this rules out most uses of
hardware counters, which vary greatly between manufacturers, and even between
specific chips. As some CPUs expose only very bare hardware counters, we see
this commitment as an opportunity to develop methodologies able to model
these processors.

This is particularly true of \palmed{}, in \autoref{chap:palmed}, whose goal
is to model a processor's backend resources without resorting to its hardware
counters. Our frontend study, in \autoref{chap:frontend}, also follows this
strategy by focusing on a processor whose hardware counters give little to no
insight into its frontend. While this goal is less central to \staticdeps{},
we still rely on external libraries to abstract the underlying architecture.

\medskip{}

Our methodologies are, whenever relevant, \emph{benchmarks- and
experiments-driven}, in a bottom-up style, placing real hardware at the
center. In this spirit, \palmed{} is based solely on benchmarks, discarding
the manufacturer's documentation entirely. Our model of the Cortex A72
frontend is based both on measurements and on documentation, yet it strives
to be a case study from which future works can generalize, automatically
synthesizing frontend models in a benchmarks-based fashion. One of the goals
of our survey of the state of the art, in \autoref{chap:CesASMe}, is to
identify through experiments the shortcomings that are most crucial to
address in order to strengthen static code analyzers.

\medskip{}

Finally, given the extent of the ecological and climatic crises we are
facing, as assessed among others by the IPCC~\cite{ipcc_ar6_syr}, we believe
that every field and discipline should strive for a positive impact or, at
the very least, reduce its negative impact as much as possible. Our very
modest contribution to this end, throughout this thesis, is to commit
ourselves to computations as \emph{frugal} as possible: running
computation-heavy experiments as little as possible; avoiding running the
same experiment twice by instead caching results whenever feasible; etc.
This commitment partly motivated us to implement a results database in
\palmed{}, so that each benchmark is computed only once. As our experiments
in \autoref{chap:CesASMe} take many hours to yield a result, we at least
evaluate their carbon impact.
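The caching discipline described above can be sketched as follows. This is an illustrative pattern only, not \palmed's actual results database: the cache file path, the JSON persistence and the content-hash key are assumptions made for the example.

```python
# Illustrative sketch of running each benchmark only once: results are keyed
# by a hash of the benchmark's source and persisted to a JSON file, so a
# repeated request is served from the cache instead of re-measured.
import hashlib
import json
import os

def cached_run(benchmark_source, run_fn, cache_path):
    """Call `run_fn` on a benchmark only if its result is not yet cached."""
    key = hashlib.sha256(benchmark_source.encode()).hexdigest()
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key not in cache:  # pay the measurement cost at most once
        cache[key] = run_fn(benchmark_source)
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return cache[key]

# Demo with a fake, hypothetical measurement function.
demo_path = "/tmp/demo_results_cache.json"
if os.path.exists(demo_path):
    os.remove(demo_path)  # start from an empty cache for the demo

calls = []
def fake_measure(src):
    calls.append(src)           # record how many real measurements ran
    return {"cycles": 42.0}

r1 = cached_run("add x0, x0, x1", fake_measure, demo_path)
r2 = cached_run("add x0, x0, x1", fake_measure, demo_path)
# The second call hits the cache: fake_measure ran exactly once.
```

Beyond frugality, such a cache also makes long experiment campaigns restartable after an interruption, since completed measurements are never redone.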

It is noteworthy, however, that although this thesis is concerned with tools
that help optimize large computation workloads, \emph{optimization does not
lead to frugality}. In most cases, the Jevons paradox ---~also called the
rebound effect~--- instead makes it more likely to lead to an increased
absolute usage of computational
resources~\cite{jevons_coal_question,understanding_jevons_paradox}.


@@ -1,4 +1,5 @@
-\chapter{Static extraction of memory-carried dependencies}
+\chapter{Static extraction of memory-carried
+dependencies}\label{chap:staticdeps}
\input{00_intro.tex}
\input{10_types_of_deps.tex}


@@ -1 +1,2 @@
-\chapter{Conclusion}
+\chapter*{Conclusion}
+\addcontentsline{toc}{chapter}{Conclusion}


@@ -34,3 +34,33 @@
series = {ISCA '22}
}
@book{ipcc_ar6_syr,
  title  = {{IPCC, 2023: Climate Change 2023: Synthesis Report}},
  author = {{Contribution of Working Groups I, II and III to the Sixth
            Assessment Report of the Intergovernmental Panel on Climate
            Change [Core Writing Team, H. Lee and J. Romero (eds.)]}},
  editor = {{IPCC, Geneva, Switzerland}},
  year   = {2023},
  doi    = {10.59327/IPCC/AR6-9789291691647},
}
@book{jevons_coal_question,
  title     = {The coal question; an inquiry concerning the progress of the
               nation and the probable exhaustion of our coal-mines},
  author    = {Jevons, William Stanley},
  year      = {1866},
  publisher = {Macmillan},
}
@article{understanding_jevons_paradox,
  author    = {Richard York and Julius Alexander McGee},
  title     = {Understanding the Jevons paradox},
  journal   = {Environmental Sociology},
  volume    = {2},
  number    = {1},
  pages     = {77--87},
  year      = {2016},
  publisher = {Routledge},
  doi       = {10.1080/23251042.2015.1106060},
}


@@ -90,3 +90,5 @@
\newfloat{algorithm}{htbp}{lop}
\floatname{algorithm}{Algorithm}
\def\algorithmautorefname{Algorithm}
\def\chapterautorefname{Chapter}