Introduction: first full version
parent 9dd4432342
commit b42e2c3d80
5 changed files with 137 additions and 4 deletions
manuscrit
10_introduction
60_staticdeps
99_conclusion
biblio
include
@@ -1,4 +1,5 @@
-\chapter{Introduction}\label{chap:intro}
+\chapter*{Introduction}\label{chap:intro}
+\addcontentsline{toc}{chapter}{Introduction}

Developing new features and fixing problems are often regarded as the major
parts of the development cycle of a program. However, performance optimization
@@ -59,7 +60,7 @@ CPUs. In many cases, transformations targeting a specific microarchitecture can
be very beneficial.
For instance, Uday Bondhugula found that manual tuning, through many
techniques and tools, of a general matrix multiplication could multiply its
-throughput by roughly 13.5 compared to \texttt{gcc~-O3}, or even 130 times
+throughput by roughly 13.5 compared to \texttt{gcc~-O3}, or even be 130 times
faster than \texttt{clang~-O3}~\cite{dgemm_finetune}.
This kind of optimization, however, requires manual effort, and
deep expert knowledge both in optimization techniques and of the specific
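The scale of such speedups is easier to appreciate next to the kernel being tuned. The following is a minimal, naive sketch of a general matrix multiplication in C (our own illustration of the baseline, not Bondhugula's tuned code): this is the kind of loop nest that `-O3` optimizes and that manual tiling and vectorization can accelerate much further.

```c
#include <stddef.h>

/* Naive dense matrix multiplication, C = A * B, square row-major
 * matrices of order n. Manual tuning of this kernel (tiling,
 * vectorization, unrolling, prefetching...) is what yields the
 * order-of-magnitude speedups discussed above. */
void dgemm_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}
```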
@@ -95,3 +96,101 @@ counters sampling, may not always be precise and faithful. They can, however,
inspect at will their inner model state, and derive more advanced metrics or
hypotheses, for instance by predicting which resource might be overloaded and
slow the whole computation.

\vspace{2em}

In this thesis, we explore the three major aspects that contribute to a code
analyzer's accuracy: a \emph{backend model}, a \emph{frontend model} and a
\emph{dependencies model}. We propose contributions to strengthen them, as well
as to automate the underlying models' synthesis. We focus on \emph{static}
code analyzers, which derive metrics, including runtime predictions, from
assembly code or an assembled binary.

The \hyperref[chap:foundations]{first chapter} introduces the foundations
of this manuscript, describing the microarchitectural notions on which our
analyses will be based, and exploring the current state of the art.

\autoref{chap:palmed} introduces \palmed{}, a benchmarks-based tool
automatically synthesizing a model of a CPU's backend. Although the
theoretical core of \palmed{} is not my own work, I made major contributions to
other aspects of the tool. The chapter also presents the foundations and
methodologies \palmed{} shares with the following parts.

In \autoref{chap:frontend}, we explore the frontend aspects of static code
analyzers. This chapter focuses on a manual study of the Cortex A72
processor, and proposes a static model of its frontend. We finally reflect on
the generalization of our manual approach into an automated frontend modelling
tool, akin to \palmed{}.

Chapter~\ref{chap:CesASMe} presents an extensive study of the state-of-the-art
code analyzers' strengths and shortcomings. To this end, we introduce a
fully-tooled approach in two parts: first, a benchmarks-generation procedure,
yielding thousands of benchmarks relevant in the context of our approach; then,
a benchmarking harness evaluating code analyzers on these benchmarks. We find
that most state-of-the-art code analyzers struggle to correctly account for
some types of data dependencies.

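To make this failure mode concrete, the following is a minimal sketch (our own illustration, not drawn from the benchmarks) of a loop whose iterations are linked by a memory-carried dependency: each iteration's load reads the previous iteration's store, a chain that a purely register-based dependency analysis can miss.

```c
#include <stddef.h>

/* In-place prefix sum. The load of x[i-1] in each iteration depends on
 * the store performed by the previous iteration: a memory-carried
 * (loop-carried, through memory) dependency that serializes the loop
 * regardless of the backend throughput available. */
void prefix_sum(size_t n, double *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];   /* reads the value stored one iteration ago */
}
```

A code analyzer that tracks only register operands sees independent add instructions here; accounting for the store-to-load chain is what changes the predicted throughput from parallel to serial.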
Further building on our findings, \autoref{chap:staticdeps} introduces
\staticdeps{}, an accurate heuristic-based tool to statically extract data
dependencies from an assembly computation kernel. We extend \uica{}, a
state-of-the-art code analyzer, with \staticdeps{} predictions, and evaluate
the resulting gain in accuracy.

\bigskip{}

Throughout this manuscript, we explore notions that cut across the hardware
blocks around which the chapters are organized.

\medskip{}

Most of our approaches work towards \emph{automated,
microarchitecture-independent} tooling. While fine-grained, accurate code
analysis is directly concerned with the underlying hardware and its specific
implementation, we strive to write tooling with as little dependency as
possible on vendor-specific interfaces. In practice, this rules out most uses
of hardware counters, which depend greatly on the manufacturer, or even on the
specific chip considered. As some CPUs expose only very limited hardware
counters, we see this commitment as an opportunity to develop methodologies
able to model these processors.

This is particularly true of \palmed, in \autoref{chap:palmed}, whose goal is
to model a processor's backend resources without resorting to its hardware
counters. Our frontend study, in \autoref{chap:frontend}, also follows this
strategy by focusing on a processor whose hardware counters give little to no
insight into its frontend. While this goal is less relevant to \staticdeps{},
there too we rely on external libraries to abstract away the underlying
architecture.

\medskip{}

Our methodologies are, whenever relevant, \emph{benchmarks- and
experiments-driven}, in a bottom-up style, placing real hardware at the center.
In this spirit, \palmed{} is based solely on benchmarks, discarding entirely
the manufacturer's documentation. Our model of the Cortex A72 frontend is based
both on measurements and documentation, yet it strives to be a case study from
which future works can generalize, to automatically synthesize frontend models
in a benchmarks-based fashion. One of the goals of our survey of the state of
the art, in \autoref{chap:CesASMe}, is to identify through experiments the
shortcomings that are most crucial to address in order to strengthen static
code analyzers.

\medskip{}

Finally, in the face of the ecological and climatic crises we are
facing, as assessed among others by the IPCC~\cite{ipcc_ar6_syr}, we believe
that every field and discipline should strive for a positive impact or, at
the very least, to reduce its negative impact as much as possible. Our very
modest contribution to this end, throughout this thesis, is to commit
ourselves to computations as \emph{frugal} as possible: running
computation-heavy experiments as seldom as possible; avoiding running the
same experiment multiple times, caching results instead when feasible; etc.
This commitment partly motivated us to implement a results database in
\palmed{}, so that each benchmark is computed only once. As our experiments in
\autoref{chap:CesASMe} take many hours to yield a result, we at least evaluate
their carbon impact.

We believe it noteworthy, however, to point out that although this thesis is
concerned with tools that help optimize large computation workloads,
\emph{optimization does not lead to frugality}. In most cases, the Jevons
paradox ---~also called the rebound effect~--- makes it instead more likely
to lead to an increased absolute usage of computational
resources~\cite{jevons_coal_question,understanding_jevons_paradox}.

@@ -1,4 +1,5 @@
-\chapter{Static extraction of memory-carried dependencies}
+\chapter{Static extraction of memory-carried
+dependencies}\label{chap:staticdeps}

\input{00_intro.tex}
\input{10_types_of_deps.tex}

@ -1 +1,2 @@
|
|||
\chapter{Conclusion}
|
||||
\chapter*{Conclusion}
|
||||
\addcontentsline{toc}{chapter}{Conclusion}
|
||||
|
|
|
@@ -34,3 +34,33 @@
    series = {ISCA '22}
}

@book{ipcc_ar6_syr,
    title = {IPCC, 2023: Climate Change 2023: Synthesis Report},
    author = {{Contribution of Working Groups I, II and III to the Sixth
               Assessment Report of the Intergovernmental Panel on Climate
               Change [Core Writing Team, H. Lee and J. Romero (eds.)]}},
    editor = {{IPCC, Geneva, Switzerland}},
    year = 2023,
    note = {doi: 10.59327/IPCC/AR6-9789291691647},
}

@book{jevons_coal_question,
    title = {The coal question; an inquiry concerning the progress of the
             nation and the probable exhaustion of our coal-mines},
    author = {Jevons, William Stanley},
    year = {1866},
    publisher = {Macmillan}
}

@article{understanding_jevons_paradox,
    author = {Richard York and Julius Alexander McGee},
    title = {Understanding the Jevons paradox},
    journal = {Environmental Sociology},
    volume = {2},
    number = {1},
    pages = {77--87},
    year = {2016},
    publisher = {Routledge},
    doi = {10.1080/23251042.2015.1106060},
}

@@ -90,3 +90,5 @@
\newfloat{algorithm}{htbp}{lop}
\floatname{algorithm}{Algorithm}
\def\algorithmautorefname{Algorithm}

\def\chapterautorefname{Chapter}