Introduction: first full version
parent 9dd4432342
commit b42e2c3d80
5 changed files with 137 additions and 4 deletions
manuscrit
10_introduction
60_staticdeps
99_conclusion
biblio
include
@@ -1,4 +1,5 @@
-\chapter{Introduction}\label{chap:intro}
+\chapter*{Introduction}\label{chap:intro}
+\addcontentsline{toc}{chapter}{Introduction}

Developing new features and fixing problems are often regarded as the major
parts of the development cycle of a program. However, performance optimization
@@ -59,7 +60,7 @@ CPUs. In many cases, transformations targeting a specific microarchitecture can
be very beneficial.
For instance, Uday Bondhugula found that manual tuning, through many
techniques and tools, of a general matrix multiplication could multiply its
-throughput by roughly 13.5 compared to \texttt{gcc~-O3}, or even 130 times
+throughput by roughly 13.5 compared to \texttt{gcc~-O3}, or even be 130 times
faster than \texttt{clang~-O3}~\cite{dgemm_finetune}.
This kind of optimization, however, requires manual effort, and
deep expert knowledge both in optimization techniques and of the specific
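The scale of such speedups is easier to appreciate next to the kernel being tuned. The following is a minimal, naive sketch of a general matrix multiplication in C (our own illustration of the baseline, not Bondhugula's tuned code): this is the kind of loop nest that `-O3` optimizes and that manual tiling and vectorization can accelerate much further.

```c
#include <stddef.h>

/* Naive dense matrix multiplication, C = A * B, square row-major
 * matrices of order n. Manual tuning of this kernel (tiling,
 * vectorization, unrolling, prefetching...) is what yields the
 * order-of-magnitude speedups discussed above. */
void dgemm_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}
```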
@@ -95,3 +96,101 @@ counters sampling, may not always be precise and faithful. They can, however,
inspect at will their inner model state, and derive more advanced metrics or
hypotheses, for instance by predicting which resource might be overloaded and
slow the whole computation.

\vspace{2em}

In this thesis, we explore the three major aspects that contribute to a code
analyzer's accuracy: a \emph{backend model}, a \emph{frontend model} and a
\emph{dependencies model}. We propose contributions to strengthen them, as well
as to automate the underlying models' synthesis. We focus on \emph{static}
code analyzers, which derive metrics, including runtime predictions, from
assembly code or an assembled binary.

The \hyperref[chap:foundations]{first chapter} introduces the foundations
of this manuscript, describing the microarchitectural notions on which our
analyses will be based, and exploring the current state of the art.

\autoref{chap:palmed} introduces \palmed{}, a benchmarks-based tool
automatically synthesizing a model of a CPU's backend. Although the
theoretical core of \palmed{} is not my own work, I made major contributions to
other aspects of the tool. The chapter also presents the foundations and
methodologies \palmed{} shares with the following parts.

In \autoref{chap:frontend}, we explore the frontend aspects of static code
analyzers. This chapter focuses on a manual study of the Cortex A72
processor, and proposes a static model of its frontend. We finally reflect on
the generalization of our manual approach into an automated frontend modelling
tool, akin to \palmed{}.

Chapter~\ref{chap:CesASMe} presents an extensive study of the state-of-the-art
code analyzers' strengths and shortcomings. To this end, we introduce a
fully-tooled approach in two parts: first, a benchmarks-generation procedure,
yielding thousands of benchmarks relevant in the context of our approach; then,
a benchmarking harness evaluating code analyzers on these benchmarks. We find
that most state-of-the-art code analyzers struggle to correctly account for
some types of data dependencies.

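To make this failure mode concrete, the following is a minimal sketch (our own illustration, not drawn from the benchmarks) of a loop whose iterations are linked by a memory-carried dependency: each iteration's load reads the previous iteration's store, a chain that a purely register-based dependency analysis can miss.

```c
#include <stddef.h>

/* In-place prefix sum. The load of x[i-1] in each iteration depends on
 * the store performed by the previous iteration: a memory-carried
 * (loop-carried, through memory) dependency that serializes the loop
 * regardless of the backend throughput available. */
void prefix_sum(size_t n, double *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];   /* reads the value stored one iteration ago */
}
```

A code analyzer that tracks only register operands sees independent add instructions here; accounting for the store-to-load chain is what changes the predicted throughput from parallel to serial.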
Further building on our findings, \autoref{chap:staticdeps} introduces
\staticdeps{}, an accurate heuristic-based tool to statically extract data
dependencies from an assembly computation kernel. We extend \uica{}, a
state-of-the-art code analyzer, with \staticdeps{} predictions, and evaluate
the resulting gain in accuracy.

\bigskip{}

Throughout this manuscript, we explore notions that cut across the hardware
blocks around which the chapters are organized.

\medskip{}

Most of our approaches work towards \emph{automated,
microarchitecture-independent} tooling. While fine-grained, accurate code
analysis is directly concerned with the underlying hardware and its specific
implementation, we strive to write tooling with as little dependency as
possible on vendor-specific interfaces. In practice, this rules out most uses
of hardware counters, which depend greatly on the manufacturer, or even on the
specific chip considered. As some CPUs expose only very limited hardware
counters, we see this commitment as an opportunity to develop methodologies
able to model these processors.

This is particularly true of \palmed, in \autoref{chap:palmed}, whose goal is
to model a processor's backend resources without resorting to its hardware
counters. Our frontend study, in \autoref{chap:frontend}, also follows this
strategy by focusing on a processor whose hardware counters give little to no
insight into its frontend. While this goal is less relevant to \staticdeps{},
there too we rely on external libraries to abstract away the underlying
architecture.

\medskip{}

Our methodologies are, whenever relevant, \emph{benchmarks- and
experiments-driven}, in a bottom-up style, placing real hardware at the center.
In this spirit, \palmed{} is based solely on benchmarks, discarding entirely
the manufacturer's documentation. Our model of the Cortex A72 frontend is based
both on measurements and documentation, yet it strives to be a case study from
which future works can generalize, to automatically synthesize frontend models
in a benchmarks-based fashion. One of the goals of our survey of the state of
the art, in \autoref{chap:CesASMe}, is to identify through experiments the
shortcomings that are most crucial to address in order to strengthen static
code analyzers.

\medskip{}

Finally, in the face of the ecological and climatic crises we are
facing, as assessed among others by the IPCC~\cite{ipcc_ar6_syr}, we believe
that every field and discipline should strive for a positive impact or, at
the very least, to reduce its negative impact as much as possible. Our very
modest contribution to this end, throughout this thesis, is to commit
ourselves to computations as \emph{frugal} as possible: running
computation-heavy experiments as seldom as possible; avoiding running the
same experiment multiple times, caching results instead when feasible; etc.
This commitment partly motivated us to implement a results database in
\palmed{}, so that each benchmark is computed only once. As our experiments in
\autoref{chap:CesASMe} take many hours to yield a result, we at least evaluate
their carbon impact.

We believe it noteworthy, however, to point out that although this thesis is
concerned with tools that help optimize large computation workloads,
\emph{optimization does not lead to frugality}. In most cases, the Jevons
paradox ---~also called the rebound effect~--- makes it instead more likely
to lead to an increased absolute usage of computational
resources~\cite{jevons_coal_question,understanding_jevons_paradox}.

@@ -1,4 +1,5 @@
-\chapter{Static extraction of memory-carried dependencies}
+\chapter{Static extraction of memory-carried
+dependencies}\label{chap:staticdeps}

\input{00_intro.tex}
\input{10_types_of_deps.tex}

@ -1 +1,2 @@
|
|||
\chapter{Conclusion}
|
||||
\chapter*{Conclusion}
|
||||
\addcontentsline{toc}{chapter}{Conclusion}
|
||||
|
|
|
@@ -34,3 +34,33 @@
    series = {ISCA '22}
}

@book{ipcc_ar6_syr,
    title = {IPCC, 2023: Climate Change 2023: Synthesis Report},
    author = {{Contribution of Working Groups I, II and III to the Sixth
               Assessment Report of the Intergovernmental Panel on Climate
               Change [Core Writing Team, H. Lee and J. Romero (eds.)]}},
    editor = {{IPCC, Geneva, Switzerland}},
    year = 2023,
    note = {doi: 10.59327/IPCC/AR6-9789291691647},
}

@book{jevons_coal_question,
    title = {The coal question; an inquiry concerning the progress of the
             nation and the probable exhaustion of our coal-mines},
    author = {Jevons, William Stanley},
    year = {1866},
    publisher = {Macmillan}
}

@article{understanding_jevons_paradox,
    author = {Richard York and Julius Alexander McGee},
    title = {Understanding the Jevons paradox},
    journal = {Environmental Sociology},
    volume = {2},
    number = {1},
    pages = {77--87},
    year = {2016},
    publisher = {Routledge},
    doi = {10.1080/23251042.2015.1106060},
}

@@ -90,3 +90,5 @@
\newfloat{algorithm}{htbp}{lop}
\floatname{algorithm}{Algorithm}
\def\algorithmautorefname{Algorithm}

\def\chapterautorefname{Chapter}