\section{Related works}
\paragraph{Another comparative study: \anica{}.} The \anica{}
framework~\cite{anica} also attempts to comparatively evaluate various throughput predictors by
finding examples on which they are inaccurate. \anica{} starts with randomly
generated assembly snippets fed to various code analyzers. Once it finds a
snippet on which (some) code analyzers yield unsatisfactory results, it refines
it through a process derived from abstract interpretation to reach a
more general category of input, \eg{} ``a load to a SIMD register followed by a
SIMD arithmetic operation''.
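\anica{}'s actual abstraction process is considerably richer than this, but its overall loop ---~search for a snippet on which predictors disagree, then greedily generalize it~--- can be caricatured in a few lines of Python. Everything below (the mock predictors, the instruction names, the disagreement threshold) is invented for illustration and is in no way \anica{}'s implementation:

```python
import random

# Mock predictors: predictor_a overestimates snippets pairing a SIMD load
# with a SIMD arithmetic op -- the kind of pattern AnICA surfaces.
def predictor_a(snippet):
    return 2.0 if "simd_load" in snippet and "simd_add" in snippet else 1.0

def predictor_b(snippet):
    return 1.0

def interesting(snippet):
    """A snippet is 'interesting' when the predictors disagree widely."""
    a, b = predictor_a(snippet), predictor_b(snippet)
    return max(a, b) / min(a, b) >= 1.5

def generalize(snippet):
    """Greedily drop every instruction whose removal keeps the snippet
    interesting, isolating the core pattern behind the disagreement."""
    core = list(snippet)
    i = 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if trial and interesting(trial):
            core = trial  # still interesting: keep the smaller snippet
        else:
            i += 1        # this instruction is essential, keep it
    return core

INSTRS = ["simd_load", "simd_add", "scalar_add", "store", "nop"]
random.seed(0)
snippet = []
while not interesting(snippet):  # random generation phase
    snippet = [random.choice(INSTRS) for _ in range(6)]
print(sorted(generalize(snippet)))  # → ['simd_add', 'simd_load']
```

Here, the greedy generalization pass keeps only the instructions essential to the disagreement, mirroring how \anica{} widens a concrete snippet into a category such as ``a load to a SIMD register followed by a SIMD arithmetic operation''.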
\paragraph{A dynamic code analyzer: \gus{}.}
So far, this manuscript has mostly been concerned with static code analyzers.
Throughput prediction tools, however, are not all static.
\gus{} is a dynamic tool first introduced in \fgruber{}'s PhD
thesis~\cite{phd:gruber}. It leverages \qemu{}'s instrumentation capabilities to
dynamically predict the throughput of user-defined regions of interest in whole
programs.
In these regions, it instruments every instruction, memory access, \ldots{} in
order to retrieve the exact events occurring throughout the program's
execution. \gus{} then leverages throughput, latency and microarchitectural
models to analyze resource usage and produce an accurate theoretical prediction
of the elapsed cycle count.
Its main strength, however, resides in its \emph{sensitivity analysis}
capabilities: by applying an arbitrary factor to some parts of the model (\eg{}
latencies, arithmetic ports, \ldots{}), it is possible to investigate the
impact of a specific resource on the final execution time of a region of
interest. It can also accurately determine if a resource is actually a
bottleneck for a region, \ie{} if increasing this resource's capabilities would
reduce the execution time. The output of \gus{} on a region of interest
provides a very detailed insight into each instruction's resource consumption and
its contribution to the final execution time. As a dynamic analysis tool, it
is also able to extract the dependencies an instruction exhibits on a real run.
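The principle of such a sensitivity analysis can be sketched with a deliberately naive bottleneck model: predict the execution time as the maximum cost over resources, scale one resource at a time, and flag it as a bottleneck whenever strengthening it lowers the prediction. The model and every figure below are invented for illustration and bear no relation to \gus{}'s actual microarchitectural models:

```python
# Toy bottleneck model: the predicted cycle count is the maximum cost over
# resources (per-port load, latency of the critical dependency chain).
# Every name and figure below is invented for illustration.

def predicted_cycles(port_load, chain_latency, factors):
    """Predict cycles, scaling each resource's cost by `factors` (default 1.0)."""
    costs = {p: c * factors.get(p, 1.0) for p, c in port_load.items()}
    costs["latency"] = chain_latency * factors.get("latency", 1.0)
    return max(costs.values())

port_load = {"p0": 8, "p5": 12}   # cycles of work queued on each port
chain_latency = 10                # critical dependency chain, in cycles
baseline = predicted_cycles(port_load, chain_latency, {})

# Sensitivity analysis: halve each resource's cost in turn; a resource is a
# bottleneck iff strengthening it reduces the predicted execution time.
# Here only p5 qualifies: max(8, 12, 10) is set by p5's 12 cycles.
for res in ("p0", "p5", "latency"):
    scaled = predicted_cycles(port_load, chain_latency, {res: 0.5})
    print(res, "is a bottleneck" if scaled < baseline else "is not a bottleneck")
```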
The main downside of \gus{}, however, is its slowness. Like most dynamic tools,
it suffers from a heavy slowdown compared to a native execution of the binary,
oftentimes about $100\times$ slower. While it remains a precious tool for the
user willing to deeply optimize an execution kernel, this slowdown makes \gus{} highly
impractical to run on a large collection of execution kernels.
\paragraph{An isolated basic-block profiler: \bhive{}.} In
\autoref{sec:redefine_exec_time} above, we advocated for measuring a basic
block's execution time \emph{in-context}. The \bhive{} profiler~\cite{bhive},
initially written by \ithemal{}'s authors~\cite{ithemal} to provide their model
with sufficient ---~and sufficiently accurate~--- training data, takes an
orthogonal approach to basic block throughput measurement. By mapping memory at
any address accessed by a basic block, it can effectively run and measure
arbitrary code without context, often ---~but not always, as we discuss
later~--- yielding good results.
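The memory-mapping trick can be caricatured in Python: treat the first access to any page as a fault and transparently back that page with zeroes, so that an arbitrary basic block runs to completion regardless of the addresses it touches. The \texttt{FaultingMemory} class below is purely illustrative; \bhive{} itself relies on OS-level memory mapping and hardware fault handling:

```python
PAGE_SIZE = 4096  # illustrative page granularity

class FaultingMemory:
    """Toy memory that maps a fresh zeroed page on first access at *any*
    address, instead of rejecting out-of-context accesses."""

    def __init__(self):
        self.pages = {}  # page base address -> bytearray backing store

    def _page(self, addr):
        base = addr - addr % PAGE_SIZE
        if base not in self.pages:
            # "Fault handler": back the page with zeroes on first touch.
            self.pages[base] = bytearray(PAGE_SIZE)
        return self.pages[base], addr % PAGE_SIZE

    def load(self, addr):
        page, off = self._page(addr)
        return page[off]

    def store(self, addr, value):
        page, off = self._page(addr)
        page[off] = value & 0xFF

mem = FaultingMemory()
# A context-free "basic block" touching an arbitrary, never-mapped address:
mem.store(0xDEADBEEF, 42)
print(mem.load(0xDEADBEEF), len(mem.pages))  # → 42 1
```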