The major finding of the previous chapter was that state-of-the-art code
analyzers deal poorly with memory-carried dependencies. We found this flaw to
be responsible, in our dataset, for a roughly $1.5\times$ increase in MAPE, and
up to $2.6\times$ on the third quartile of error.

The large impact of dependencies on the final runtime of a kernel is, in
fact, hardly surprising. In chapters~\ref{chap:palmed}
and~\ref{chap:frontend}, we did not consider latency; hence, the only impact of
an instruction was its throughput, each instruction being issued as soon as
possible. Dependencies, however, force the processor to wait for the results of
some instructions before issuing others; the \emph{latency} of an instruction
then becomes a critical factor.

On Skylake, for instance, the instruction \lstxasm{add \%rax, \%rbx} has a
latency of one full cycle. Thus, the kernel
\begin{lstlisting}[language={[x86masm]Assembler}]
add %rax, %rbx
add %rbx, %rcx
\end{lstlisting}
executes, in steady state, in half a cycle when the dependency is ignored; yet
these two instructions in isolation would take $1\,\sfrac{1}{4}$ cycles once
the dependency is accounted for. Some instructions are more extreme still; for
instance, the \lstxasm{vfmadd*pd \%ymm0, \%ymm1, \%ymm2} family of instructions
has a latency of four full cycles, while without dependencies, two can be
issued every cycle.

\medskip{}
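As a rough sanity check, the orders of magnitude quoted above for Skylake can
be worked out by hand. This is only a back-of-the-envelope sketch: it assumes
that \lstxasm{add} can be issued on four execution ports, a figure that is not
stated in this chapter, and it takes the latency and issue rate of the fused
multiply-add as given above. If both \lstxasm{add} instructions are treated as
independent, the kernel is bound by throughput alone:
\[
    \frac{2~\mathrm{instructions}}{4~\mathrm{instructions/cycle}}
    = \frac{1}{2}~\mathrm{cycle}.
\]
One plausible reading of the $1\,\sfrac{1}{4}$ figure, consistent with the same
assumption, is one cycle of latency for the first \lstxasm{add}, followed by a
quarter of a cycle of issue time for the second:
\[
    1 + \frac{1}{4} = 1\,\frac{1}{4}~\mathrm{cycles}.
\]
Likewise, a chain of fused multiply-adds in which each instruction depends on
the previous one would complete one instruction every four cycles, instead of
one every half cycle:
\[
    \frac{4~\mathrm{cycles}}{\frac{1}{2}~\mathrm{cycle}} = 8,
\]
that is, under these assumptions, an eightfold slowdown caused by latency
alone.

\medskip{}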
In the previous chapter, we also presented \gus{}, a dynamic code analyzer
based on \qemu{}, which we found to be very effective at detecting
memory-carried dependencies and the slowdown they induce on the whole program.
However, this solution results in a runtime increase of about two orders of
magnitude, which may not be acceptable in many use cases.

In this chapter, we instead present \staticdeps{}, a fully static analyzer able
to detect memory-carried dependencies in many cases. We evaluate it by
providing \uica{} with its analysis of dependencies, bringing it on par with
\gus{} on the full, non-pruned dataset of the previous chapter.