Small backports from paper

Théophile Bastian 2024-06-20 17:56:52 +02:00
parent 59de13b1d1
commit ef62c3b7e5
2 changed files with 17 additions and 21 deletions

@@ -59,7 +59,7 @@ $(i_q)_{q\in Q_p}$ are the \uops{} obtained from the decoding of $I_p$.
 \begin{lemma}[Distance of in-flight \uops{}]
 For any pair of instructions $(I_p,I_{p'})$, and two corresponding \uops{},
-$(i_q,i_{q'})$ such that q \in Q_p, q' \in Q_{p'}$,
+$(i_q,i_{q'})$ such that $q \in Q_p, q' \in Q_{p'}$,
 \[
 \operatorname{inflight}(i_q) \wedge \operatorname{inflight}(i_{q'}) \Rightarrow \distance{I_p}{I_{p'}}<R
 \]

@@ -28,7 +28,6 @@ benchmarks, making the analysis more convenient.
 In practice, benchmarks from \cesasme{} are roughly of the following form:
 \begin{lstlisting}[language=C]
-/* Initialize A, B, C here */
 for(int measure=0; measure < NUM_MEASURES; ++measure) {
 measure_start();
 for(int repeat=0; repeat < NUM_REPEATS; ++repeat) {
@@ -41,7 +40,7 @@ for(int measure=0; measure < NUM_MEASURES; ++measure) {
 \end{lstlisting}
 While this is sensible for conducting throughput measures, it also introduces
-unwanted dependencies \todo{explain why}. If, for instance, the kernel consists in
+unwanted dependencies. If, for instance, the kernel consists in
 $A[i] = C\times{}A[i] + B[i]$, implemented by\\
 \begin{minipage}{0.95\linewidth}
 \begin{lstlisting}[language={[x86masm]Assembler}]
@@ -100,8 +99,9 @@ source and destination program counters are not in the same basic block are
 discarded, as \staticdeps{} cannot detect them by construction.
 For each of the considered basic blocks, we run our static analysis,
-\staticdeps{}. We discard the $\Delta{}k$ parameter, as our dynamic analysis does
-not report an equivalent parameter, but only a pair of program counters.
+\staticdeps{}. We discard the $\Delta{}k$ parameter --~how many loop iterations
+the dependency spans~--, as our dynamic analysis does not report an equivalent
+parameter, but only a pair of program counters.
 Dynamic dependencies from \depsim{} are converted to
 \emph{periodic dependencies} in the sense of \staticdeps{} as described in
@@ -281,25 +281,21 @@ the corresponding box-plots in \autoref{fig:staticdeps_uica_cesasme_boxplot}.
 \medskip{}
-The full dataset \uicadeps{} row is extremely close, on every metric, to the
-pruned, \uica{}-only row. On this basis, we argue that \staticdeps{}' addition
-to \uica{} is very conclusive: the hints provided by \staticdeps{} are
+We deduce two things from this experiment.
+
+First, the full dataset \uicadeps{} row is extremely close, on every metric, to
+the pruned, \uica{}-only row. On this basis, we argue that \staticdeps{}'
+addition to \uica{} is very conclusive: the hints provided by \staticdeps{} are
 sufficient to make \uica{}'s results as good on the full dataset as they were
 before on a dataset pruned of precisely the kind of dependencies we aim to
-detect. Furthermore, \uica{} and \uicadeps{}' results on the pruned dataset are
-extremely close: this further supports the accuracy of \staticdeps{}.
+detect. Thus, at least on workloads similar to Polybench, \staticdeps{} is able
+to resolve the issue of memory-carried dependencies for \uica{}'s throughput
+analysis.
-\medskip{}
+Furthermore, \uica{} and \uicadeps{}' results on the pruned dataset are
+extremely close. From this, we argue that \staticdeps{} does not introduce
+false positives when no dependency should be found; its addition to \uica{}
+does not negatively impact its accuracy whenever it is not relevant.
-While the results obtained against \depsim{} in
-\autoref{ssec:staticdeps_eval_depsim} above were reasonable, they were not
-excellent either, and showed that many kind of dependencies were still missed
-by \staticdeps{}. However, our evaluation on \cesasme{} by enriching \uica{}
-shows that, at least on the workload considered, the dependencies that actually
-matter from a performance debugging point of view are properly found.
-This, however, might not be true for other kinds of applications that would
-require a dependencies analysis.
 \subsection{Analysis speed}