Small backports from paper
parent 59de13b1d1
commit ef62c3b7e5
2 changed files with 17 additions and 21 deletions

@@ -59,7 +59,7 @@ $(i_q)_{q\in Q_p}$ are the \uops{} obtained from the decoding of $I_p$.
 
 \begin{lemma}[Distance of in-flight \uops{}]
 For any pair of instructions $(I_p,I_{p'})$, and two corresponding \uops{},
-$(i_q,i_{q'})$ such that q \in Q_p, q' \in Q_{p'}$,
+$(i_q,i_{q'})$ such that $q \in Q_p, q' \in Q_{p'}$,
 \[
 \operatorname{inflight}(i_q) \wedge \operatorname{inflight}(i_{q'}) \Rightarrow \distance{I_p}{I_{p'}}<R
 \]
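The lemma above can be sanity-checked on a toy model. This sketch rests on assumptions that this diff does not state:

- R is the reorder-buffer capacity in µops.
- The distance between I_p and I_p' is the difference of their indices in program order.
- Every instruction decodes to at least one µop.
- The in-flight µops always form a contiguous program-order window of at most R µops.

Under these assumptions, two in-flight µops of I_p and I_p' (p < p') share the window with at least one µop of each instruction in between. That gives at least p' - p + 1 ≤ R µops, so p' - p < R. The function `max_inflight_distance` is hypothetical and only illustrates this argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (hypothetical, for illustration only): in-flight uops are assumed
 * to form a contiguous program-order window of at most rob_size uops.
 * uops_per_insn[p] is the number of uops instruction p decodes to.
 * Returns the largest index distance p' - p between two instructions that
 * can have uops in flight together, or -1 if allocation fails. */
int max_inflight_distance(const int *uops_per_insn, int n_insns, int rob_size)
{
    assert(rob_size > 0);

    int total = 0;
    for (int p = 0; p < n_insns; ++p) {
        assert(uops_per_insn[p] >= 1); /* each instruction yields >= 1 uop */
        total += uops_per_insn[p];
    }
    if (total == 0)
        return 0;

    /* owner[k] = index of the instruction that uop k belongs to */
    int *owner = malloc((size_t)total * sizeof *owner);
    if (owner == NULL)
        return -1;
    int k = 0;
    for (int p = 0; p < n_insns; ++p)
        for (int u = 0; u < uops_per_insn[p]; ++u)
            owner[k++] = p;

    /* Slide a window of rob_size uops over the uop stream and keep the
     * largest distance between the instructions at its two ends. */
    int best = 0;
    for (int s = 0; s < total; ++s) {
        int e = s + rob_size - 1;
        if (e >= total)
            e = total - 1;
        int d = owner[e] - owner[s];
        if (d > best)
            best = d;
    }
    free(owner);
    return best;
}
```

Only full-size windows need checking, because any smaller set of in-flight µops is contained in one of them.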
@@ -28,7 +28,6 @@ benchmarks, making the analysis more convenient.
 In practice, benchmarks from \cesasme{} are roughly of the following form:
 
 \begin{lstlisting}[language=C]
-/* Initialize A, B, C here */
 for(int measure=0; measure < NUM_MEASURES; ++measure) {
 measure_start();
 for(int repeat=0; repeat < NUM_REPEATS; ++repeat) {
@@ -41,7 +40,7 @@ for(int measure=0; measure < NUM_MEASURES; ++measure) {
 \end{lstlisting}
 
 While this is sensible for conducting throughput measures, it also introduces
-unwanted dependencies \todo{explain why}. If, for instance, the kernel consists in
+unwanted dependencies. If, for instance, the kernel consists in
 $A[i] = C\times{}A[i] + B[i]$, implemented by\\
 \begin{minipage}{0.95\linewidth}
 \begin{lstlisting}[language={[x86masm]Assembler}]
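The paragraph in this hunk says that repeating the kernel introduces unwanted dependencies, but the rest of its explanation is cut off here. One plausible reading, stated only as an assumption, is as follows. A single pass of `A[i] = C*A[i] + B[i]` has no dependency carried across iterations. Each pass of the repeat loop, however, reloads every `A[i]` that the previous pass stored. This creates a read-after-write dependency through memory that spans exactly `n` kernel iterations, where `n` is the inner-loop length.

The function `min_raw_distance` below is hypothetical; it is not code from CesASMe or depsim. It runs the kernel while a shadow array records which flattened iteration last stored each `A[i]`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: runs A[i] = c * A[i] + B[i] for `repeats` passes over
 * n elements, tracking in a shadow array the flattened iteration that last
 * stored each A[i]. Returns the smallest read-after-write distance (in kernel
 * iterations) carried through A, -1 if no load reads an earlier store, or
 * -2 if allocation fails. */
long min_raw_distance(double *A, const double *B, double c, int n, int repeats)
{
    if (n <= 0 || repeats <= 0)
        return -1;
    assert(A != NULL && B != NULL);

    long *last_store = malloc((size_t)n * sizeof *last_store);
    if (last_store == NULL)
        return -2;
    for (int i = 0; i < n; ++i)
        last_store[i] = -1; /* A[i] not stored yet */

    long min_dist = -1;
    long it = 0; /* flattened iteration counter across all passes */
    for (int r = 0; r < repeats; ++r) {
        for (int i = 0; i < n; ++i, ++it) {
            /* The load of A[i] depends on the last store to A[i], if any. */
            if (last_store[i] >= 0) {
                long d = it - last_store[i];
                if (min_dist < 0 || d < min_dist)
                    min_dist = d;
            }
            A[i] = c * A[i] + B[i];
            last_store[i] = it;
        }
    }
    free(last_store);
    return min_dist;
}
```

With one pass over 4 elements the function returns -1. With three passes it returns 4. The dependency therefore appears only because of the repeat loop, and its distance equals the inner-loop length.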
@@ -100,8 +99,9 @@ source and destination program counters are not in the same basic block are
 discarded, as \staticdeps{} cannot detect them by construction.
 
 For each of the considered basic blocks, we run our static analysis,
-\staticdeps{}. We discard the $\Delta{}k$ parameter, as our dynamic analysis does
-not report an equivalent parameter, but only a pair of program counters.
+\staticdeps{}. We discard the $\Delta{}k$ parameter --~how many loop iterations
+the dependency spans~--, as our dynamic analysis does not report an equivalent
+parameter, but only a pair of program counters.
 
 Dynamic dependencies from \depsim{} are converted to
 \emph{periodic dependencies} in the sense of \staticdeps{} as described in
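The comparison described in this hunk can be sketched as follows. All of the following are hypothetical:

- the record type `periodic_dep` and its field names;
- the half-open basic-block bounds;
- the function `compare_deps`.

The conversion of depsim output into periodic dependencies is elided in this diff and is not reproduced here. The sketch first keeps only the dynamic dependencies whose source and destination program counters both lie in the basic block. It then counts how many of them staticdeps also reports. Because Δk is discarded, the match uses the (source, destination) pair only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for a periodic dependency: the instruction at dst_pc
 * depends on the one at src_pc, delta_k loop iterations earlier. */
typedef struct {
    uint64_t src_pc;
    uint64_t dst_pc;
    int delta_k; /* ignored by the comparison below */
} periodic_dep;

/* Assumed half-open basic-block bounds [bb_start, bb_end). */
static int in_block(uint64_t pc, uint64_t bb_start, uint64_t bb_end)
{
    return pc >= bb_start && pc < bb_end;
}

/* Keeps the dynamic dependencies whose two program counters lie in the basic
 * block (others cannot be found statically by construction), stores their
 * number in *n_kept, and stores in *n_found how many of them share their
 * (src_pc, dst_pc) pair with some static dependency, regardless of delta_k. */
void compare_deps(const periodic_dep *stat, size_t n_stat,
                  const periodic_dep *dyn, size_t n_dyn,
                  uint64_t bb_start, uint64_t bb_end,
                  size_t *n_kept, size_t *n_found)
{
    assert(n_kept != NULL && n_found != NULL);
    *n_kept = 0;
    *n_found = 0;
    for (size_t d = 0; d < n_dyn; ++d) {
        if (!in_block(dyn[d].src_pc, bb_start, bb_end) ||
            !in_block(dyn[d].dst_pc, bb_start, bb_end))
            continue; /* discarded: crosses the basic-block boundary */
        ++*n_kept;
        for (size_t s = 0; s < n_stat; ++s) {
            if (stat[s].src_pc == dyn[d].src_pc &&
                stat[s].dst_pc == dyn[d].dst_pc) {
                ++*n_found;
                break;
            }
        }
    }
}
```

The ratio `found / kept` then gives the fraction of in-block dynamic dependencies that the static analysis recovers. The quadratic scan keeps the sketch short; a hash set keyed on the PC pair would scale better on large traces.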
@@ -281,25 +281,21 @@ the corresponding box-plots in \autoref{fig:staticdeps_uica_cesasme_boxplot}.
 
 \medskip{}
 
-The full dataset \uicadeps{} row is extremely close, on every metric, to the
-pruned, \uica{}-only row. On this basis, we argue that \staticdeps{}' addition
-to \uica{} is very conclusive: the hints provided by \staticdeps{} are
+We deduce two things from this experiment.
+
+First, the full dataset \uicadeps{} row is extremely close, on every metric, to
+the pruned, \uica{}-only row. On this basis, we argue that \staticdeps{}'
+addition to \uica{} is very conclusive: the hints provided by \staticdeps{} are
 sufficient to make \uica{}'s results as good on the full dataset as they were
 before on a dataset pruned of precisely the kind of dependencies we aim to
-detect. Furthermore, \uica{} and \uicadeps{}' results on the pruned dataset are
-extremely close: this further supports the accuracy of \staticdeps{}.
+detect. Thus, at least on workloads similar to Polybench, \staticdeps{} is able
+to resolve the issue of memory-carried dependencies for \uica{}'s throughput
+analysis.
 
-\medskip{}
-
-While the results obtained against \depsim{} in
-\autoref{ssec:staticdeps_eval_depsim} above were reasonable, they were not
-excellent either, and showed that many kind of dependencies were still missed
-by \staticdeps{}. However, our evaluation on \cesasme{} by enriching \uica{}
-shows that, at least on the workload considered, the dependencies that actually
-matter from a performance debugging point of view are properly found.
-
-This, however, might not be true for other kinds of applications that would
-require a dependencies analysis.
+Furthermore, \uica{} and \uicadeps{}' results on the pruned dataset are
+extremely close. From this, we argue that \staticdeps{} does not introduce
+false positives when no dependency should be found; its addition to \uica{}
+does not negatively impact its accuracy whenever it is not relevant.
 
 \subsection{Analysis speed}
 