CesASMe: first adaptations
parent
fc9182428d
commit
f6f0336b34
9 changed files with 79 additions and 70 deletions
|
@ -1,20 +1,3 @@
|
||||||
\begin{abstract}
|
|
||||||
A variety of code analyzers, such as \iaca, \uica, \llvmmca{} or
|
|
||||||
\ithemal{}, strive to statically predict the throughput of a computation
|
|
||||||
kernel. Each analyzer is based on its own simplified CPU model
|
|
||||||
reasoning at the scale of an isolated basic block.
|
|
||||||
Facing this diversity, evaluating their strengths and
|
|
||||||
weaknesses is important to guide both their usage and their enhancement.
|
|
||||||
|
|
||||||
We argue that reasoning at the scale of a single basic block is not
|
|
||||||
always sufficient and that a lack of context can mislead analyses. We present \tool, a fully-tooled
|
|
||||||
solution to evaluate code analyzers on C-level benchmarks. It is composed of a
|
|
||||||
benchmark derivation procedure that feeds an evaluation harness. We use it to
|
|
||||||
evaluate state-of-the-art code analyzers and to provide insights on their
|
|
||||||
precision. We use \tool's results to show that memory-carried data
|
|
||||||
dependencies are a major source of imprecision for these tools.
|
|
||||||
\end{abstract}
|
|
||||||
|
|
||||||
\section{Introduction}\label{sec:intro}
|
\section{Introduction}\label{sec:intro}
|
||||||
|
|
||||||
At a time when software is expected to perform more computations, faster and in
|
At a time when software is expected to perform more computations, faster and in
|
||||||
|
@ -23,14 +6,14 @@ in particular the CPU resources) they consume are very useful to guide their
|
||||||
optimization. This need is reflected in the diversity of binary or assembly
|
optimization. This need is reflected in the diversity of binary or assembly
|
||||||
code analyzers following the deprecation of \iaca~\cite{iaca}, which Intel has
|
code analyzers following the deprecation of \iaca~\cite{iaca}, which Intel has
|
||||||
maintained through 2019. Whether it is \llvmmca{}~\cite{llvm-mca},
|
maintained through 2019. Whether it is \llvmmca{}~\cite{llvm-mca},
|
||||||
\uica{}~\cite{uica}, \ithemal~\cite{ithemal} or \gus~\cite{phd:gruber}, all these tools strive to extract various
|
\uica{}~\cite{uica}, \ithemal~\cite{ithemal} or \gus~\cite{phd:gruber}, all
|
||||||
performance metrics, including the number of CPU cycles a computation kernel will take
|
these tools strive to extract various performance metrics, including the number
|
||||||
---~which roughly translates to execution time.
|
of CPU cycles a computation kernel will take ---~which roughly translates to
|
||||||
In addition to raw measurements (relying on hardware counters), these model-based analyses provide
|
execution time. In addition to raw measurements (relying on hardware
|
||||||
higher-level and refined data, to expose the bottlenecks and guide the
|
counters), these model-based analyses provide higher-level and refined data, to
|
||||||
optimization of a given code. This feedback is useful to experts optimizing
|
expose the bottlenecks and guide the optimization of a given code. This
|
||||||
computation kernels, including scientific simulations and deep-learning
|
feedback is useful to experts optimizing computation kernels, including
|
||||||
kernels.
|
scientific simulations and deep-learning kernels.
|
||||||
|
|
||||||
An exact throughput prediction would require a cycle-accurate simulator of the
|
An exact throughput prediction would require a cycle-accurate simulator of the
|
||||||
processor, based on microarchitectural data that is most often not publicly
|
processor, based on microarchitectural data that is most often not publicly
|
||||||
|
@ -39,6 +22,7 @@ solve in their own way the challenge of modeling complex CPUs while remaining
|
||||||
simple enough to yield a prediction in a reasonable time, ending up with
|
simple enough to yield a prediction in a reasonable time, ending up with
|
||||||
different models. For instance, on the following x86-64 basic block computing a
|
different models. For instance, on the following x86-64 basic block computing a
|
||||||
general matrix multiplication,
|
general matrix multiplication,
|
||||||
|
|
||||||
\begin{minipage}{0.95\linewidth}
|
\begin{minipage}{0.95\linewidth}
|
||||||
\begin{lstlisting}[language={[x86masm]Assembler}]
|
\begin{lstlisting}[language={[x86masm]Assembler}]
|
||||||
movsd (%rcx, %rax), %xmm0
|
movsd (%rcx, %rax), %xmm0
|
||||||
|
@ -112,30 +96,27 @@ generally be known. More importantly, the compiler may apply any number of
|
||||||
transformations: unrolling, for instance, changes this number. Control flow may
|
transformations: unrolling, for instance, changes this number. Control flow may
|
||||||
also be complicated by code versioning.
|
also be complicated by code versioning.
|
||||||
|
|
||||||
%In the general case, instrumenting the generated code to obtain the number of
|
|
||||||
%occurrences of the basic block yields accurate results.
|
|
||||||
|
|
||||||
\bigskip
|
\bigskip
|
||||||
|
|
||||||
In this article, we present a fully-tooled solution to evaluate and compare the
|
In this article, we present a fully-tooled solution to evaluate and compare the
|
||||||
diversity of static throughput predictors. Our tool, \tool, solves two main
|
diversity of static throughput predictors. Our tool, \cesasme, solves two main
|
||||||
issues in this direction. In Section~\ref{sec:bench_gen}, we describe how
|
issues in this direction. In Section~\ref{sec:bench_gen}, we describe how
|
||||||
\tool{} generates a wide variety of computation kernels stressing different
|
\cesasme{} generates a wide variety of computation kernels stressing different
|
||||||
parameters of the architecture, and thus of the predictors' models, while
|
parameters of the architecture, and thus of the predictors' models, while
|
||||||
staying close to representative workloads. To achieve this, we use
|
staying close to representative workloads. To achieve this, we use
|
||||||
Polybench~\cite{polybench}, a C-level benchmark suite representative of
|
Polybench~\cite{bench:polybench}, a C-level benchmark suite representative of
|
||||||
scientific computation workloads, that we combine with a variety of
|
scientific computation workloads, that we combine with a variety of
|
||||||
optimisations, including polyhedral loop transformations.
|
optimisations, including polyhedral loop transformations.
|
||||||
In Section~\ref{sec:bench_harness}, we describe how \tool{} is able to
|
In Section~\ref{sec:bench_harness}, we describe how \cesasme{} is able to
|
||||||
evaluate throughput predictors on this set of benchmarks by lifting their
|
evaluate throughput predictors on this set of benchmarks by lifting their
|
||||||
predictions to a total number of cycles that can be compared to a hardware
|
predictions to a total number of cycles that can be compared to a hardware
|
||||||
counters-based measure. A
|
counters-based measure. A
|
||||||
high-level view of \tool{} is shown in Figure~\ref{fig:contrib}.
|
high-level view of \cesasme{} is shown in Figure~\ref{fig:contrib}.
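As an illustration only, and assuming notation that does not appear in the text
above (a set $B$ of basic blocks, an occurrence count $\mathrm{occ}(b)$ for each
block, and a per-execution cycle prediction $\widehat{c}_T(b)$ from a tool $T$),
lifting can be read as a weighted sum:
\[
  \widehat{C}_T = \sum_{b \in B} \mathrm{occ}(b) \cdot \widehat{c}_T(b)
\]
The relative error against the measured cycle count $C_{\mathrm{perf}}$ would
then be $|\widehat{C}_T - C_{\mathrm{perf}}| / C_{\mathrm{perf}}$. This is a
sketch under the stated assumptions; the precise definition is the one given in
Section~\ref{sec:bench_harness}.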
|
||||||
|
|
||||||
In Section~\ref{sec:exp_setup}, we detail our experimental setup and assess our
|
In Section~\ref{sec:exp_setup}, we detail our experimental setup and assess our
|
||||||
methodology. In Section~\ref{sec:results_analysis}, we compare the predictors' results and
|
methodology. In Section~\ref{sec:results_analysis}, we compare the predictors' results and
|
||||||
analyze the results of \tool{}.
|
analyze the results of \cesasme{}.
|
||||||
In addition to statistical studies, we use \tool's results
|
In addition to statistical studies, we use \cesasme's results
|
||||||
to investigate analyzers' flaws. We show that code
|
to investigate analyzers' flaws. We show that code
|
||||||
analyzers do not always correctly model data dependencies through memory
|
analyzers do not always correctly model data dependencies through memory
|
||||||
accesses, substantially impacting their precision.
|
accesses, substantially impacting their precision.
|
||||||
|
|
|
@ -39,7 +39,7 @@ directly (no indirections) and whose loops are affine.
|
||||||
These constraints are necessary to ensure that the microkernelification phase,
|
These constraints are necessary to ensure that the microkernelification phase,
|
||||||
presented below, generates segfault-free code.
|
presented below, generates segfault-free code.
|
||||||
|
|
||||||
In this case, we use Polybench~\cite{polybench}, a suite of 30
|
In this case, we use Polybench~\cite{bench:polybench}, a suite of 30
|
||||||
benchmarks for polyhedral compilation ---~of which we use only 26. The
|
benchmarks for polyhedral compilation ---~of which we use only 26. The
|
||||||
\texttt{nussinov}, \texttt{ludcmp} and \texttt{deriche} benchmarks are
|
\texttt{nussinov}, \texttt{ludcmp} and \texttt{deriche} benchmarks are
|
||||||
removed because they are incompatible with PoCC (introduced below). The
|
removed because they are incompatible with PoCC (introduced below). The
|
||||||
|
@ -58,7 +58,9 @@ resources of the target architecture, and by extension the models on which the
|
||||||
static analyzers are based.
|
static analyzers are based.
|
||||||
|
|
||||||
In this case, we chose to use the
|
In this case, we chose to use the
|
||||||
\textsc{Pluto}~\cite{pluto} and PoCC~\cite{pocc} polyhedral compilers, to easily access common loop nest optimizations~: register tiling, tiling,
|
\textsc{Pluto}~\cite{tool:pluto} and PoCC~\cite{tool:pocc} polyhedral
|
||||||
|
compilers, to easily access common loop nest optimizations~: register tiling,
|
||||||
|
tiling,
|
||||||
skewing, vectorization/simdization, loop unrolling, loop permutation,
|
skewing, vectorization/simdization, loop unrolling, loop permutation,
|
||||||
loop fusion.
|
loop fusion.
|
||||||
These transformations are meant to maximize variety within the initial
|
These transformations are meant to maximize variety within the initial
|
||||||
|
|
|
@ -82,6 +82,6 @@ approach, as most throughput prediction tools work at the basic-block level, and are
|
||||||
thus readily available and can be directly plugged into our harness.
|
thus readily available and can be directly plugged into our harness.
|
||||||
|
|
||||||
Finally, we control the proportion of cache misses in the program's execution
|
Finally, we control the proportion of cache misses in the program's execution
|
||||||
using \texttt{Cachegrind}~\cite{valgrind} and \gus; programs that have more
|
using \texttt{Cachegrind}~\cite{tool:valgrind} and \gus; programs that have more
|
||||||
than 15\,\% of cache misses on a warm cache are not considered L1-resident and
|
than 15\,\% of cache misses on a warm cache are not considered L1-resident and
|
||||||
are discarded.
|
are discarded.
|
||||||
|
|
|
@ -64,13 +64,12 @@ consequently, lifted predictions can reasonably be compared to one another.
|
||||||
|
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=\linewidth]{figs/results_comparability_hist.pdf}
|
\includegraphics[width=0.7\linewidth]{results_comparability_hist.pdf}
|
||||||
\caption{Relative error distribution \wrt{} \perf}\label{fig:exp_comparability}
|
\caption{Relative error distribution \wrt{} \perf}\label{fig:exp_comparability}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\begin{table}
|
\begin{table}
|
||||||
\centering
|
\centering
|
||||||
\caption{Relative error statistics \wrt{} \perf}\label{table:exp_comparability}
|
|
||||||
\begin{tabular}{l r r}
|
\begin{tabular}{l r r}
|
||||||
\toprule
|
\toprule
|
||||||
& \textbf{Best block-based} & \textbf{BHive} \\
|
& \textbf{Best block-based} & \textbf{BHive} \\
|
||||||
|
@ -84,13 +83,12 @@ consequently, lifted predictions can reasonably be compared to one another.
|
||||||
Q3 (\%) & 15.41 & 23.01 \\
|
Q3 (\%) & 15.41 & 23.01 \\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
|
\caption{Relative error statistics \wrt{} \perf}\label{table:exp_comparability}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
|
|
||||||
\begin{table*}[!htbp]
|
\begin{table}[!htbp]
|
||||||
\centering
|
\centering
|
||||||
\caption{Bottleneck reports from the studied tools}\label{table:coverage}
|
|
||||||
|
|
||||||
\begin{tabular}{l | r r r | r r r | r r r}
|
\begin{tabular}{l | r r r | r r r | r r r}
|
||||||
\toprule
|
\toprule
|
||||||
& \multicolumn{3}{c|}{\textbf{Frontend}}
|
& \multicolumn{3}{c|}{\textbf{Frontend}}
|
||||||
|
@ -128,7 +126,8 @@ floyd-warshall & 74 & 16 & 29.7 \% & 16 & 24 & 68.8 \% & 20 &
|
||||||
\textbf{Total} & 907 & 1360 & 35.2 \% & 509 & 687 & 65.8 \% & 310 & 728 & 70.3 \% \\
|
\textbf{Total} & 907 & 1360 & 35.2 \% & 509 & 687 & 65.8 \% & 310 & 728 & 70.3 \% \\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table*}
|
\caption{Bottleneck reports from the studied tools}\label{table:coverage}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
\subsection{Relevance and representativity (bottleneck
|
\subsection{Relevance and representativity (bottleneck
|
||||||
analysis)}\label{ssec:bottleneck_diversity}
|
analysis)}\label{ssec:bottleneck_diversity}
|
||||||
|
|
|
@ -11,28 +11,31 @@ understanding of which tool is more suited for each situation.
|
||||||
|
|
||||||
\subsection{Throughput results}\label{ssec:overall_results}
|
\subsection{Throughput results}\label{ssec:overall_results}
|
||||||
|
|
||||||
\begin{table*}
|
\begin{table}
|
||||||
\centering
|
\centering
|
||||||
\caption{Statistical analysis of overall results}\label{table:overall_analysis_stats}
|
\footnotesize
|
||||||
\begin{tabular}{l r r r r r r r r r}
|
\begin{tabular}{l r r r r r r r r r}
|
||||||
\toprule
|
\toprule
|
||||||
\textbf{Bencher} & \textbf{Datapoints} & \textbf{Failures} & \textbf{(\%)} &
|
\textbf{Bencher} & \textbf{Datapoints} &
|
||||||
\textbf{MAPE} & \textbf{Median} & \textbf{Q1} & \textbf{Q3} & \textbf{K. tau} & \textbf{Time (CPU$\cdot$h)}\\
|
\multicolumn{2}{c}{\textbf{Failures}} &
|
||||||
|
\textbf{MAPE} & \textbf{Median} & \textbf{Q1} & \textbf{Q3} & \textbf{$K_\tau$} & \textbf{Time}\\
|
||||||
|
& & (Count) & (\%) & (\%) & (\%) & (\%) & (\%) & & (CPU$\cdot$h) \\
|
||||||
\midrule
|
\midrule
|
||||||
BHive & 2198 & 1302 & (37.20\,\%) & 27.95\,\% & 7.78\,\% & 3.01\,\% & 23.01\,\% & 0.81 & 1.37\\
|
BHive & 2198 & 1302 & (37.20) & 27.95 & 7.78 & 3.01 & 23.01 & 0.81 & 1.37\\
|
||||||
llvm-mca & 3500 & 0 & (0.00\,\%) & 36.71\,\% & 27.80\,\% & 12.92\,\% & 59.80\,\% & 0.57 & 0.96 \\
|
llvm-mca & 3500 & 0 & (0.00) & 36.71 & 27.80 & 12.92 & 59.80 & 0.57 & 0.96 \\
|
||||||
UiCA & 3500 & 0 & (0.00\,\%) & 29.59\,\% & 18.26\,\% & 7.11\,\% & 52.99\,\% & 0.58 & 2.12 \\
|
UiCA & 3500 & 0 & (0.00) & 29.59 & 18.26 & 7.11 & 52.99 & 0.58 & 2.12 \\
|
||||||
Ithemal & 3500 & 0 & (0.00\,\%) & 57.04\,\% & 48.70\,\% & 22.92\,\% & 75.69\,\% & 0.39 & 0.38 \\
|
Ithemal & 3500 & 0 & (0.00) & 57.04 & 48.70 & 22.92 & 75.69 & 0.39 & 0.38 \\
|
||||||
Iaca & 3500 & 0 & (0.00\,\%) & 30.23\,\% & 18.51\,\% & 7.13\,\% & 57.18\,\% & 0.59 & 1.31 \\
|
Iaca & 3500 & 0 & (0.00) & 30.23 & 18.51 & 7.13 & 57.18 & 0.59 & 1.31 \\
|
||||||
Gus & 3500 & 0 & (0.00\,\%) & 20.37\,\% & 15.01\,\% & 7.82\,\% & 30.59\,\% & 0.82 & 188.04 \\
|
Gus & 3500 & 0 & (0.00) & 20.37 & 15.01 & 7.82 & 30.59 & 0.82 & 188.04 \\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table*}
|
\caption{Statistical analysis of overall results}\label{table:overall_analysis_stats}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
The distribution of the relative errors, for each tool, is presented as a
|
The distribution of the relative errors, for each tool, is presented as a
|
||||||
box plot in Figure~\ref{fig:overall_analysis_boxplot}. Statistical indicators
|
box plot in Figure~\ref{fig:overall_analysis_boxplot}. Statistical indicators
|
||||||
are also given in Table~\ref{table:overall_analysis_stats}. We also give, for
|
are also given in Table~\ref{table:overall_analysis_stats}. We also give, for
|
||||||
each tool, its Kendall's tau indicator~\cite{kendall1938tau}: this indicator,
|
each tool, its Kendall's tau indicator~\cite{kendalltau}: this indicator,
|
||||||
used to evaluate \eg{} uiCA~\cite{uica} and Palmed~\cite{palmed}, measures how
|
used to evaluate \eg{} uiCA~\cite{uica} and Palmed~\cite{palmed}, measures how
|
||||||
well the pair-wise ordering of benchmarks is preserved, $-1$ being a full
|
well the pair-wise ordering of benchmarks is preserved, $-1$ being a full
|
||||||
anti-correlation and $1$ a full correlation. This is especially useful when one
|
anti-correlation and $1$ a full correlation. This is especially useful when one
|
||||||
|
@ -40,7 +43,8 @@ is not interested in a program's absolute throughput, but rather in comparing
|
||||||
which program has a better throughput.
|
which program has a better throughput.
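
For reference, a common way to write this indicator, assuming no ties between
values (the tau-a variant; the variant actually used is not stated here), is as
follows. Given $n$ benchmarks with measured cycles $x_i$ and predicted cycles
$y_i$, a pair $(i, j)$ with $i < j$ is concordant if
$(x_i - x_j)(y_i - y_j) > 0$ and discordant if this product is negative. With
$n_c$ and $n_d$ the numbers of concordant and discordant pairs,
\[
  K_\tau = \frac{n_c - n_d}{n (n - 1) / 2}
\]
For instance, with $n = 4$ benchmarks, there are $6$ pairs; if $5$ are ordered
consistently and $1$ is inverted, then $K_\tau = (5 - 1) / 6 \approx 0.67$.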
|
||||||
|
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\includegraphics[width=\linewidth]{figs/overall_analysis_boxplot.pdf}
|
\centering
|
||||||
|
\includegraphics[width=0.5\linewidth]{overall_analysis_boxplot.pdf}
|
||||||
\caption{Statistical distribution of relative errors}\label{fig:overall_analysis_boxplot}
|
\caption{Statistical distribution of relative errors}\label{fig:overall_analysis_boxplot}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
|
@ -185,7 +189,6 @@ frontend bottlenecks, thus making it easier for them to agree.
|
||||||
|
|
||||||
\begin{table}
|
\begin{table}
|
||||||
\centering
|
\centering
|
||||||
\caption{Diverging bottleneck prediction per tool}\label{table:bottleneck_diverging_pred}
|
|
||||||
\begin{tabular}{l r r r r}
|
\begin{tabular}{l r r r r}
|
||||||
\toprule
|
\toprule
|
||||||
\textbf{Tool}
|
\textbf{Tool}
|
||||||
|
@ -197,6 +200,7 @@ frontend bottlenecks, thus making it easier for them to agree.
|
||||||
\iaca{} & 1221 & (53.0 \%) & 900 & (36.6 \%) \\
|
\iaca{} & 1221 & (53.0 \%) & 900 & (36.6 \%) \\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
|
\caption{Diverging bottleneck prediction per tool}\label{table:bottleneck_diverging_pred}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
Table~\ref{table:bottleneck_diverging_pred}, in turn, breaks down the cases
|
Table~\ref{table:bottleneck_diverging_pred}, in turn, breaks down the cases
|
||||||
|
@ -216,14 +220,13 @@ tool for each kind of bottleneck.
|
||||||
|
|
||||||
\subsection{Impact of dependency-boundness}\label{ssec:memlatbound}
|
\subsection{Impact of dependency-boundness}\label{ssec:memlatbound}
|
||||||
|
|
||||||
\begin{table*}
|
\begin{table}
|
||||||
\centering
|
\centering
|
||||||
\caption{Statistical analysis of overall results, without latency bound
|
\footnotesize
|
||||||
through memory-carried dependencies rows}\label{table:nomemdeps_stats}
|
|
||||||
\begin{tabular}{l r r r r r r r r r}
|
\begin{tabular}{l r r r r r r r r r}
|
||||||
\toprule
|
\toprule
|
||||||
\textbf{Bencher} & \textbf{Datapoints} & \textbf{Failures} & \textbf{(\%)} &
|
\textbf{Bencher} & \textbf{Datapoints} & \textbf{Failures} & \textbf{(\%)} &
|
||||||
\textbf{MAPE} & \textbf{Median} & \textbf{Q1} & \textbf{Q3} & \textbf{K. tau}\\
|
\textbf{MAPE} & \textbf{Median} & \textbf{Q1} & \textbf{Q3} & \textbf{$K_\tau$}\\
|
||||||
\midrule
|
\midrule
|
||||||
BHive & 1365 & 1023 & (42.84\,\%) & 34.07\,\% & 8.62\,\% & 4.30\,\% & 24.25\,\% & 0.76\\
|
BHive & 1365 & 1023 & (42.84\,\%) & 34.07\,\% & 8.62\,\% & 4.30\,\% & 24.25\,\% & 0.76\\
|
||||||
llvm-mca & 2388 & 0 & (0.00\,\%) & 27.06\,\% & 21.04\,\% & 9.69\,\% & 32.73\,\% & 0.79\\
|
llvm-mca & 2388 & 0 & (0.00\,\%) & 27.06\,\% & 21.04\,\% & 9.69\,\% & 32.73\,\% & 0.79\\
|
||||||
|
@ -233,7 +236,9 @@ Iaca & 2388 & 0 & (0.00\,\%) & 17.55\,\% & 12.17\,\% & 4.64\,\% & 22.35\,\% & 0.
|
||||||
Gus & 2388 & 0 & (0.00\,\%) & 23.18\,\% & 20.23\,\% & 8.78\,\% & 32.73\,\% & 0.83\\
|
Gus & 2388 & 0 & (0.00\,\%) & 23.18\,\% & 20.23\,\% & 8.78\,\% & 32.73\,\% & 0.83\\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table*}
|
\caption{Statistical analysis of overall results, without latency bound
|
||||||
|
through memory-carried dependencies rows}\label{table:nomemdeps_stats}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
An overview of the full results table (available in our artifact) hints towards
|
An overview of the full results table (available in our artifact) hints towards
|
||||||
two main tendencies: on a significant number of rows, the static tools
|
two main tendencies: on a significant number of rows, the static tools
|
||||||
|
@ -256,7 +261,8 @@ against 129 (14.8\,\%) for \texttt{default} and 61 (7.0\,\%) for
|
||||||
investigate the issue.
|
investigate the issue.
|
||||||
|
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\includegraphics[width=\linewidth]{figs/nomemdeps_boxplot.pdf}
|
\centering
|
||||||
|
\includegraphics[width=0.5\linewidth]{nomemdeps_boxplot.pdf}
|
||||||
\caption{Statistical distribution of relative errors, with and without
|
\caption{Statistical distribution of relative errors, with and without
|
||||||
pruning latency bound through memory-carried dependencies rows}\label{fig:nomemdeps_boxplot}
|
pruning latency bound through memory-carried dependencies rows}\label{fig:nomemdeps_boxplot}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
|
@ -3,6 +3,7 @@
|
||||||
\input{00_intro.tex}
|
\input{00_intro.tex}
|
||||||
\input{05_related_works.tex}
|
\input{05_related_works.tex}
|
||||||
\input{10_bench_gen.tex}
|
\input{10_bench_gen.tex}
|
||||||
|
\input{15_harness.tex}
|
||||||
\input{20_evaluation.tex}
|
\input{20_evaluation.tex}
|
||||||
\input{25_results_analysis.tex}
|
\input{25_results_analysis.tex}
|
||||||
\input{30_future_works.tex}
|
\input{30_future_works.tex}
|
||||||
|
|
|
@ -29,7 +29,7 @@
|
||||||
\node[rednode] (ppapi) [left=1cm of bhive] {perf (measure)};
|
\node[rednode] (ppapi) [left=1cm of bhive] {perf (measure)};
|
||||||
\node[rednode] (gus) [below=0.5em of ppapi] {Gus};
|
\node[rednode] (gus) [below=0.5em of ppapi] {Gus};
|
||||||
%% \node[rednode] (uica) [below=of gdb] {uiCA};
|
%% \node[rednode] (uica) [below=of gdb] {uiCA};
|
||||||
\node[rednode] (lifting) [right=of bhive] {
|
\node[rednode] (lifting) [below right=1em and 0.2cm of gdb] {
|
||||||
Prediction lifting\\\figref{ssec:harness_lifting}};
|
Prediction lifting\\\figref{ssec:harness_lifting}};
|
||||||
\node[
|
\node[
|
||||||
draw=black,
|
draw=black,
|
||||||
|
@ -47,15 +47,15 @@
|
||||||
label={[above,xshift=1cm]\footnotesize Variations},
|
label={[above,xshift=1cm]\footnotesize Variations},
|
||||||
fit=(pocc) (kernel) (gcc)
|
fit=(pocc) (kernel) (gcc)
|
||||||
] (vars) {};
|
] (vars) {};
|
||||||
\node[resultnode] (bench2) [below=of lifting] {Evaluation metrics \\ for
|
\node[resultnode] (bench2) [right=of lifting] {Evaluation metrics \\ for
|
||||||
code analyzers};
|
code analyzers};
|
||||||
|
|
||||||
% Key
|
% Key
|
||||||
\node[] (keyblue1) [below left=0.7cm and 0cm of vars] {};
|
\node[] (keyblue1) [below left=0.7cm and 0cm of vars] {};
|
||||||
\node[hiddennode] (keyblue2) [right=0.5cm of keyblue1] {Section~\ref{sec:bench_gen}~: generating microbenchmarks};
|
\node[hiddennode] (keyblue2) [right=0.5cm of keyblue1] {Section~\ref{sec:bench_gen}~: generating microbenchmarks};
|
||||||
\node[] (keyred1) [right=0.6cm of keyblue2] {};
|
\node[] (keyred1) [below=.5em of keyblue1] {};
|
||||||
\node[hiddennode] (keyred2) [right=0.5cm of keyred1] {Section~\ref{sec:bench_harness}~: benchmarking harness};
|
\node[hiddennode] (keyred2) [right=0.5cm of keyred1] {Section~\ref{sec:bench_harness}~: benchmarking harness};
|
||||||
\node[] (keyresult1) [right=0.6cm of keyred2] {};
|
\node[] (keyresult1) [below=.5em of keyred1] {};
|
||||||
\node[hiddennode] (keyresult2) [right=0.5cm of keyresult1]
|
\node[hiddennode] (keyresult2) [right=0.5cm of keyresult1]
|
||||||
{Section~\ref{sec:results_analysis}~: results analysis};
|
{Section~\ref{sec:results_analysis}~: results analysis};
|
||||||
|
|
||||||
|
@ -74,8 +74,8 @@
|
||||||
\draw[->, very thick, harnarrow] (gdb.east) -- (ithemal.west);
|
\draw[->, very thick, harnarrow] (gdb.east) -- (ithemal.west);
|
||||||
\draw[->, very thick, harnarrow] (gdb.east) -- (bhive.west);
|
\draw[->, very thick, harnarrow] (gdb.east) -- (bhive.west);
|
||||||
\draw[->, very thick, harnarrow] (gdb.east) -- (llvm.west);
|
\draw[->, very thick, harnarrow] (gdb.east) -- (llvm.west);
|
||||||
\draw[->, very thick, harnarrow] (comps.east|-lifting) -- (lifting.west);
|
\draw[->, very thick, harnarrow] (comps.south-|lifting) -- (lifting.north);
|
||||||
\draw[->, very thick] (lifting.south) -- (bench2.north);
|
\draw[->, very thick] (lifting.east) -- (bench2.west);
|
||||||
\end{tikzpicture}
|
\end{tikzpicture}
|
||||||
}
|
}
|
||||||
\caption{Our analysis and measurement environment.\label{fig:contrib}}
|
\caption{Our analysis and measurement environment.\label{fig:contrib}}
|
||||||
|
|
|
@ -2,6 +2,8 @@
|
||||||
\newcommand{\uops}{\uop{}s}
|
\newcommand{\uops}{\uop{}s}
|
||||||
|
|
||||||
\newcommand{\eg}{\textit{e.g.}}
|
\newcommand{\eg}{\textit{e.g.}}
|
||||||
|
\newcommand{\ie}{\textit{i.e.}}
|
||||||
|
\newcommand{\wrt}{\textit{w.r.t.}}
|
||||||
|
|
||||||
\newcommand{\kerK}{\mathcal{K}}
|
\newcommand{\kerK}{\mathcal{K}}
|
||||||
\newcommand{\calR}{\mathcal{R}}
|
\newcommand{\calR}{\mathcal{R}}
|
||||||
|
@ -36,6 +38,20 @@
|
||||||
\newcommand{\pipedream}{\texttt{Pipedream}}
|
\newcommand{\pipedream}{\texttt{Pipedream}}
|
||||||
\newcommand{\palmed}{\texttt{Palmed}}
|
\newcommand{\palmed}{\texttt{Palmed}}
|
||||||
\newcommand{\pmevo}{\texttt{PMEvo}}
|
\newcommand{\pmevo}{\texttt{PMEvo}}
|
||||||
|
\newcommand{\gus}{\texttt{Gus}}
|
||||||
|
\newcommand{\ithemal}{\texttt{Ithemal}}
|
||||||
|
\newcommand{\osaca}{\texttt{Osaca}}
|
||||||
|
\newcommand{\bhive}{\texttt{BHive}}
|
||||||
|
\newcommand{\anica}{\texttt{AnICA}}
|
||||||
|
\newcommand{\cesasme}{\texttt{CesASMe}}
|
||||||
|
|
||||||
|
\newcommand{\gdb}{\texttt{gdb}}
|
||||||
|
|
||||||
|
\newcommand{\coeq}{CO$_{2}$eq}
|
||||||
|
|
||||||
|
\newcommand{\figref}[1]{[\ref{#1}]}
|
||||||
|
|
||||||
|
\newcommand{\reg}[1]{\texttt{\%#1}}
|
||||||
|
|
||||||
% Hyperlinks
|
% Hyperlinks
|
||||||
\newcommand{\pymodule}[1]{\href{https://docs.python.org/3/library/#1.html}{\lstpython{#1}}}
|
\newcommand{\pymodule}[1]{\href{https://docs.python.org/3/library/#1.html}{\lstpython{#1}}}
|
||||||
|
|
|
@ -25,9 +25,13 @@
|
||||||
\usepackage{import}
|
\usepackage{import}
|
||||||
\usepackage{wrapfig}
|
\usepackage{wrapfig}
|
||||||
\usepackage{float}
|
\usepackage{float}
|
||||||
|
\usepackage{tikz}
|
||||||
\usepackage[bottom]{footmisc} % footnotes are below floats
|
\usepackage[bottom]{footmisc} % footnotes are below floats
|
||||||
\usepackage[final]{microtype}
|
\usepackage[final]{microtype}
|
||||||
|
|
||||||
|
\usetikzlibrary{positioning}
|
||||||
|
\usetikzlibrary{fit}
|
||||||
|
|
||||||
\emergencystretch=1em
|
\emergencystretch=1em
|
||||||
|
|
||||||
% Local sty files
|
% Local sty files
|
||||||
|
|