A72: start writeup
parent
f3fbbf3b6b
commit
41bb653013
5 changed files with 358 additions and 121 deletions
|
@ -1,3 +1,103 @@
|
|||
Reverse-engineering methods for CPU models usually make abundant use
|
||||
of hardware counters ---~and legitimately so, as they are the natural and
|
||||
accurate way to obtain insight into the internals of a CPU\@. Such methods
|
||||
include, among others, the optimisation guides from Agner Fog~\cite{AgnerFog},
|
||||
as well as \uopsinfo{}~\cite{uopsinfo} and \uica{}'s~\cite{uica} approach to
|
||||
respectively model the CPU's back- and front-end. In \autoref{chap:palmed}, we
|
||||
introduced Palmed, whose main goal is to automatically produce port-mappings of
|
||||
CPUs without assuming the presence of specific hardware counters.
|
||||
|
||||
\smallskip{}
|
||||
|
||||
The ARM architectures occupy a growing space in the global computing ecosystem.
|
||||
They are already pervasive among embedded and mobile devices, with most
|
||||
mobile phones featuring an ARM CPU~\cite{arm_mobile}. Processors based on ARM
|
||||
are emerging in datacenters and supercomputers: the Fugaku supercomputer
|
||||
---~ranked first worldwide in the HPCG and Graph500
|
||||
rankings~\cite{fugaku_top500}~--- runs on ARM-based CPUs~\cite{fugaku_arm}, and the
|
||||
MareNostrum 4 supercomputer has an ARM-based cluster~\cite{marenostrum4_arm}.
|
||||
|
||||
Yet, the ARM ecosystem is still lacking in performance debugging tooling. While
|
||||
\llvmmca{} supports ARM, it is one of the few tools that do: \iaca{}, made by Intel, is
|
||||
not supported ---~and will never be, as it is end-of-life~---; \uica{} is
|
||||
focused on Intel architectures, and cannot be easily ported as it heavily
|
||||
relies on reverse engineering specific to Intel, and enabled by specific
|
||||
hardware counters; Intel \texttt{VTune}, a commonly used profiling and performance
|
||||
analysis tool, supports only x86-64.
|
||||
|
||||
\smallskip{}
|
||||
|
||||
In this context, modelling an ARM CPU ---~the Cortex A72~--- with Palmed seemed
|
||||
to be an important goal, especially meaningful as this particular CPU only has
|
||||
very few hardware counters. However, it yielded only mixed results, as shown in
|
||||
\autoref{sec:palmed_results}.
|
||||
|
||||
In this chapter, we show that a major cause of imprecision in these results is
|
||||
the absence of a frontend model.
|
||||
|
||||
|
||||
\section{Necessity to go beyond ports}
|
||||
|
||||
The resource models produced by \palmed{} are mainly concerned with the backend
|
||||
of the CPUs modeled. However, the importance of the frontend in the accuracy of
|
||||
a model's prediction cannot be ignored. Its effect can be clearly seen in the
|
||||
evaluation heatmaps of various code analyzers in \autoref{fig:palmed_heatmaps}.
|
||||
Each heatmap has a clear-cut limit on the horizontal axis: independently of the
|
||||
benchmark's content, it is impossible to reach more than a given number of
|
||||
instructions per cycle for a given processor ---~4 instructions for the
|
||||
\texttt{SKL-SP}, 5 for the \texttt{ZEN1}. This limit is imposed by the
|
||||
frontend.
|
||||
|
||||
Some analyzers, such as \palmed{} and \iaca{}, model this limit: the heatmap
|
||||
shows that the predicted IPC will not surpass this limit. The other three
|
||||
analyzers studied, however, do not model this limit; for instance, \uopsinfo{}
|
||||
has a high density of benchmarks predicted at 8 instructions per cycle on
|
||||
SPEC2017 on the \texttt{SKL-SP} CPU, while the native measurement yielded only
|
||||
4 instructions per cycle. The same effect is visible on \pmevo{} and \llvmmca{}
|
||||
heatmaps.
|
||||
|
||||
\begin{example}{High back-end throughput on \texttt{SKL-SP}}
|
||||
On the \texttt{SKL-SP} microarchitecture, assuming an infinitely large
|
||||
frontend, a number of instructions per cycle higher than 4 is easy to
|
||||
reach.
|
||||
|
||||
    According to \uopsinfo{} data, a 64-bit
|
||||
integer \lstxasm{addq} is processed with a single \uop{}, dispatched on
|
||||
    port 0, 1, 5 or 6. Similarly, a simple-form 64-bit register store
|
||||
to a direct register-held address ---~\eg{} a \lstxasm{movq \%rax,
|
||||
(\%rbx)}~--- is also processed with a single \uop{}, dispatched on port 2
|
||||
or 3.
|
||||
|
||||
Thus, backend-wise, the kernel $4\times \texttt{addq} + 2\times
|
||||
\texttt{mov}$ has a throughput of 6 instructions per cycle. However, in
|
||||
reality, this kernel would be frontend-bound, with a theoretical maximum throughput of 4
|
||||
    instructions per cycle ---~in fact, a \pipedream{} measurement only yields 3
|
||||
instructions per cycle.
|
||||
\end{example}
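
As a back-of-the-envelope check of this example, write $t_\text{back}$ and
$t_\text{front}$ (names introduced here for illustration only) for the minimal
number of cycles one iteration of the kernel spends in the backend and in the
frontend, the latter being assumed to deliver at most four \uops{} per cycle,
with one \uop{} per instruction as stated above:
\begin{align*}
    t_\text{back} &= \max\left(\frac{4}{4}, \frac{2}{2}\right) = 1
        && \Rightarrow\quad \text{IPC} \le \frac{6}{1} = 6 \\
    t_\text{front} &= \frac{4 + 2}{4} = 1.5
        && \Rightarrow\quad \text{IPC} \le \frac{6}{1.5} = 4
\end{align*}
The first fraction of the maximum accounts for the four \texttt{addq} spread
over four ports, the second for the two \texttt{mov} spread over two ports.
This is only a sketch: it ignores latencies and any frontend effect other than
its width, which may explain why the measured value is lower still.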
|
||||
|
||||
\bigskip{}
|
||||
|
||||
To account for this, \palmed{} tries to detect an additional resource, apart
|
||||
from the backend ports and combined ports, on which every \uop{} incurs a load.
|
||||
This allows \palmed{} to avoid large errors on frontend-bound kernels.
|
||||
|
||||
The approach is, however, far from perfect. The clearest reason for this is
|
||||
that the frontend, both on x86-64 and ARM architectures, works in-order, while
|
||||
\palmed{} inherently models kernels as multisets of instructions, thus
|
||||
completely ignoring ordering. This resource model is purely linear: an
|
||||
instruction incurs a load on the frontend resource in a fully commutative way,
|
||||
independently of the previous instructions executed this cycle and of many
|
||||
other effects.
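
In a simplified form, and with notation introduced here only for illustration,
this linearity can be sketched as follows. Consider a kernel $K$ containing
$n_i$ occurrences of each instruction $i$, let $\rho_{i,r}$ be the load that one
occurrence of $i$ incurs on resource $r$, and let $R$ be the set of resources,
including the frontend one. The predicted execution time of one iteration of
$K$ is then
\[
    t(K) = \max_{r \in R} \; \sum_{i \in K} n_i \, \rho_{i,r}
\]
This expression depends only on the counts $n_i$: any reordering of the
instructions of $K$ yields the same prediction, which is precisely what an
in-order frontend does not guarantee.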
|
||||
|
||||
The article introducing \uica{}~\cite{uica} explores this question in detail
|
||||
for x86-64 Intel architectures. The authors, having previously developed
|
||||
\uopsinfo{}, discuss the importance of a correct modelling of the frontend to
|
||||
accurately predict throughput. Their approach, based on the exploration and
|
||||
reverse-engineering of the crucial parts of the frontend, showcases many
|
||||
important and non-trivial aspects of frontends usually neglected, such as the
|
||||
switching between the decoders and \uop{}-cache as source of instructions
|
||||
---~which cannot be linearly modelled.
|
||||
|
||||
\section{The Cortex A72 CPU}
|
||||
|
||||
The Cortex A72~\cite{a72_doc} is a CPU based on the ARMv8-A ISA ---~the first
|
||||
|
@ -8,6 +108,62 @@ high-performance core for low-power applications.
|
|||
The Raspberry Pi 4 uses a 4-cores A72 CPU, implemented by Broadcom as BCM2711;
|
||||
it is thus easy to have access to an A72 to run experiments.
|
||||
|
||||
|
||||
\begin{figure}
|
||||
\centering
|
||||
\includegraphics[width=\linewidth]{A72_pipeline_diagram.svg}
|
||||
\caption{Simplified overview of the Cortex A72
|
||||
pipeline}\label{fig:a72_pipeline}
|
||||
\end{figure}
|
||||
|
||||
\paragraph{Backend.} As can be seen in \autoref{fig:a72_pipeline} (adapted from
|
||||
the software optimization guide for the Cortex A72, published by
|
||||
ARM~\cite{ref:a72_optim}), the Cortex A72 has eight execution ports:
|
||||
\begin{itemize}
|
||||
\item a branch port (branch instructions, equivalent to x86 jumps);
|
||||
    \item two identical integer ports (integer arithmetic operations);
|
||||
\item an integer multi-cycle port (complex integer operations, \eg{} divisions);
|
||||
    \item two floating point and SIMD ports (mostly identical,
|
||||
with slight specializations: \eg{} only port FP0 can do SIMD
|
||||
multiplication, while only port FP1 can do floating point comparisons);
|
||||
\item a load port;
|
||||
\item a store port.
|
||||
\end{itemize}
|
||||
|
||||
\paragraph{Frontend.} The Cortex A72 frontend can only decode three
|
||||
instructions and dispatch three \uops{} per cycle~\cite{ref:a72_optim}.
|
||||
Intel's \texttt{SKL-SP}, which we considered before, has a frontend that
|
||||
bottlenecks at four \uops{} per cycle~\cite{agnerfog_skl_front4}. This
|
||||
difference of one \uop{} per cycle is actually meaningful, as this means that
|
||||
on average, at most three of the eight backend ports can be used each cycle.
|
||||
|
||||
\begin{example}[2nd order polynomial evaluation]
|
||||
Consider a kernel evaluating the 2nd order polynomial expression for
|
||||
different values of $x$:
|
||||
\begin{align*}
|
||||
P[i] &= a{X[i]}^2 + bX[i] + c \\
|
||||
&= \left( aX[i] + b \right) \times X[i] + c
|
||||
\end{align*}
|
||||
which directly translates to four operations: load $X[i]$, two floating
|
||||
    point multiply-adds, store the result $P[i]$. The backend, having a load
|
||||
port, two SIMD ports and a store port, can execute one iteration of such a
|
||||
kernel every cycle; in steady-state, out-of-order execution can lift the
|
||||
latency-induced pressure. However, as the frontend bottlenecks at three \uops{}
|
||||
per cycle, this kernel does not fit in a single cycle.
|
||||
\end{example}
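
To make the last claim of this example concrete, assume for illustration that
each of the four instructions decodes to a single \uop{}, and write
$t_\text{back}$ and $t_\text{front}$ (names introduced here) for the minimal
number of cycles per iteration imposed by the backend and by the frontend:
\begin{align*}
    t_\text{back} &= \max\left(\frac{1}{1}, \frac{2}{2}, \frac{1}{1}\right) = 1 \\
    t_\text{front} &= \frac{1 + 2 + 1}{3} = \frac{4}{3} \approx 1.33
\end{align*}
The three fractions of the maximum stand for the load port, the two floating
point and SIMD ports, and the store port respectively. Under these
assumptions, the kernel cannot run faster than
$\max(t_\text{back}, t_\text{front}) = \frac{4}{3}$ cycles per iteration, that
is, three instructions per cycle, even though the backend alone could sustain
four.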
|
||||
|
||||
\paragraph{Lack of hardware counters.}
|
||||
The Cortex A72 only features a very limited set of specialized hardware counters.
|
||||
While the CPU is able to report the number of elapsed cycles,
|
||||
retired instructions, branch misses and various metrics on cache misses, it
|
||||
does not report any event regarding macro- or micro-operations, dispatching or
|
||||
issuing to specific ports. This makes it, as pointed out before, a particularly
|
||||
relevant target for \palmed{}.
|
||||
|
||||
|
||||
\section{Manually modelling the A72 frontend}
|
||||
|
||||
% TODO
|
||||
|
||||
\subsection{Methodology}
|
||||
|
||||
|
||||
\cite{ref:a72_optim}
|
||||
|
|
|
@ -2,9 +2,9 @@
|
|||
<!-- Created with Inkscape (http://www.inkscape.org/) -->
|
||||
|
||||
<svg
|
||||
width="205mm"
|
||||
height="120.10254mm"
|
||||
viewBox="0 0 205 120.10254"
|
||||
width="240mm"
|
||||
height="110.30828mm"
|
||||
viewBox="0 0 240 110.30828"
|
||||
version="1.1"
|
||||
id="svg1"
|
||||
sodipodi:docname="A72_pipeline_diagram.svg"
|
||||
|
@ -15,7 +15,7 @@
|
|||
xmlns:svg="http://www.w3.org/2000/svg">
|
||||
<sodipodi:namedview
|
||||
id="namedview1"
|
||||
pagecolor="#505050"
|
||||
pagecolor="#616161"
|
||||
bordercolor="#eeeeee"
|
||||
borderopacity="1"
|
||||
inkscape:showpageshadow="0"
|
||||
|
@ -23,9 +23,9 @@
|
|||
inkscape:pagecheckerboard="false"
|
||||
inkscape:deskcolor="#d1d1d1"
|
||||
inkscape:document-units="mm"
|
||||
inkscape:zoom="1.4142136"
|
||||
inkscape:cx="564.9783"
|
||||
inkscape:cy="215.31401"
|
||||
inkscape:zoom="0.76905556"
|
||||
inkscape:cx="309.47049"
|
||||
inkscape:cy="429.74789"
|
||||
inkscape:window-width="1916"
|
||||
inkscape:window-height="1161"
|
||||
inkscape:window-x="0"
|
||||
|
@ -240,63 +240,63 @@
|
|||
inkscape:label="Layer 1"
|
||||
inkscape:groupmode="layer"
|
||||
id="layer1"
|
||||
transform="translate(-2.1624756e-8,-0.30828345)">
|
||||
transform="translate(-2.1624755e-8,-5)">
|
||||
<g
|
||||
id="g26"
|
||||
transform="translate(-6.7096648,12.738239)">
|
||||
id="g4">
|
||||
<rect
|
||||
style="fill:#8bcbed;fill-opacity:1;stroke:none;stroke-width:0;stroke-dasharray:none"
|
||||
id="rect1"
|
||||
width="40"
|
||||
width="50"
|
||||
height="60"
|
||||
x="6.7096648"
|
||||
y="9.7617607" />
|
||||
x="2.1624757e-08"
|
||||
y="22.5" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="17.097843"
|
||||
y="42.392094"
|
||||
x="15.388178"
|
||||
y="55.130333"
|
||||
id="text2"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan2"
|
||||
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:7.05556px;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;font-variant-caps:normal;font-variant-numeric:normal;font-variant-east-asian:normal;stroke-width:0.264583"
|
||||
x="17.097843"
|
||||
y="42.392094">Fetch</tspan></text>
|
||||
x="15.388178"
|
||||
y="55.130333">Fetch</tspan></text>
|
||||
</g>
|
||||
<g
|
||||
id="g27"
|
||||
transform="translate(-1.0712547,12.580196)">
|
||||
id="g5"
|
||||
transform="translate(0.06535216)">
|
||||
<rect
|
||||
style="fill:#8bcbed;fill-opacity:1;stroke:none;stroke-width:0;stroke-dasharray:none"
|
||||
id="rect1-6"
|
||||
width="40"
|
||||
width="50"
|
||||
height="60"
|
||||
x="51.071255"
|
||||
y="9.9198036" />
|
||||
x="59.927437"
|
||||
y="22.5" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="55.480465"
|
||||
y="33.046833"
|
||||
x="69.336647"
|
||||
y="45.627029"
|
||||
id="text3"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan3"
|
||||
style="stroke-width:0.264583"
|
||||
x="55.480465"
|
||||
y="33.046833">Decode,</tspan><tspan
|
||||
x="69.336647"
|
||||
y="45.627029">Decode,</tspan><tspan
|
||||
sodipodi:role="line"
|
||||
style="stroke-width:0.264583"
|
||||
x="55.480465"
|
||||
y="41.866283"
|
||||
x="69.336647"
|
||||
y="54.44648"
|
||||
id="tspan4">Rename,</tspan><tspan
|
||||
sodipodi:role="line"
|
||||
style="stroke-width:0.264583"
|
||||
x="55.480465"
|
||||
y="50.685734"
|
||||
x="69.336647"
|
||||
y="63.26593"
|
||||
id="tspan6">Dispatch</tspan></text>
|
||||
</g>
|
||||
<g
|
||||
id="g25">
|
||||
id="g25"
|
||||
transform="translate(35)">
|
||||
<rect
|
||||
style="fill:#8bcbed;fill-opacity:1;stroke:none;stroke-width:0;stroke-dasharray:none"
|
||||
id="rect7"
|
||||
|
@ -319,7 +319,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g17"
|
||||
transform="translate(-0.04806468)">
|
||||
transform="translate(34.951935)">
|
||||
<rect
|
||||
style="fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-dasharray:none"
|
||||
id="rect8"
|
||||
|
@ -342,7 +342,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g18"
|
||||
transform="translate(-0.05012462)">
|
||||
transform="translate(34.949875)">
|
||||
<rect
|
||||
style="font-variation-settings:normal;opacity:1;vector-effect:none;fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;-inkscape-stroke:none;stop-color:#000000;stop-opacity:1"
|
||||
id="rect8-1"
|
||||
|
@ -364,7 +364,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g19"
|
||||
transform="translate(-0.05012462)">
|
||||
transform="translate(34.949875)">
|
||||
<rect
|
||||
style="font-variation-settings:normal;opacity:1;vector-effect:none;fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;-inkscape-stroke:none;stop-color:#000000;stop-opacity:1"
|
||||
id="rect8-10"
|
||||
|
@ -386,7 +386,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g20"
|
||||
transform="translate(-0.04998729)">
|
||||
transform="translate(34.950013)">
|
||||
<rect
|
||||
style="font-variation-settings:normal;opacity:1;vector-effect:none;fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;-inkscape-stroke:none;stop-color:#000000;stop-opacity:1"
|
||||
id="rect8-7"
|
||||
|
@ -408,7 +408,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g21"
|
||||
transform="translate(-0.04804942)">
|
||||
transform="translate(34.951951)">
|
||||
<rect
|
||||
style="font-variation-settings:normal;opacity:1;vector-effect:none;fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;-inkscape-stroke:none;stop-color:#000000;stop-opacity:1"
|
||||
id="rect8-6"
|
||||
|
@ -430,7 +430,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g22"
|
||||
transform="translate(-0.04804942)">
|
||||
transform="translate(34.951951)">
|
||||
<rect
|
||||
style="font-variation-settings:normal;vector-effect:none;fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;-inkscape-stroke:none;stop-color:#000000"
|
||||
id="rect8-6-7"
|
||||
|
@ -452,7 +452,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g23"
|
||||
transform="translate(-0.04804942)">
|
||||
transform="translate(34.951951)">
|
||||
<rect
|
||||
style="font-variation-settings:normal;vector-effect:none;fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;-inkscape-stroke:none;stop-color:#000000"
|
||||
id="rect8-6-6"
|
||||
|
@ -474,7 +474,7 @@
|
|||
</g>
|
||||
<g
|
||||
id="g24"
|
||||
transform="translate(-0.04804942)">
|
||||
transform="translate(34.951951)">
|
||||
<rect
|
||||
style="font-variation-settings:normal;vector-effect:none;fill:#e1ad9c;fill-opacity:1;stroke:#000000;stroke-width:0;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;-inkscape-stroke:none;stop-color:#000000"
|
||||
id="rect8-6-0"
|
||||
|
@ -496,92 +496,98 @@
|
|||
</g>
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide)"
|
||||
d="m 40.072562,52.5 h 9.028523"
|
||||
d="m 50,52.5 h 9.028523"
|
||||
id="path27" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-1)"
|
||||
d="m 90,52.5 h 9.028523"
|
||||
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-1)"
|
||||
d="m 110,52.5 h 24.06932"
|
||||
id="path27-3" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-3)"
|
||||
d="m 120,10.04793 h 9.02853"
|
||||
d="m 155,10.04793 h 9.02853"
|
||||
id="path27-9" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-33)"
|
||||
d="m 120,22.18394 h 9.02853"
|
||||
d="m 155,22.18394 h 9.02853"
|
||||
id="path27-5" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-4)"
|
||||
d="m 120,34.31995 h 9.02853"
|
||||
d="m 155,34.31995 h 9.02853"
|
||||
id="path27-4" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-6)"
|
||||
d="m 120,46.455959 h 9.02852"
|
||||
d="m 155,46.455959 h 9.02852"
|
||||
id="path27-6" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-7)"
|
||||
d="m 120,58.591971 h 9.02852"
|
||||
d="m 155,58.591971 h 9.02852"
|
||||
id="path27-2" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-60)"
|
||||
d="m 120,70.72798 h 9.02853"
|
||||
d="m 155,70.72798 h 9.02853"
|
||||
id="path27-1" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-15)"
|
||||
d="m 120,82.863989 h 9.02852"
|
||||
d="m 155,82.863989 h 9.02852"
|
||||
id="path27-49" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1.00748;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#ArrowWide-75)"
|
||||
d="m 120,94.999999 h 9.02853"
|
||||
d="m 155,94.999999 h 9.02853"
|
||||
id="path27-48" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1"
|
||||
d="m 0.50890501,105 c 0,7.396 41.61184599,-2.02165 44.49109499,8.11964 C 47.879249,102.97835 89.491095,112.396 89.491095,105"
|
||||
id="path29"
|
||||
sodipodi:nodetypes="ccc" />
|
||||
<path
|
||||
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-dasharray:none;stroke-opacity:1"
|
||||
d="m 100.54224,105 c 0,7.1898 48.59531,-1.96529 51.95776,7.89326 3.36245,-9.85855 51.95777,-0.70346 51.95777,-7.89326"
|
||||
id="path30"
|
||||
sodipodi:nodetypes="ccc" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="30.632221"
|
||||
y="120.31091"
|
||||
id="text30"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan30"
|
||||
style="stroke-width:0.264583"
|
||||
x="30.632221"
|
||||
y="120.31091">In-order</tspan><tspan
|
||||
sodipodi:role="line"
|
||||
style="stroke-width:0.264583"
|
||||
x="30.632221"
|
||||
y="129.13036"
|
||||
id="tspan31" /></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="130.61502"
|
||||
y="120.31091"
|
||||
id="text32"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan32"
|
||||
style="stroke-width:0.264583"
|
||||
x="130.61502"
|
||||
y="120.31091">Out-of-order</tspan></text>
|
||||
<rect
|
||||
style="font-variation-settings:normal;fill:#000000;fill-opacity:0.1;stroke:none;stroke-width:0.943418;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
|
||||
id="rect35"
|
||||
width="49.932011"
|
||||
height="17.149502"
|
||||
x="2.7087286"
|
||||
y="0.30828345" />
|
||||
<g
|
||||
id="g36">
|
||||
id="g8">
|
||||
<rect
|
||||
style="fill:#e788e7;fill-opacity:1;stroke-width:1.53098"
|
||||
id="rect2"
|
||||
width="110"
|
||||
height="10"
|
||||
x="2.1624755e-08"
|
||||
y="105.30828" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="40.632221"
|
||||
y="112.93861"
|
||||
id="text30"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan30"
|
||||
style="stroke-width:0.264583"
|
||||
x="40.632221"
|
||||
y="112.93861">In-order</tspan><tspan
|
||||
sodipodi:role="line"
|
||||
style="stroke-width:0.264583"
|
||||
x="40.632221"
|
||||
y="121.75806"
|
||||
id="tspan31" /></text>
|
||||
</g>
|
||||
<g
|
||||
id="g7"
|
||||
transform="translate(15.000002)">
|
||||
<rect
|
||||
style="fill:#e788e7;fill-opacity:1;stroke-width:1.49578"
|
||||
id="rect2-2"
|
||||
width="105"
|
||||
height="10"
|
||||
x="120"
|
||||
y="105.30828" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="150.61502"
|
||||
y="112.93861"
|
||||
id="text32"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan32"
|
||||
style="stroke-width:0.264583"
|
||||
x="150.61502"
|
||||
y="112.93861">Out-of-order</tspan></text>
|
||||
</g>
|
||||
<g
|
||||
id="g34"
|
||||
transform="translate(128.54278,-45.95288)">
|
||||
<g
|
||||
id="g34">
|
||||
id="g6"
|
||||
transform="translate(-129.5646,50.965203)">
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
|
@ -601,27 +607,44 @@
|
|||
x="5"
|
||||
y="4" />
|
||||
</g>
|
||||
<g
|
||||
id="g35">
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="17.307535"
|
||||
y="15.896161"
|
||||
id="text34"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan34"
|
||||
style="stroke-width:0.264583"
|
||||
x="17.307535"
|
||||
y="15.896161">Back-end</tspan></text>
|
||||
<rect
|
||||
style="font-variation-settings:normal;fill:#e1ad9c;fill-opacity:1;stroke:none;stroke-width:0.624481;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
|
||||
id="rect34"
|
||||
width="10"
|
||||
height="1"
|
||||
x="5"
|
||||
y="12.76559" />
|
||||
</g>
|
||||
</g>
|
||||
<g
|
||||
id="g35"
|
||||
transform="translate(56.280655,-3.7532659)">
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="17.307535"
|
||||
y="15.896161"
|
||||
id="text34"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan34"
|
||||
style="stroke-width:0.264583"
|
||||
x="17.307535"
|
||||
y="15.896161">Back-end</tspan></text>
|
||||
<rect
|
||||
style="font-variation-settings:normal;fill:#e1ad9c;fill-opacity:1;stroke:none;stroke-width:0.624481;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
|
||||
id="rect34"
|
||||
width="10"
|
||||
height="1"
|
||||
x="5"
|
||||
y="12.76559" />
|
||||
</g>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:7.05556px;line-height:1.25;font-family:'DejaVu Sans';-inkscape-font-specification:'DejaVu Sans, Normal';font-variant-ligatures:none;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;stroke-width:0.264583"
|
||||
x="122.38574"
|
||||
y="36.796917"
|
||||
id="text9"><tspan
|
||||
sodipodi:role="line"
|
||||
id="tspan9"
|
||||
style="text-align:center;text-anchor:middle;stroke-width:0.264583"
|
||||
x="122.38574"
|
||||
y="36.796917">3</tspan><tspan
|
||||
sodipodi:role="line"
|
||||
style="text-align:center;text-anchor:middle;stroke-width:0.264583"
|
||||
x="122.38574"
|
||||
y="45.616367"
|
||||
id="tspan10">μOPs</tspan></text>
|
||||
</g>
|
||||
</svg>
|
||||
|
|
|
@ -107,6 +107,14 @@
|
|||
doi={10.1109/PMBS49563.2019.00006}
|
||||
}
|
||||
|
||||
@online{AgnerFog,
|
||||
author = {Agner Fog},
|
||||
title = {Instruction tables: Lists of instruction latencies, through-puts and micro-operation breakdowns for Intel, {AMD} and {VIA} {CPU}s},
|
||||
publisher = {Technical University of Denmark},
|
||||
year = {2020},
|
||||
url = {http://www.agner.org/optimize/instruction_tables.pdf},
|
||||
}
|
||||
|
||||
@inproceedings{uopsinfo,
|
||||
title = {uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures},
|
||||
acmid = {3304062},
|
||||
|
|
|
@ -90,3 +90,46 @@
|
|||
year = {2015},
|
||||
month = {March},
|
||||
}
|
||||
|
||||
@misc{agnerfog_skl_front4,
|
||||
title={Discussion on blogpost},
|
||||
author={Fog, Agner},
|
||||
year=2016,
|
||||
howpublished={\url{https://www.agner.org/optimize/blog/read.php?i=581}}
|
||||
}
|
||||
|
||||
@INPROCEEDINGS{fugaku_arm,
|
||||
author={Matsuoka, Satoshi},
|
||||
booktitle={2021 Symposium on VLSI Circuits},
|
||||
title={Fugaku and A64FX: the First Exascale Supercomputer and its Innovative Arm CPU},
|
||||
year={2021},
|
||||
volume={},
|
||||
number={},
|
||||
pages={1-3},
|
||||
doi={10.23919/VLSICircuits52068.2021.9492415}
|
||||
}
|
||||
|
||||
@misc{fugaku_top500,
|
||||
title={Supercomputer Fugaku retains first place worldwide in HPCG and Graph500 rankings},
|
||||
year=2022,
|
||||
month=November,
|
||||
author={{Fujitsu Limited}},
|
||||
howpublished={\url{https://www.fujitsu.com/global/about/resources/news/press-releases/2022/1115-01.html}}
|
||||
}
|
||||
|
||||
@misc{marenostrum4_arm,
|
||||
title={Technical information on the MareNostrum 4 supercomputer's ARM cluster},
|
||||
author={{Barcelona Supercomputing Center}},
|
||||
year=2020,
|
||||
howpublished={\url{https://www.bsc.es/innovation-and-services/technical-information-cte-arm}}
|
||||
}
|
||||
|
||||
@misc{arm_mobile,
|
||||
title={Together, we are building the future of computing, on Arm},
|
||||
author={Rene Haas},
|
||||
organization = {ARM},
|
||||
year=2023,
|
||||
month=September,
|
||||
howpublished={\url{https://www.arm.com/company/news/2023/09/building-the-future-of-computing-on-arm}},
|
||||
}
|
||||
|
||||
|
|
|
@ -12,10 +12,19 @@
|
|||
* Notion of bottleneck
|
||||
[[END]]
|
||||
|
||||
* Palmed was made to produce models for architectures with limited hardware
|
||||
counters
|
||||
* ARM is an important architecture:
|
||||
* already pervasive in embedded devices
|
||||
* starts to emerge in the datacenter [Fugaku, MareNostrum 4]
|
||||
* …and lacks HW counters
|
||||
* Cf prev. chapter: Palmed results on the Cortex A72 are not that good. Why?
|
||||
|
||||
## Necessity to go beyond ports
|
||||
|
||||
* Palmed: concerned mostly with ports
|
||||
* Noticed the importance of the frontend while investigating its performance
|
||||
on x86
|
||||
* heatmap representation: uops predicts unreachably high IPCs (eg. 8 on SKX)
|
||||
* example of a frontend-bound microkernel
|
||||
* Palmed's vision of a frontend
|
||||
|
@ -50,8 +59,6 @@
|
|||
* Very few hardware counters regarding the frontend! In particular, no access
|
||||
*at all* to macro-ops. No micro-op count.
|
||||
|
||||
* Pure Palmed results
|
||||
|
||||
## Manual frontend
|
||||
|
||||
### Base methodology
|
||||
|
|