Parametric frontend: first writeup
parent e717475763
commit ff7157993d
2 changed files with 29 additions and 4 deletions
@@ -91,8 +91,8 @@ cross an instruction cache line boundary~\cite{uica}.
 
 Processors implementing ISAs subject to decoding bottleneck typically also
 feature a decoded \uop{} cache. The typical hit rate of this cache is about
-80\%~\cite[Section
-B.5.7.2]{ref:intel64_software_dev_reference_vol1}\cite{dead_uops}. However,
+80\%~\cites[Section
+B.5.7.2]{ref:intel64_software_dev_reference_vol1}{dead_uops}. However,
 code analyzers are concerned with loops and, more generally, hot code portions.
 Under such conditions, we expect this cache, once hot in steady-state, to be
 very close to a 100\% hit rate. In this case, only the dispatch throughput will
@@ -114,8 +114,17 @@ be investigated if the model does not reach the expected accuracy.
 necessity to hit a cache. We are unaware of
 other architectures with such a feature.
 
-\item{} macro-ops \todo{}
+\item{} In reality, there is an intermediary step between instructions and
+\uops{}: macro-ops. Although it serves a design and semantic
+purpose, we omit this step in the current model as --~we
+believe~-- it is of little importance for predicting performance.
 
-\item{} fusion, lamination \todo{}
+\item{} On x86 architectures at least, common pairs of micro- or
+macro-operations may be ``fused'' into a single one, up to various
+parts of the pipeline, to save space in some queues or artificially
+raise dispatch limits. This mechanism is implemented in Intel
+architectures, and to some extent in AMD architectures since
+Zen~\cites[§3.4.2]{ref:intel64_architectures_optim_reference_vol1}{uica}{Vishnekov_2021}.
+This may make some kernels seem to ``bypass'' dispatch limits.
 
 \end{itemize}
@@ -214,3 +214,19 @@
 pages={361-374},
 keywords={Program processors;Microarchitecture;Computer architecture;Timing;System-on-chip;Transient analysis},
 doi={10.1109/ISCA52012.2021.00036}}
+
+@article{Vishnekov_2021,
+doi = {10.1088/1742-6596/1740/1/012053},
+url = {https://dx.doi.org/10.1088/1742-6596/1740/1/012053},
+year = {2021},
+month = {jan},
+publisher = {IOP Publishing},
+volume = {1740},
+number = {1},
+pages = {012053},
+author = {A V Vishnekov and E M Ivanova and N A Stepanov and N D Shaimov},
+title = {A Simulation Model for Macro- and Micro-Fusion Algorithms in the CPU Core},
+journal = {Journal of Physics: Conference Series},
+abstract = {The article discusses the features of modern processor’s microarchitecture, the method of instruction’s and micro-operation’s accelerated execution. The research focuses on the organization of the decoding stage in the CPU core pipeline and Macro- and Micro-fusion algorithms. The Macro- and Micro-fusion mechanisms are defined. A computer simulator has been developed to explore these mechanisms. The developed software has a user-friendly interface, is easy to use, and combines training and research options. The computer simulator demonstrates the sequence of mechanism’ s implementation; the resulting macro-or microoperations set after Macro- and Micro-fusion, and also reflects each algorithm features for different processor’s families. The software allows you to use either a pre-prepared file with Assembler (x86) code fragments as source data, or enter/change the source code fragments at your request. The main combinations of machine instructions that can be fused into a single macro-operation are considered, as well as instructions that can be decoded into fused micro-operations. The simulator can be useful both for in Computer Science & Engineering students, especially for on-line education and for researchers and General-purpose CPU cores developers.}
+}
+
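Note on the fusion item: the writeup says fusion may make some kernels seem to ``bypass'' dispatch limits. The sketch below is one possible illustration of that effect, not part of the model described in the commit. The kernel, the fusible pair, the greedy adjacent-pair fusion rule and the dispatch width of 4 are all hypothetical values chosen for the example; real fusion rules are microarchitecture-specific and more restrictive.

```python
# Illustrative only: all names and numbers below are assumptions made for
# this example, not values taken from the writeup or from any real CPU.

def fused_uop_count(uops, fusible_pairs):
    """Count dispatch slots used after greedily fusing adjacent pairs."""
    count = 0
    i = 0
    while i < len(uops):
        if i + 1 < len(uops) and (uops[i], uops[i + 1]) in fusible_pairs:
            i += 2  # the pair occupies a single dispatch slot
        else:
            i += 1
        count += 1
    return count


def dispatch_bound_cycles(n_uops, dispatch_width):
    """Steady-state cycles per iteration if dispatch were the only limit."""
    return n_uops / dispatch_width


# Hypothetical loop body: one uop per instruction, compare-and-branch fusible.
KERNEL = ["load", "add", "store", "cmp", "jne"]
FUSIBLE_PAIRS = {("cmp", "jne")}
DISPATCH_WIDTH = 4  # assumed uops dispatched per cycle

unfused = dispatch_bound_cycles(len(KERNEL), DISPATCH_WIDTH)
fused = dispatch_bound_cycles(
    fused_uop_count(KERNEL, FUSIBLE_PAIRS), DISPATCH_WIDTH
)
print(f"dispatch bound without fusion: {unfused} cycles/iteration")
print(f"dispatch bound with fusion:    {fused} cycles/iteration")
```

A model that counts the five unfused operations would predict a bound above one cycle per iteration, while the fused count fits in a single dispatch group. Under these assumptions, that gap is what would look like a kernel going past the dispatch limit.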