Parametric frontend: first writeup

Théophile Bastian 2024-06-18 09:50:28 +02:00
parent e717475763
commit ff7157993d
2 changed files with 29 additions and 4 deletions

@@ -91,8 +91,8 @@ cross an instruction cache line boundary~\cite{uica}.
 Processors implementing ISAs subject to decoding bottleneck typically also
 feature a decoded \uop{} cache. The typical hit rate of this cache is about
-80\%~\cite[Section
-B.5.7.2]{ref:intel64_software_dev_reference_vol1}\cite{dead_uops}. However,
+80\%~\cites[Section
+B.5.7.2]{ref:intel64_software_dev_reference_vol1}{dead_uops}. However,
 code analyzers are concerned with loops and, more generally, hot code portions.
 Under such conditions, we expect this cache, once hot in steady-state, to be
 very close to a 100\% hit rate. In this case, only the dispatch throughput will
@@ -114,8 +114,17 @@ be investigated if the model does not reach the expected accuracy.
 necessity to hit a cache. We are unaware of
 other architectures with such a feature.
-\item{} macro-ops \todo{}
-\item{} fusion, lamination \todo{}
+\item{} In reality, there is an intermediary step between instructions and
+\uops{}: macro-ops. Although it serves a designing and semantic
+purpose, we omit this step in the current model as --~we
+believe~-- it is of little importance to predict performance.
+\item{} On x86 architectures at least, common pairs of micro- or
+macro-operations may be ``fused'' into a single one, up to various
+parts of the pipeline, to save space in some queues or artificially
+boost dispatch limitations. This mechanism is implemented in Intel
+architectures, and to some extent in AMD architectures since
+Zen~\cites[§3.4.2]{ref:intel64_architectures_optim_reference_vol1}{uica}{Vishnekov_2021}.
+This may make some kernels seem to ``bypass'' dispatch limits.
 \end{itemize}

@@ -214,3 +214,19 @@
 pages={361-374},
 keywords={Program processors;Microarchitecture;Computer architecture;Timing;System-on-chip;Transient analysis},
 doi={10.1109/ISCA52012.2021.00036}}
+@article{Vishnekov_2021,
+doi = {10.1088/1742-6596/1740/1/012053},
+url = {https://dx.doi.org/10.1088/1742-6596/1740/1/012053},
+year = {2021},
+month = {jan},
+publisher = {IOP Publishing},
+volume = {1740},
+number = {1},
+pages = {012053},
+author = {A V Vishnekov and E M Ivanova and N A Stepanov and N D Shaimov},
+title = {A Simulation Model for Macro- and Micro-Fusion Algorithms in the CPU Core},
+journal = {Journal of Physics: Conference Series},
+abstract = {The article discusses the features of modern processors microarchitecture, the method of instructions and micro-operations accelerated execution. The research focuses on the organization of the decoding stage in the CPU core pipeline and Macro- and Micro-fusion algorithms. The Macro- and Micro-fusion mechanisms are defined. A computer simulator has been developed to explore these mechanisms. The developed software has a user-friendly interface, is easy to use, and combines training and research options. The computer simulator demonstrates the sequence of mechanism s implementation; the resulting macro-or microoperations set after Macro- and Micro-fusion, and also reflects each algorithm features for different processors families. The software allows you to use either a pre-prepared file with Assembler (x86) code fragments as source data, or enter/change the source code fragments at your request. The main combinations of machine instructions that can be fused into a single macro-operation are considered, as well as instructions that can be decoded into fused micro-operations. The simulator can be useful both for in Computer Science & Engineering students, especially for on-line education and for researchers and General-purpose CPU cores developers.}
+}