\section{Static dependencies detection}\label{ssec:staticdeps_detection} Depending on the type of dependencies considered, it is more or less difficult to statically detect them. \paragraph{Register-carried dependencies in straight-line code.} This case is the easiest to statically detect, and is most often supported by code analyzers ---~for instance, \llvmmca{} supports it. The same strategy that was used to dynamically find dependencies in \autoref{ssec:depsim} can still be used: a shadow register file simply keeps track of which instruction last wrote each register. \paragraph{Register-carried, loop-carried dependencies.} Loop-carried dependencies can, to some extent, be detected the same way. As the basic block is always assumed to be the body of an infinite loop, a straight-line analysis can be performed on a duplicated kernel. This strategy is \eg{} adopted by \osaca{}~\cite{osaca2} (§II.D). When dealing only with register accesses, this strategy is always sufficient: as each iteration always executes the same basic block, it is not possible for an instruction to depend on another instruction two iterations earlier or more. \paragraph{Memory-carried dependencies in straight-line code.} Memory dependencies, however, are significantly harder to tackle. While basic heuristics can handle some simple cases, in the general case two main difficulties arise: \begin{enumerate}[(i)] \item{}\label{memcarried_difficulty_alias} pointers may \emph{alias}, \ie{} point to the same address or array; for instance, if \reg{rax} points to an array, it may be that \reg{rbx} points to $\reg{rax} + 8$, making the detection of such a dependency difficult; \item{}\label{memcarried_difficulty_arith} arbitrary arithmetic operations may be performed on pointers, possibly through diverting paths: \eg{} it might be necessary to detect that $\reg{rax} + 16 << 2$ is identical to $\reg{rax} + 128 / 2$; this requires semantics for assembly instructions and tracking formal expressions across register values ---~and possibly even memory. \end{enumerate} Tracking memory-carried dependencies is, to the best of our knowledge, not done in code analyzers, as our results in \autoref{chap:CesASMe} suggests. \paragraph{Loop-carried, memory-carried dependencies.} While the strategy previously used for register-carried dependencies is sufficient to detect loop-carried dependencies from one occurrence to the next one, it is not sufficient at all times when the dependencies tracked are memory-carried. For instance, in the second example from \autoref{lst:loop_carried_exn}, an instruction depends on another two iterations ago. Dependencies can reach arbitrarily old iterations of a loop: in this example, \lstxasm{-8192(\%rbx, \%r10)} may be used to reach 1\,024 iterations back. However, while far-reaching dependencies may \emph{exist}, they are not necessarily \emph{relevant} from a performance analysis point of view. Indeed, if an instruction $i_2$ depends on a result previously produced by an instruction $i_1$, this dependency is only relevant if it is possible that $i_1$ is not yet completed when $i_2$ is considered for issuing ---~else, the result is already produced, and $i_2$ needs not wait to execute. The reorder buffer (ROB) of a CPU can be modelled as a sliding window of fixed size over \uops{}. In particular, if a \uop{} $\mu_1$ is not yet retired, the ROB may not contain \uops{} more than the ROB's size ahead of $\mu_1$. This is in particular also true for instructions, as the vast majority of instructions decode to at least one \uop{}\footnote{Some \texttt{mov} instructions from register to register may, for instance, only have an impact on the renamer; no \uops{} are dispatched to the backend.}. A possible solution to detect loop-carried dependencies in a kernel $\kerK$ is thus to unroll it until it contains about $\card{\text{ROB}} + \card{\kerK}$. This ensures that every instruction in the last kernel can find dependencies reaching up to $\card{\text{ROB}}$ back. On Intel CPUs, the reorder buffer size contained 224 \uops{} on Skylake (2015), or 512 \uops{} on Golden Cove (2021)~\cite{wikichip_intel_rob_size}. These sizes are small enough to reasonably use this solution without excessive slowdown.