
\chapter{Introduction}\label{chap:intro}

Developing new features and fixing bugs are often regarded as the main parts of
a program's development cycle. However, performance optimization may be just as
crucial for compute-intensive software. For small-scale applications, it
improves usability by reducing, or even hiding, the time the user must wait
between operations, or by allowing heavier workloads to be processed without
requiring more resources. For large-scale applications, which may run for
extended periods of time or on whole clusters, optimization is cost-effective,
as it allows the same workload to run on smaller clusters and for shorter
periods of time.

The most significant optimization gains come from ``high-level'' algorithmic
changes, such as parallelizing a computation over multiple cores instead of
running it sequentially, caching already computed results, reimplementing a
function to run asymptotically in $\bigO{n\cdot \log(n)}$ instead of
$\bigO{n^2}$, or avoiding copies of large data structures. However, once a piece
of software is already well optimized from these perspectives, the impact of
low-level considerations, stemming from the hardware implementation of the
machine itself, can no longer be neglected. A common example of such an impact
is iterating over a large matrix in either row-major or column-major order:

\vspace{1em}

\begin{minipage}[c]{0.48\linewidth}
    \begin{algorithmic}
        \State{sum $\gets 0$}
        \For{row $<$ MAX\_ROW}
            \For{column $<$ MAX\_COLUMN}
                \State{sum $\gets$ sum $+ \text{matrix}[\text{row}][\text{column}]$}
            \EndFor
        \EndFor
    \end{algorithmic}
\end{minipage}\hfill
\begin{minipage}[c]{0.48\linewidth}
    \begin{algorithmic}
        \State{sum $\gets 0$}
        \For{column $<$ MAX\_COLUMN}
            \For{row $<$ MAX\_ROW}
                \State{sum $\gets$ sum $+ \text{matrix}[\text{row}][\text{column}]$}
            \EndFor
        \EndFor
    \end{algorithmic}
\end{minipage}

\vspace{1em}

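While both programs perform exactly the same computation, the left one
traverses the matrix row by row, in \textit{row-major} order, while the right
one traverses it column by column, in \textit{column-major} order. Assuming the
matrix is stored row by row in memory, as C does for its multidimensional
arrays, the row-major traversal reads consecutive addresses and uses every
element of each cache line it loads, whereas the column-major traversal jumps a
whole row ahead at each step. On large matrices, the latter thus causes
frequent cache misses, and was measured to run up to about six times slower
than the former~\cite{rowmajor_repo}.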
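To make the difference concrete, the two traversals above can be sketched in C,
whose two-dimensional arrays are stored row by row. This is a minimal
illustration under stated assumptions, not the code behind~\cite{rowmajor_repo}:
the matrix size and the function names (\texttt{fill\_matrix},
\texttt{sum\_row\_major}, \texttt{sum\_column\_major},
\texttt{check\_same\_result}) are hypothetical choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size, chosen for illustration only: 4096 x 4096 doubles
 * occupy 128 MiB, far more than typical CPU caches. */
#define MAX_ROW 4096
#define MAX_COLUMN 4096

/* C stores a 2D array row by row: matrix[row][column] and
 * matrix[row][column + 1] are adjacent in memory. */
static double matrix[MAX_ROW][MAX_COLUMN];

/* Fills the matrix with small integers, so that every partial sum is an
 * integer exactly representable as a double and both traversals yield
 * bit-identical results. */
void fill_matrix(void) {
    for (size_t row = 0; row < MAX_ROW; row++)
        for (size_t column = 0; column < MAX_COLUMN; column++)
            matrix[row][column] = (double)((row + column) % 8);
}

/* Row-major traversal: the inner loop walks consecutive addresses, so each
 * cache line loaded from memory is fully used before moving on. */
double sum_row_major(void) {
    double sum = 0.0;
    for (size_t row = 0; row < MAX_ROW; row++)
        for (size_t column = 0; column < MAX_COLUMN; column++)
            sum += matrix[row][column];
    return sum;
}

/* Column-major traversal: the inner loop jumps MAX_COLUMN * sizeof(double)
 * bytes (32 KiB here) between two accesses, so consecutive accesses touch
 * different cache lines and memory pages; on a large matrix, these lines
 * tend to be evicted before their neighbouring elements are read. */
double sum_column_major(void) {
    double sum = 0.0;
    for (size_t column = 0; column < MAX_COLUMN; column++)
        for (size_t row = 0; row < MAX_ROW; row++)
            sum += matrix[row][column];
    return sum;
}

/* Sanity check: both traversals compute the same value; only the order of
 * memory accesses differs. Comparing with == is valid here because all
 * partial sums are exact (see fill_matrix). */
void check_same_result(void) {
    fill_matrix();
    assert(sum_row_major() == sum_column_major());
}
```

Timing the two summation functions on the same machine, e.g.\ with
\texttt{clock\_gettime}, should typically show the column-major one running
noticeably slower; the exact ratio depends on the cache hierarchy, on the
matrix size and on the compiler flags.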