\chapter{Introduction}\label{chap:intro}

Developing new features and fixing problems are often regarded as the major
parts of the development cycle of a program. However, performance optimization
can be just as crucial for compute-intensive software. On small-scale
applications, it improves usability by reducing, or even hiding, the waiting
time the user must endure between operations, or by allowing heavier workloads
to be processed without needing larger resources. On large-scale applications,
which may run for extended periods of time or on whole clusters, optimization
is cost-effective, as it allows the same workload to run on smaller clusters or
for shorter periods of time.

The most significant optimization gains come from ``high-level'' algorithmic
changes, such as computing on multiple cores instead of sequentially, caching
already computed results, reimplementing a function to run asymptotically in
$\bigO{n\cdot \log(n)}$ instead of $\bigO{n^2}$, or avoiding copies of large
data structures. However, when a piece of software is already well-optimized
from these perspectives, the impact of low-level considerations, stemming from
the hardware implementation of the machine itself, can no longer be neglected.
A common example of such an impact is iterating over a large matrix in either
row-major or column-major order:

\vspace{1em}

\begin{minipage}[c]{0.48\linewidth}
    \begin{algorithmic}
        \State{sum $\gets 0$}
        \For{row $<$ MAX\_ROW}
            \For{column $<$ MAX\_COLUMN}
                \State{sum $\gets$ sum $+ \text{matrix}[\text{row}][\text{column}]$}
            \EndFor
        \EndFor
    \end{algorithmic}
\end{minipage}\hfill
\begin{minipage}[c]{0.48\linewidth}
    \begin{algorithmic}
        \State{sum $\gets 0$}
        \For{column $<$ MAX\_COLUMN}
            \For{row $<$ MAX\_ROW}
                \State{sum $\gets$ sum $+ \text{matrix}[\text{row}][\text{column}]$}
            \EndFor
        \EndFor
    \end{algorithmic}
\end{minipage}

\vspace{1em}

While both programs perform the exact same computation, the left one
iterates over rows first, or \textit{row-major}, while the right one iterates
over columns first, or \textit{column-major}. On large matrices, the latter
causes frequent cache misses, and was measured to run up to about six times
slower than the former~\cite{rowmajor_repo}.
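As a rough sketch of why, assume that the matrix is stored contiguously in
row-major order, as C does for two-dimensional arrays, with elements of $s$
bytes each, starting at a base address $b$ (both $s$ and $b$ are notations
introduced here for illustration only). The element at a given row and column
then lies at address
\[
    b + (\text{row} \cdot \text{MAX\_COLUMN} + \text{column}) \cdot s,
\]
so that consecutive accesses are $s$ bytes apart in the left program, but
$\text{MAX\_COLUMN} \cdot s$ bytes apart in the right one. Assuming, for
instance, 64-byte cache lines and $s = 8$, the left program would touch a new
cache line only once every eight accesses, whereas the right program would
touch a different cache line at every access as soon as a row spans at least
one cache line.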