Add code quality subsection

Théophile Bastian 2017-08-24 17:07:02 +02:00
parent 5cafe06d2f
commit 4f5b0e8784


@ -53,8 +53,8 @@
transformations to a circuit. transformations to a circuit.
This problem turns out to be more or less the \emph{subgraph isomorphism This problem turns out to be more or less the \emph{subgraph isomorphism
problem}, which is NP-complete, and must nevertheless be solved fast on problem}, which is NP-complete, and must nevertheless be solved efficiently
processor-sized circuits on this particular case. on processor-sized circuits on this particular case.
During my internship, I developed a C++ library to perform this task that During my internship, I developed a C++ library to perform this task that
will be integrated in VossII, based on a few well-known algorithms as well will be integrated in VossII, based on a few well-known algorithms as well
@ -211,7 +211,7 @@ available on GitHub:
\url{https://github.com/tobast/circuit-isomatch/} \url{https://github.com/tobast/circuit-isomatch/}
\end{center} \end{center}
\subsection{Objective} \subsection{Problems}
More precisely, the problems that \emph{isomatch} must solve are the following. More precisely, the problems that \emph{isomatch} must solve are the following.
@ -247,6 +247,39 @@ to be NP-complete~\cite{cook1971complexity}. Even though a few algorithms
is nevertheless necessary to implement them the right way, and with the right is nevertheless necessary to implement them the right way, and with the right
heuristics, to get the desired efficiency for the given problem. heuristics, to get the desired efficiency for the given problem.
\subsection{Code quality}
Another prominent objective was to keep the codebase as clean as possible.
Indeed, this code will probably have to be maintained for quite some time, and
most probably by people other than me. This means that the code and all its
surroundings must be clean, readable and reusable. I put a lot of effort into
making the code idiomatic and easy to use, through \eg{} the implementation of
iterators over my data structures when needed, idiomatic C++14, etc.
This also means that the code has to be well-documented: the git history was
kept clean and understandable, and documentation can be generated from the
code using \texttt{doxygen}. The latest documentation is also available as
HTML pages here:
\begin{center}
\raisebox{-0.4\height}{
\includegraphics[height=2.3em]{../common/docs.png}}
\hspace{1em}
\url{https://tobast.fr/m1/isomatch}
\end{center}
Since the code is C++, it is also very prone to diverse bugs. While I did not
take the time to integrate unit tests, which would have been a great addition,
I used a sequence of tests that can be run using \lstc{make test} and that
exercises a lot of features of isomatch.
The code is also tested regularly, and on a wide variety of cases, with
\lstbash{valgrind} to ensure that there are no memory errors: use-after-free,
unallocated memory, memory leaks, bad pointer arithmetic,~\ldots{} In every
tested case, strictly no memory is lost, and no invalid read was reported.
\subsection{Sought efficiency} \subsection{Sought efficiency}
The goal of \textit{isomatch} is to be applied to large circuits on-the-fly, The goal of \textit{isomatch} is to be applied to large circuits on-the-fly,
@ -257,10 +290,6 @@ matching operations will be executed quite often, and often multiple times in a
row. It must then remain fast enough for the human not to lose too much time, row. It must then remain fast enough for the human not to lose too much time,
and eventually lose patience. and eventually lose patience.
\todo{Mention clean codebase somewhere}
\todo{Mention VossII somewhere}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{General approach} \section{General approach}
@ -268,7 +297,7 @@ The global strategy used to solve efficiently the problem can be broken down to
three main parts. three main parts.
\paragraph{Signatures.} The initial idea to make the computation fast is to \paragraph{Signatures.} The initial idea to make the computation fast is to
aggregate the inner data of a gate --- be it a leaf gate or a group --- in a aggregate the inner data of a gate~---~be it a leaf gate or a group~---~in a
kind of hash, a 64 bits unsigned integer. This approach is directly inspired kind of hash, a 64 bits unsigned integer. This approach is directly inspired
from what was done in fl, back at Intel. This hash must be easy to compute, from what was done in fl, back at Intel. This hash must be easy to compute,
and must be based only on the structure of the graph --- that is, must be and must be based only on the structure of the graph --- that is, must be
@ -307,8 +336,10 @@ this problem, that uses the specificities of the graph to be a little faster.
The signature is computed as a simple hash of the element, and is defined for The signature is computed as a simple hash of the element, and is defined for
every type of expression and circuit. It could probably be enhanced with a bit every type of expression and circuit. It could probably be enhanced with a bit
more work to cover more uniformly the hash space, but no collision was observed more work to cover more uniformly the hash space, but no illegitimate collision
on the examples tested. (that is, a collision that could be avoided with a better hash function, as
opposed to collisions due to an equal local graph structure) was observed on
the examples tested.
\paragraph{Signature constants.} Signature constants are used all around the \paragraph{Signature constants.} Signature constants are used all around the
signing process, and is a 5-tuple $\sigconst{} = (a, x_l, x_h, d_l, d_h)$ of 32 signing process, and is a 5-tuple $\sigconst{} = (a, x_l, x_h, d_l, d_h)$ of 32
@ -316,22 +347,24 @@ bits unsigned numbers. All of $x_l$, $x_h$, $d_l$ and $d_h$ are picked as prime
numbers between $10^8$ and $10^9$ (which just fits in a 32 bits unsigned numbers between $10^8$ and $10^9$ (which just fits in a 32 bits unsigned
integer); while $a$ is a random integer uniformly picked between $2^{16}$ and integer); while $a$ is a random integer uniformly picked between $2^{16}$ and
$2^{32}$. These constants are generated by a small python script, $2^{32}$. These constants are generated by a small python script,
\path{util/primegen/pickPrimes.py}. \path{util/primegen/pickPrimes.py} in the repository.
Those constants are used to produce a 64 bits unsigned value out of another 64 Those constants are used to produce a 64 bits unsigned value out of another 64
bits unsigned value, called $v$ thereafter, through an operator $\sigop$, bits unsigned value, called $v$ thereafter, through an operator $\sigop$,
computed as follows (with all computations done on 64 bits unsigned integers). computed as follows (with all computations done on 64 bits unsigned integers).
\vspace{1em} \vspace{1em}
\begin{algorithmic} \begin{center}
\Function{$\sigop$}{$\sigconst{}, v$} \begin{algorithmic}
\State{} $out1 \gets (v + a) \cdot x_l$ \Function{$\sigop$}{$\sigconst{}, v$}
\State{} $v_h \gets (v \lsr 32) \xor (out1 \lsr 32)$ \State{} $out1 \gets (v + a) \cdot x_l$
\State{} $low \gets out1 \,\%\, d_l$ \State{} $v_h \gets (v \lsr 32) \xor (out1 \lsr 32)$
\State{} $high \gets \left((v_h + a) \cdot x_h \right) \%\, d_h$ \State{} $low \gets out1 \,\%\, d_l$
\State{} \Return{} $low + 2^{32} \cdot high$ \State{} $high \gets \left((v_h + a) \cdot x_h \right) \%\, d_h$
\EndFunction{} \State{} \Return{} $low + 2^{32} \cdot high$
\end{algorithmic} \EndFunction{}
\end{algorithmic}
\end{center}
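As an illustration, the operator $\sigop$ above could be transcribed in C++
roughly as follows. This is only a minimal sketch: the names \texttt{SigConst}
and \texttt{sigop}, as well as the field names, are assumptions made for this
example and are not necessarily those used in isomatch. All arithmetic is done
on 64 bits unsigned integers and wraps around on overflow, as in the
pseudocode.

```cpp
#include <cstdint>

// Hypothetical container for a signature constant (a, x_l, x_h, d_l, d_h).
struct SigConst {
    uint32_t a;   // random offset
    uint32_t xl;  // multiplier for the low half
    uint32_t xh;  // multiplier for the high half
    uint32_t dl;  // modulus for the low half
    uint32_t dh;  // modulus for the high half
};

// Sketch of the sigop operator: mixes a 64 bits value into another one.
uint64_t sigop(const SigConst& c, uint64_t v) {
    // Unsigned overflow is well-defined in C++ (arithmetic modulo 2^64).
    const uint64_t out1 = (v + c.a) * c.xl;
    const uint64_t vh = (v >> 32) ^ (out1 >> 32);
    const uint64_t low = out1 % c.dl;
    const uint64_t high = ((vh + c.a) * c.xh) % c.dh;
    // low < d_l and high < d_h both fit in 32 bits, so the halves do not overlap.
    return low + (high << 32);
}
```

Since both moduli fit in 32 bits, the low and high halves of the result never
overlap, which is what allows them to be combined with a simple addition.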
\paragraph{Expressions.} Each type of expression (or, in the case of \paragraph{Expressions.} Each type of expression (or, in the case of
expression with operator, each type of operator) has its signature constant, expression with operator, each type of operator) has its signature constant,
@ -358,7 +391,7 @@ capture at all the \emph{structure} of the graph. An information we can capture
without breaking the signature's independence towards the order of description without breaking the signature's independence towards the order of description
of the graph, is the set of its neighbours. Yet, we cannot ``label'' the gates of the graph, is the set of its neighbours. Yet, we cannot ``label'' the gates
without breaking this rule; thus, we represent the set of neighbours by the set without breaking this rule; thus, we represent the set of neighbours by the set
of our \emph{neighbours' signatures}. of the \emph{neighbours' signatures}.
At this point, we can define the \emph{signature of order $n$} ($n \in At this point, we can define the \emph{signature of order $n$} ($n \in
\natset$) of a circuit $C$ as follows: \natset$) of a circuit $C$ as follows:
@ -375,13 +408,13 @@ At this point, we can define the \emph{signature of order $n$} ($n \in
The ``IO adjacency'' term is an additional term in the signatures of order The ``IO adjacency'' term is an additional term in the signatures of order
above $0$, indicating what input and output pins of the circuit group above $0$, indicating what input and output pins of the circuit group
containing the current gate are adjacent to it. containing the current gate are adjacent to it. Adding this information to the
signature was necessary, since a lot of gates can be signed differently using
this information (see Corner cases in Section~\ref{ssec:corner_cases}).
The default order of signature used in all computations, unless more is useful, The default order of signature used in all computations, unless more is useful,
is 2, after a few benchmarks. is 2, after a few benchmarks.
\todo{explain range of $n$}
\paragraph{Efficiency.} Every circuit memoizes all it can concerning its \paragraph{Efficiency.} Every circuit memoizes all it can concerning its
signature: the inner signature, the IO adjacency, the signatures of order $n$ signature: the inner signature, the IO adjacency, the signatures of order $n$
already computed, etc. already computed, etc.
@ -400,8 +433,10 @@ or its children are modified. A memoized data is always stored alongside with a
timestamp of computation, which invalidates a previous result when needed. timestamp of computation, which invalidates a previous result when needed.
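The timestamp-based invalidation described above can be sketched as follows.
This is a hedged illustration, not the actual isomatch code: the class
\texttt{Memoized}, its method \texttt{get} and the use of a logical clock
passed by the caller are all assumptions made for the example.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: a cached value stored with its computation timestamp.
template <typename T>
class Memoized {
public:
    // Returns the cached value if it was computed no earlier than the last
    // modification; otherwise recomputes it and records the current time.
    const T& get(uint64_t lastModified, uint64_t now,
                 const std::function<T()>& compute) {
        if (!valid_ || computedAt_ < lastModified) {
            value_ = compute();
            computedAt_ = now;
            valid_ = true;
        }
        return value_;
    }

private:
    T value_{};
    uint64_t computedAt_ = 0;
    bool valid_ = false;
};
```

With such a helper, a circuit would only need to bump its modification
timestamp when it or one of its children changes, and every memoized result
older than that timestamp is recomputed lazily on the next access.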
One possible path of investigation for future work, if the computation turns One possible path of investigation for future work, if the computation turns
out to be still too slow in real-world cases --- which looks unlikely ---, out to be still too slow in real-world cases --- which looks unlikely, unless
would be to try to multithread this computation. fl's substitution is run on a regular basis for a huge number of cases using
\eg{} a crontab for automated testing ---, would be to try to multithread this
computation.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Group equality}\label{sec:group_equality} \section{Group equality}\label{sec:group_equality}
@ -428,7 +463,7 @@ the number of permutations examined to no more than $4$ in studied cases.
Once a permutation is judged worth to be examined, the group equality is run Once a permutation is judged worth to be examined, the group equality is run
recursively on all its matched gates. If this step succeeds, the graph recursively on all its matched gates. If this step succeeds, the graph
structure is then checked. If both steps succeeds, the permutation is correct structure is then checked. If both steps succeed, the permutation is correct
and an isomorphism has been found; if not, we move on to the next permutation. and an isomorphism has been found; if not, we move on to the next permutation.
\todo{Anything more to tell here?} \todo{Anything more to tell here?}
@ -449,7 +484,10 @@ was first described by Julian R Ullmann in 1976~\cite{ullmann1976algorithm}.
Another, more recent algorithm to deal with this problem is Luigi P Cordella's Another, more recent algorithm to deal with this problem is Luigi P Cordella's
VF2 algorithm~\cite{cordella2004sub}, published in 2004. This algorithm is VF2 algorithm~\cite{cordella2004sub}, published in 2004. This algorithm is
mostly Ullmann's algorithm, transcribed in a recursive writing, with the mostly Ullmann's algorithm, transcribed in a recursive writing, with the
addition of five heuristics. \qtodo{Why not use it then?} addition of five heuristics. I originally planned to implement both algorithms
and benchmark them, but had no time to do so in the end; however, Ullmann's
algorithm, with the few additional heuristics applicable in our very specific
case, turned out to be fast enough.
Ullmann is a widely used and fast algorithm for this problem. It makes an Ullmann is a widely used and fast algorithm for this problem. It makes an
extensive use of adjacency matrix description of the graph, and the initial extensive use of adjacency matrix description of the graph, and the initial
@ -461,8 +499,9 @@ matrix. Each $1$ in a cell $(i, j)$ indicates that the $i$-th needle part is a
possible match with the $j$-th haystack part. This matrix is called $perm$ possible match with the $j$-th haystack part. This matrix is called $perm$
thereafter. thereafter.
The algorithm, left apart the \textsc{refine} function (detailed just after), The algorithm, left apart the \textsc{refine} function, which is detailed just
is described in Figure~\ref{alg:ullmann}. after and can be omitted for a (way) slower version of the algorithm, is
described in Figure~\ref{alg:ullmann}.
\begin{figure}[h] \begin{figure}[h]
\begin{algorithmic} \begin{algorithmic}
@ -505,7 +544,7 @@ is described in Figure~\ref{alg:ullmann}.
The refining process is the actual keystone of the algorithm. It is the The refining process is the actual keystone of the algorithm. It is the
mechanism allowing the algorithm to cut down many exploration branches, by mechanism allowing the algorithm to cut down many exploration branches, by
removing ones from the matrix. changing ones to zeroes in the matrix being built.
The idea is that a match between a needle's vertex $i$ and a haystack's vertex The idea is that a match between a needle's vertex $i$ and a haystack's vertex
$j$ is only possible if, for each neighbour $k$ of $i$, $j$ has a neighbour $j$ is only possible if, for each neighbour $k$ of $i$, $j$ has a neighbour
@ -583,7 +622,12 @@ occur).
\subsection{Implementation optimisations} \subsection{Implementation optimisations}
\paragraph{Initial permutation matrix.} The matrix is first filled according to \paragraph{Initial permutation matrix.} The matrix is first filled according to
the signatures matches. It is then refined a bit more, by making sure that for the signatures matches. Note that only signatures of order 0 --- \ie{} the
inner data of a vertex --- can be used here: indeed, we cannot rely on the
context here, since there can be some context in the haystack that is absent
from the needle, and we cannot check for ``context inclusion'' with our
definition of signatures: \emph{all} the context must be exactly the same for
two signatures to match. It is then refined a bit more, by making sure that for
every match, every potentially matching gate has the same ``wire kinds''. every match, every potentially matching gate has the same ``wire kinds''.
Indeed, a gate needle's wire must have at least the same inbound adjacent Indeed, a gate needle's wire must have at least the same inbound adjacent
signatures as its matching haystack wire, and same goes for outbound adjacent signatures as its matching haystack wire, and same goes for outbound adjacent
@ -665,6 +709,13 @@ for a single run) and measured by the command \texttt{time}.
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
The computation time is more or less linear in the order of signature
required, which is consistent with the implementation. In practice, only small
portions of a circuit will be signed with a high order, which means that we can
afford really high order signatures (\eg{} 40 or 50, which already means that
the diameter of the group is 40 or 50) without having a real impact on the
computation time.
\paragraph{Equality.} To test the circuit group equality, a small piece of \paragraph{Equality.} To test the circuit group equality, a small piece of
code takes a circuit, scrambles it as much as possible code takes a circuit, scrambles it as much as possible
@ -680,13 +731,15 @@ considerably speeding it up: the same program proving only one way takes about
\subsection{Corner cases} \subsection{Corner cases}\label{ssec:corner_cases}
There were a few observed cases where the algorithm tends to be slower on There were a few observed cases where the algorithm tends to be slower on
certain configurations. certain configurations, and a few other such cases that could be fixed.
\todo{More corner cases} \todo{More corner cases}
\todo{Corner case: io pins, io adjacency}
\paragraph{Split/merge trees.} A common pattern that tends to slow down the \paragraph{Split/merge trees.} A common pattern that tends to slow down the
algorithm is split/merge trees. Those patterns occur when one wants to merge algorithm is split/merge trees. Those patterns occur when one wants to merge
$n$ one bit wires into a single $n$ bits wire, or the other way around. $n$ one bit wires into a single $n$ bits wire, or the other way around.
@ -712,6 +765,12 @@ nodes on the layer below cannot be freely exchanged.
\todo{Figure describing the problem} \todo{Figure describing the problem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Conclusion}
\todo{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\printbibliography{} \printbibliography{}