Add code quality subsection

commit 4f5b0e8784 (parent 5cafe06d2f)
1 changed file with 93 additions and 34 deletions
@@ -53,8 +53,8 @@
 transformations to a circuit.
 
 This problem turns out to be more or less the \emph{subgraph isomorphism
-problem}, which is NP-complete, and must nevertheless be solved fast on
-processor-sized circuits on this particular case.
+problem}, which is NP-complete, and must nevertheless be solved efficiently
+on processor-sized circuits in this particular case.
 
 During my internship, I developed a C++ library to perform this task that
 will be integrated in VossII, based on a few well-known algorithms as well
@@ -211,7 +211,7 @@ available on GitHub:
 \url{https://github.com/tobast/circuit-isomatch/}
 \end{center}
 
-\subsection{Objective}
+\subsection{Problems}
 
 More precisely, the problems that \emph{isomatch} must solve are the following.
 
@@ -247,6 +247,39 @@ to be NP-complete~\cite{cook1971complexity}. Even though a few algorithms
 is nevertheless necessary to implement them the right way, and with the right
 heuristics, to get the desired efficiency for the given problem.
 
+\subsection{Code quality}
+
+Another prominent objective was to keep the codebase as clean as possible.
+Indeed, this code will probably have to be maintained for quite some time, and
+most probably by people other than me. This means that the code and everything
+around it must be clean, readable and reusable. I put a lot of effort into
+making the code idiomatic and easy to use, through \eg{} the implementation
+of iterators over my data structures when needed, idiomatic C++14, etc.
+
+This also means that the code has to be well-documented: the git history had to
+be kept clean and understandable, and a clean documentation can be generated
+from the code using \texttt{doxygen}. The latest documentation is also
+compiled as HTML pages here:
+
+\begin{center}
+\raisebox{-0.4\height}{
+\includegraphics[height=2.3em]{../common/docs.png}}
+\hspace{1em}
+\url{https://tobast.fr/m1/isomatch}
+\end{center}
+
+Since the code is C++, it is also very prone to diverse bugs. While I did not
+take the time to integrate unit tests --- which would have been a great
+addition --- I wrote a test sequence, run with \lstc{make test}, that
+exercises most features of isomatch.
+
+The code is also tested regularly, and on a wide variety of cases, with
+\lstbash{valgrind} to ensure that there are no memory errors ---
+use-after-free, unallocated memory, memory leaks, bad pointer
+arithmetic,~\ldots{} In every tested case, strictly no memory is lost, and no
+invalid read was reported.
+
 \subsection{Sought efficiency}
 
 The goal of \textit{isomatch} is to be applied to large circuits on-the-fly,
@@ -257,10 +290,6 @@ matching operations will be executed quite often, and often multiple times in a
 row. It must then remain fast enough for the human not to lose too much time,
 and eventually lose patience.
 
-\todo{Mention clean codebase somewhere}
-
-\todo{Mention VossII somewhere}
-
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{General approach}
 
@@ -268,7 +297,7 @@ The global strategy used to solve efficiently the problem can be broken down to
 three main parts.
 
 \paragraph{Signatures.} The initial idea to make the computation fast is to
-aggregate the inner data of a gate --- be it a leaf gate or a group --- in a
+aggregate the inner data of a gate~---~be it a leaf gate or a group~---~in a
 kind of hash, a 64 bits unsigned integer. This approach is directly inspired
 from what was done in fl, back at Intel. This hash must be easy to compute,
 and must be based only on the structure of the graph --- that is, must be
@@ -307,8 +336,10 @@ this problem, that uses the specificities of the graph to be a little faster.
 
 The signature is computed as a simple hash of the element, and is defined for
 every type of expression and circuit. It could probably be enhanced with a bit
-more work to cover more uniformly the hash space, but no collision was observed
-on the examples tested.
+more work to cover the hash space more uniformly, but no illegitimate collision
+(that is, a collision that could be avoided with a better hash function, as
+opposed to collisions due to an equal local graph structure) was observed on
+the examples tested.
 
 \paragraph{Signature constants.} Signature constants are used all around the
 signing process; each is a 5-tuple $\sigconst{} = (a, x_l, x_h, d_l, d_h)$ of 32
@@ -316,13 +347,14 @@ bits unsigned numbers. All of $x_l$, $x_h$, $d_l$ and $d_h$ are picked as prime
 numbers between $10^8$ and $10^9$ (which just fits in a 32 bits unsigned
 integer); while $a$ is a random integer uniformly picked between $2^{16}$ and
 $2^{32}$. These constants are generated by a small Python script,
-\path{util/primegen/pickPrimes.py}.
+\path{util/primegen/pickPrimes.py} in the repository.
 
 Those constants are used to produce a 64 bits unsigned value out of another 64
 bits unsigned value, called $v$ thereafter, through an operator $\sigop$,
 computed as follows (with all computations done on 64 bits unsigned integers).
 
 \vspace{1em}
 \begin{center}
 \begin{algorithmic}
 \Function{$\sigop$}{$\sigconst{}, v$}
 \State{} $out1 \gets (v + a) \cdot x_l$
@@ -332,6 +364,7 @@ computed as follows (with all computations done on 64 bits unsigned integers).
 \State{} \Return{} $low + 2^{32} \cdot high$
 \EndFunction{}
 \end{algorithmic}
 \end{center}
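The visible pieces of $\sigop$ can be sketched in C++. Note the hedging: only the first step ($out1 \gets (v + a) \cdot x_l$) and the final recombination ($low + 2^{32} \cdot high$) appear in this excerpt; the \texttt{SigConst} struct, its field names, and the middle steps deriving $low$ and $high$ from $x_h$, $d_l$ and $d_h$ are illustrative guesses, not the library's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of a signature constant (a, x_l, x_h, d_l, d_h).
struct SigConst {
    uint32_t a, xl, xh, dl, dh;
};

// Sketch of the sigop operator on 64-bit unsigned integers.
uint64_t sigop(const SigConst &c, uint64_t v) {
    uint64_t out1 = (v + c.a) * c.xl;               // as in the excerpt
    uint64_t out2 = (v + c.a) * c.xh;               // guessed middle step
    uint64_t low  = (out1 + c.dl) & 0xFFFFFFFFull;  // guessed: keep 32 bits
    uint64_t high = (out2 + c.dh) & 0xFFFFFFFFull;  // guessed: keep 32 bits
    return low + (uint64_t{1} << 32) * high;        // as in the excerpt
}
```

The point of the shape is that the operator mixes $v$ with all five constants while staying cheap: two multiplications by large primes and a recombination of two 32-bit halves into one 64-bit value.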
 
 \paragraph{Expressions.} Each type of expression (or, in the case of
 expression with operator, each type of operator) has its signature constant,
@@ -358,7 +391,7 @@ capture at all the \emph{structure} of the graph. An information we can capture
 without breaking the signature's independence towards the order of description
 of the graph, is the set of its neighbours. Yet, we cannot ``label'' the gates
 without breaking this rule; thus, we represent the set of neighbours by the set
-of our \emph{neighbours' signatures}.
+of the \emph{neighbours' signatures}.
 
 At this point, we can define the \emph{signature of order $n$} ($n \in
 \natset$) of a circuit $C$ as follows:
@@ -375,13 +408,13 @@ At this point, we can define the \emph{signature of order $n$} ($n \in
 
 The ``IO adjacency'' term is an additional term in the signatures of order
 above $0$, indicating what input and output pins of the circuit group
-containing the current gate are adjacent to it.
+containing the current gate are adjacent to it. Adding this information to the
+signature was necessary, since many gates can be signed differently using
+this information (see Corner cases in Section~\ref{ssec:corner_cases}).
 
 The default order of signature used in all computations, unless more is useful,
 is 2, chosen after a few benchmarks.
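The recursion this section describes --- an order-$0$ signature built from the gate's inner data only, and each further order folding in the neighbours' signatures of the previous order --- can be illustrated in C++. The exact formula is not reproduced in this excerpt, so this is a sketch of the key property only: the fold over neighbours must be commutative (here, plain addition) so that the description order of the graph cannot change the result. \texttt{Gate}, \texttt{innerSig} and the combine step are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical gate: an order-0 signature plus an adjacency list.
struct Gate {
    uint64_t innerSig;                     // inner data only (order 0)
    std::vector<const Gate*> neighbours;
};

// Order-n signature: inner signature combined, commutatively, with the
// neighbours' signatures of order n-1.
uint64_t signatureOfOrder(const Gate &g, unsigned n) {
    if (n == 0)
        return g.innerSig;
    uint64_t acc = g.innerSig;
    for (const Gate *nb : g.neighbours)
        acc += signatureOfOrder(*nb, n - 1);  // commutative fold
    return acc;
}
```

Because the fold is commutative, two descriptions of the same circuit that list the neighbours in different orders produce the same signature, which is exactly the independence property the text requires.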
 
 \todo{explain range of $n$}
 
 \paragraph{Efficiency.} Every circuit memoizes all it can concerning its
 signature: the inner signature, the IO adjacency, the signatures of order $n$
 already computed, etc.
@@ -400,8 +433,10 @@ or its children are modified. A memoized data is always stored alongside with a
 timestamp of computation, which invalidates a previous result when needed.
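The timestamp-based invalidation just described can be sketched as follows. This is a minimal illustration with assumed names (a global counter standing in for the "timestamp of computation", and a placeholder \texttt{computeSignature}), not isomatch's actual data structures.

```cpp
#include <cassert>
#include <cstdint>

// Global monotonic clock: every computation or modification gets a fresh tick.
static uint64_t globalTimestamp = 1;

// A memoized value is stored alongside the timestamp of its computation.
struct Memoized {
    uint64_t value = 0;
    uint64_t computedAt = 0;  // 0 = never computed
};

struct Circuit {
    uint64_t lastModified = 1;
    Memoized sigCache;

    // Any modification bumps the clock, implicitly invalidating the cache.
    void modify() { lastModified = ++globalTimestamp; }

    uint64_t signature() {
        if (sigCache.computedAt >= lastModified)
            return sigCache.value;            // cache computed after last edit
        sigCache.value = computeSignature();  // stale: recompute
        sigCache.computedAt = ++globalTimestamp;
        return sigCache.value;
    }

    // Placeholder for the real signature computation.
    uint64_t computeSignature() const { return lastModified * 2654435761u; }
};
```

The design choice worth noting is that nothing is eagerly invalidated: a cached result simply becomes unusable once its timestamp falls behind the circuit's last modification, which is cheap to check at every lookup.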
 
 One possible path of investigation for future work, if the computation turns
-out to be still too slow in real-world cases --- which looks unlikely ---,
-would be to try to multithread this computation.
+out to be still too slow in real-world cases --- which looks unlikely, unless
+fl's substitution is run on a regular basis for a huge number of cases, using
+\eg{} a crontab for automated testing --- would be to try to multithread this
+computation.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Group equality}\label{sec:group_equality}
@@ -428,7 +463,7 @@ the number of permutations examined to no more than $4$ in studied cases.
 
 Once a permutation is judged worth examining, the group equality is run
 recursively on all its matched gates. If this step succeeds, the graph
-structure is then checked. If both steps succeeds, the permutation is correct
+structure is then checked. If both steps succeed, the permutation is correct
 and an isomorphism has been found; if not, we move on to the next permutation.
 
 \todo{Anything more to tell here?}
@@ -449,7 +484,10 @@ was first described by Julian R Ullmann in 1976~\cite{ullmann1976algorithm}.
 Another, more recent algorithm to deal with this problem is Luigi P Cordella's
 VF2 algorithm~\cite{cordella2004sub}, published in 2004. This algorithm is
 mostly Ullmann's algorithm, transcribed in a recursive writing, with the
-addition of five heuristics. \qtodo{Why not use it then?}
+addition of five heuristics. I originally planned to implement and benchmark
+both algorithms, but ran out of time to do so; still, Ullmann's algorithm with
+the few additional heuristics applicable in our very specific case turned out
+to be fast enough.
 
 Ullmann is a widely used and fast algorithm for this problem. It makes
 extensive use of the adjacency matrix description of the graph, and the initial
@@ -461,8 +499,9 @@ matrix. Each $1$ in a cell $(i, j)$ indicates that the $i$-th needle part is a
 possible match with the $j$-th haystack part. This matrix is called $perm$
 thereafter.
 
-The algorithm, left apart the \textsc{refine} function (detailed just after),
-is described in Figure~\ref{alg:ullmann}.
+The algorithm, leaving aside the \textsc{refine} function --- which is detailed
+just after, and can be omitted for a (way) slower version of the algorithm ---
+is described in Figure~\ref{alg:ullmann}.
 
 \begin{figure}[h]
 \begin{algorithmic}
@@ -505,7 +544,7 @@ is described in Figure~\ref{alg:ullmann}.
 
 The refining process is the actual keystone of the algorithm. It is the
 mechanism allowing the algorithm to cut down many exploration branches, by
-removing ones from the matrix.
+changing ones to zeroes in the matrix being built.
 
 The idea is that a match between a needle's vertex $i$ and a haystack's vertex
 $j$ is only possible if, for each neighbour $k$ of $i$, $j$ has a neighbour
@@ -583,7 +622,12 @@ occur).
 \subsection{Implementation optimisations}
 
 \paragraph{Initial permutation matrix.} The matrix is first filled according to
-the signatures matches. It is then refined a bit more, by making sure that for
+the signature matches. Note that only signatures of order 0 --- \ie{} the
+inner data of a vertex --- can be used here: indeed, we cannot rely on the
+context, since there can be some context in the haystack that is absent
+from the needle, and we cannot check for ``context inclusion'' with our
+definition of signatures: \emph{all} the context must be exactly the same for
+two signatures to match. It is then refined a bit more, by making sure that for
 every match, every potentially matching gate has the same ``wire kinds''.
 Indeed, a needle gate's wire must have at least the same inbound adjacent
 signatures as its matching haystack wire, and the same goes for outbound adjacent
@@ -665,6 +709,13 @@ for a single run) and measured by the command \texttt{time}.
 \end{tikzpicture}
 \end{center}
 
+The computation time is more or less linear in the level of signature
+required, which is coherent with the implementation. In practice, only small
+portions of a circuit will be signed with a high order, which means that we can
+afford really high order signatures (\eg{} 40 or 50, which already means that
+the diameter of the group is 40 or 50) without having a real impact on the
+computation time.
+
 
 \paragraph{Equality.} To test the circuit group equality, a small piece of
 code takes a circuit, scrambles it as much as possible
@@ -680,13 +731,15 @@ considerably speeding it up: the same program proving only one way takes about
 
 
 
-\subsection{Corner cases}
+\subsection{Corner cases}\label{ssec:corner_cases}
 
 There were a few observed cases where the algorithm tends to be slower on
-certain configurations.
+certain configurations, and a few other such cases that could be fixed.
 
 \todo{More corner cases}
 
+\todo{Corner case: io pins, io adjacency}
+
 \paragraph{Split/merge trees.} A common pattern that tends to slow down the
 algorithm is split/merge trees. Those patterns occur when one wants to merge
 $n$ one-bit wires into a single $n$-bit wire, or the other way around.
@@ -712,6 +765,12 @@ nodes on the layer below cannot be freely exchanged.
 
 \todo{Figure describing the problem}
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section*{Conclusion}
+
+\todo{}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
 \printbibliography{}