diff --git a/report/report.tex b/report/report.tex
index 2bb4d86..d0fa818 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -53,8 +53,8 @@ transformations to a circuit.
 This problem turns out to be more or less the \emph{subgraph isomorphism
- problem}, which is NP-complete, and must nevertheless be solved fast on
- processor-sized circuits on this particular case.
+ problem}, which is NP-complete, and must nevertheless be solved efficiently
+ on processor-sized circuits in this particular case.
 During my internship, I developed a C++ library to perform this task that will be integrated in VossII, based on a few well-known algorithms as well
@@ -211,7 +211,7 @@ available on GitHub:
 \url{https://github.com/tobast/circuit-isomatch/}
 \end{center}
-\subsection{Objective}
+\subsection{Problems}
 More precisely, the problems that \emph{isomatch} must solve are the following.
@@ -247,6 +247,39 @@ to be NP-complete~\cite{cook1971complexity}. Even though a few algorithms
 is nevertheless necessary to implement them the right way, and with the right heuristics, to get the desired efficiency for the given problem.
+\subsection{Code quality}
+
+Another prominent objective was to keep the codebase as clean as possible.
+Indeed, this code will probably have to be maintained for quite some time, and
+most probably by people other than me. This means that the code and all its
+surroundings must be clean, readable and reusable. I put a lot of effort into
+making the code idiomatic and easy to use, through \eg{} the implementation of
+iterators over my data structures when needed, idiomatic C++14, etc.
+
+This also means that the code has to be well-documented: the git history has
+to be kept clean and understandable, and clean documentation can be generated
+from the code, using \texttt{doxygen}. 
The latest documentation is also
+compiled as HTML pages here:
+
+\begin{center}
+ \raisebox{-0.4\height}{
+ \includegraphics[height=2.3em]{../common/docs.png}}
+ \hspace{1em}
+ \url{https://tobast.fr/m1/isomatch}
+\end{center}
+
+Since the code is C++, it is also very prone to various bugs. While I did not
+take the time to integrate unit tests --- which would have been a great
+addition ---, I used a sequence of tests that can be run using \lstc{make
+test} and that covers a lot of features of isomatch.
+
+The code is also tested regularly and on a wide variety of cases with
+\lstbash{valgrind} to ensure that there are no memory errors ---
+use-after-free, unallocated memory, memory leaks, bad pointer
+arithmetic,~\ldots{} In every tested case, strictly no memory is lost, and no
+invalid read was reported.
+
 \subsection{Sought efficiency}
 The goal of \textit{isomatch} is to be applied to large circuits on-the-fly,
@@ -257,10 +290,6 @@ matching operations will be executed quite often, and often
 multiple times in a row. It must then remain fast enough for the human not to lose too much time, and eventually lose patience.
-\todo{Mention clean codebase somewhere}
-
-\todo{Mention VossII somewhere}
-
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{General approach}
 The global strategy used to solve efficiently the problem can be broken down to three main parts.
 \paragraph{Signatures.} The initial idea to make the computation fast is to
-aggregate the inner data of a gate --- be it a leaf gate or a group --- in a
+aggregate the inner data of a gate~---~be it a leaf gate or a group~---~in a
 kind of hash, a 64 bits unsigned integer. This approach is directly inspired from what was done in fl, back at Intel. This hash must be easy to compute, and must be based only on the structure of the graph --- that is, must be
@@ -307,8 +336,10 @@ this problem, that uses the specificities of the graph to be a little faster. 
The signature is computed as a simple hash of the element, and is defined for every type of expression and circuit. It could probably be enhanced with a bit
-more work to cover more uniformly the hash space, but no collision was observed
-on the examples tested.
+more work to cover the hash space more uniformly, but no illegitimate
+collision (that is, a collision that could be avoided with a better hash
+function, as opposed to collisions due to an identical local graph structure)
+was observed on the examples tested.
 \paragraph{Signature constants.} Signature constants are used all around the signing process; each is a 5-tuple $\sigconst{} = (a, x_l, x_h, d_l, d_h)$ of 32
@@ -316,22 +347,24 @@ bits unsigned numbers.
 All of $x_l$, $x_h$, $d_l$ and $d_h$ are picked as prime numbers between $10^8$ and $10^9$ (which just fits in a 32 bits unsigned integer); while $a$ is a random integer uniformly picked between $2^{16}$ and $2^{32}$. These constants are generated by a small python script,
-\path{util/primegen/pickPrimes.py}.
+\path{util/primegen/pickPrimes.py} in the repository.
 Those constants are used to produce a 64 bits unsigned value out of another 64 bits unsigned value, called $v$ thereafter, through an operator $\sigop$, computed as follows (with all computations done on 64 bits unsigned integers). 
\vspace{1em}
-\begin{algorithmic}
- \Function{$\sigop$}{$\sigconst{}, v$}
- \State{} $out1 \gets (v + a) \cdot x_l$
- \State{} $v_h \gets (v \lsr 32) \xor (out1 \lsr 32)$
- \State{} $low \gets out1 \,\%\, d_l$
- \State{} $high \gets \left((v_h + a) \cdot x_h \right) \%\, d_h$
- \State{} \Return{} $low + 2^{32} \cdot high$
- \EndFunction{}
-\end{algorithmic}
+\begin{center}
+ \begin{algorithmic}
+ \Function{$\sigop$}{$\sigconst{}, v$}
+ \State{} $out1 \gets (v + a) \cdot x_l$
+ \State{} $v_h \gets (v \lsr 32) \xor (out1 \lsr 32)$
+ \State{} $low \gets out1 \,\%\, d_l$
+ \State{} $high \gets \left((v_h + a) \cdot x_h \right) \%\, d_h$
+ \State{} \Return{} $low + 2^{32} \cdot high$
+ \EndFunction{}
+ \end{algorithmic}
+\end{center}
 \paragraph{Expressions.} Each type of expression (or, in the case of expression with operator, each type of operator) has its signature constant,
@@ -358,7 +391,7 @@ capture at all the \emph{structure} of the graph.
 A piece of information we can capture without breaking the signature's independence towards the order of description of the graph is the set of its neighbours. Yet, we cannot ``label'' the gates without breaking this rule; thus, we represent the set of neighbours by the set
-of our \emph{neighbours' signatures}.
+of the \emph{neighbours' signatures}.
 At this point, we can define the \emph{signature of order $n$} ($n \in \natset$) of a circuit $C$ as follows:
@@ -375,13 +408,13 @@ At this point, we can define the \emph{signature of order $n$} ($n \in
 The ``IO adjacency'' term is an additional term in the signatures of order above $0$, indicating what input and output pins of the circuit group
-containing the current gate are adjacent to it.
+containing the current gate are adjacent to it. Adding this information to
+the signature was necessary, since many gates that would otherwise share a
+signature can be told apart by it (see Corner cases in
+Section~\ref{ssec:corner_cases}). 
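To fix ideas, the $\sigop$ operator above translates almost directly to C++. The following is only a sketch: the struct and function names are mine, and the sample constants in the comments are illustrative, the real ones being generated by \path{util/primegen/pickPrimes.py}.

```cpp
#include <cstdint>

// Sketch of the sigma-operator, on 64-bit unsigned integers; unsigned
// overflow wraps around, exactly as in the pseudocode above.
// Field names mirror the 5-tuple (a, x_l, x_h, d_l, d_h).
struct SigConst {
    uint64_t a;       // random value in [2^16, 2^32]
    uint64_t xl, xh;  // primes in [10^8, 10^9]
    uint64_t dl, dh;  // primes in [10^8, 10^9]
};

uint64_t sigop(const SigConst &c, uint64_t v) {
    uint64_t out1 = (v + c.a) * c.xl;
    uint64_t vh = (v >> 32) ^ (out1 >> 32);
    uint64_t low = out1 % c.dl;                  // < d_l, fits in 32 bits
    uint64_t high = ((vh + c.a) * c.xh) % c.dh;  // < d_h, fits in 32 bits
    return low + (high << 32);                   // low + 2^32 * high
}
```

Since both $d_l$ and $d_h$ are below $2^{32}$, the low and high halves never overlap, and the result always fits in 64 bits.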
The default order of signature used in all computations, unless more is useful, is 2, after a few benchmarks.
-\todo{explain range of $n$}
-
 \paragraph{Efficiency.} Every circuit memoizes all it can concerning its signature: the inner signature, the IO adjacency, the signatures of order $n$ already computed, etc.
@@ -400,8 +433,10 @@ or its children are modified.
 Memoized data is always stored alongside a timestamp of computation, which invalidates a previous result when needed.
 One possible path of investigation for future work, if the computation turns
-out to be still too slow in real-world cases --- which looks unlikely ---,
-would be to try to multithread this computation.
+out to still be too slow in real-world cases --- which looks unlikely, unless
+fl's substitution is run on a regular basis for a huge number of cases using
+\eg{} a crontab for automated testing ---, would be to try to multithread this
+computation.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Group equality}\label{sec:group_equality}
@@ -428,7 +463,7 @@ the number of permutations examined to no more than $4$ in studied cases.
 Once a permutation is judged worth examining, the group equality is run recursively on all its matched gates. If this step succeeds, the graph
-structure is then checked. If both steps succeeds, the permutation is correct
+structure is then checked. If both steps succeed, the permutation is correct
 and an isomorphism has been found; if not, we move on to the next permutation.
 \todo{Anything more to tell here?}
@@ -449,7 +484,10 @@ was first described by Julian R Ullmann in 1976~\cite{ullmann1976algorithm}.
 Another, more recent algorithm to deal with this problem is Luigi P Cordella's VF2 algorithm~\cite{cordella2004sub}, published in 2004. This algorithm is mostly Ullmann's algorithm, transcribed in a recursive style, with the
-addition of five heuristics. 
\qtodo{Why not use it then?}
+addition of five heuristics. I originally planned to implement and benchmark
+both algorithms, but ran out of time to do so; however, Ullmann's algorithm,
+with the few additional heuristics applicable in our very specific case,
+turned out to be fast enough.
 Ullmann is a widely used and fast algorithm for this problem. It makes an extensive use of the adjacency matrix description of the graph, and the initial
@@ -461,8 +499,9 @@ matrix.
 Each $1$ in a cell $(i, j)$ indicates that the $i$-th needle part is a possible match with the $j$-th haystack part. This matrix is called $perm$ thereafter.
-The algorithm, left apart the \textsc{refine} function (detailed just after),
-is described in Figure~\ref{alg:ullmann}.
+The algorithm, apart from the \textsc{refine} function --- which is detailed
+just after, and can be omitted at the cost of a (much) slower algorithm ---,
+is described in Figure~\ref{alg:ullmann}.
 \begin{figure}[h]
 \begin{algorithmic}
@@ -505,7 +544,7 @@ is described in Figure~\ref{alg:ullmann}.
 The refining process is the actual keystone of the algorithm. It is the mechanism allowing the algorithm to cut down many exploration branches, by
-removing ones from the matrix.
+changing ones to zeroes in the matrix being built.
 The idea is that a match between a needle's vertex $i$ and a haystack's vertex $j$ is only possible if, for each neighbour $k$ of $i$, $j$ has a neighbour
@@ -583,7 +622,12 @@ occur).
 \subsection{Implementation optimisations}
 \paragraph{Initial permutation matrix.} The matrix is first filled according to
-the signatures matches. 
Note that only signatures of order 0 --- \ie{} the
+inner data of a vertex --- can be used here: indeed, we cannot rely on the
+context, since there can be some context in the haystack that is absent from
+the needle, and we cannot check for ``context inclusion'' with our definition
+of signatures: \emph{all} the context must be exactly the same for two
+signatures to match. It is then refined a bit more, by making sure that for
 every match, every potentially matching gate has the same ``wire kinds''. Indeed, a needle gate's wire must have at least the same inbound adjacent signatures as its matching haystack wire, and the same goes for outbound adjacent
@@ -665,6 +709,13 @@ for a single run) and measured by the command \texttt{time}.
 \end{tikzpicture}
 \end{center}
+The computation time is more or less linear in the level of signature
+required, which is consistent with the implementation. In practice, only small
+portions of a circuit will be signed with a high order, which means that we
+can afford very high order signatures (\eg{} 40 or 50, which already means
+that the diameter of the group is 40 or 50) without a real impact on the
+computation time.
+
 \paragraph{Equality.} To test the circuit group equality, a small piece of code takes a circuit, scrambles it as much as possible
@@ -680,13 +731,15 @@ considerably speeding it up: the same program proving only one way takes about
-\subsection{Corner cases}
+\subsection{Corner cases}\label{ssec:corner_cases}
 There were a few observed cases where the algorithm tends to be slower on
-certain configurations.
+certain configurations, and a few other such cases that could be fixed.
 \todo{More corner cases}
+\todo{Corner case: io pins, io adjacency}
+
 \paragraph{Split/merge trees.} A common pattern that tends to slow down the algorithm is split/merge trees. Those patterns occur when one wants to merge $n$ one bit wires into a single $n$ bits wire, or the other way around. 
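The refinement condition described in the previous section (for each neighbour $k$ of needle vertex $i$, the haystack vertex $j$ must have a neighbour $l$ that can still match $k$) can be sketched as a single pass over the $perm$ matrix. This is a simplified sketch on plain adjacency lists, with names of my own choosing; the actual isomatch implementation uses its own data structures and the additional heuristics discussed above.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<bool>>;
using Adjacency = std::vector<std::vector<int>>;

// One pass of Ullmann-style refinement: clear perm[i][j] whenever some
// neighbour k of needle vertex i has no possible match l among the
// neighbours of haystack vertex j. Returns true if a cell changed;
// the caller iterates until a fixpoint is reached.
bool refinePass(const Adjacency &needleAdj, const Adjacency &hayAdj,
                Matrix &perm) {
    bool changed = false;
    for (std::size_t i = 0; i < perm.size(); ++i) {
        for (std::size_t j = 0; j < perm[i].size(); ++j) {
            if (!perm[i][j])
                continue;
            for (int k : needleAdj[i]) {
                bool hasMatch = false;
                for (int l : hayAdj[j]) {
                    if (perm[k][l]) {
                        hasMatch = true;
                        break;
                    }
                }
                if (!hasMatch) {
                    perm[i][j] = false; // i cannot map to j any more
                    changed = true;
                    break;
                }
            }
        }
    }
    return changed;
}
```

For instance, with a two-vertex needle (a single edge) and a haystack made of an isolated vertex plus an edge, iterating this pass to a fixpoint rules out the isolated haystack vertex for both needle vertices, without any enumeration.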
@@ -712,6 +765,12 @@ nodes on the layer below cannot be freely exchanged. \todo{Figure describing the problem} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section*{Conclusion} + +\todo{} + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \printbibliography{}