\usepackage{enumerate}
\usepackage{caption}
\usepackage{algorithmicx}

%\usepgfplotslibrary{external}
%\tikzexternalize

\usepackage[backend=biber,style=trad-alpha]{biblatex}
\usepackage[left=2cm,right=2cm,top=2cm,bottom=2cm]{geometry}
\usepackage{../common/internship}
\usepackage{../common/math}
\usepackage{pgfplots}

\bibliography{../common/refs}

\title{Pattern-matching and substitution in electronic circuits}
transformations to a circuit.

This problem turns out to be more or less the \emph{subgraph isomorphism
problem}, which is NP-complete, and must nevertheless be solved efficiently
on processor-sized circuits in this particular case.
During my internship, I developed a C++ library to perform this task that
will be integrated in VossII, based on a few well-known algorithms as well

\tableofcontents

\todo{Talk of the repo, somewhere}

\pagebreak

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In recent years, verification and proven software have gathered increasing
interest in the computer science community, as people realised how hard bugs
are to track down, and how little confidence they had in their own code. But
hardware bugs are even more tedious to find and fix, and can easily lead to
disastrous consequences, as those cannot be patched on existing hardware. For
instance, the well-known Pentium ``\textsc{fdiv}'' bug~\cite{pratt1995fdiv}
that affected a large number of Pentium processors led to wrong results for
some floating point divisions. Intel had to offer to replace all the defective
CPUs, leading to an announced loss of 475 million dollars~\cite{nicely_fdiv}.
Even recently, the Skylake and Kaby Lake hyperthreading bug had to be patched
using microcode, losing performance and reliability.
To avoid such disasters, the industry nowadays uses a wide range of techniques
to catch bugs as early as possible --- which, hopefully, is before the
product's release date. Among those are \todo{list + cite}, but also proved
hardware. On circuits as complex as processors, usually, only sub-components
are proved correct with respect to a given specification of their behaviour,
while the circuit is kept in a specified context, \ie{} a set of valid inputs
and outputs, etc. --- that should, but is not proved to, be respected by the
other parts of the circuit. Yet, this trade-off between proved correctness and
engineer's work time already gives a pretty good confidence in the circuit.
In this context, Carl Seger was one of the main developers of fl at Intel, a
functional ml-inspired programming language integrating many features useful to

and Mary Sheeran, aiming to develop tools for proved design of FPGA circuits.
One of the keystones of this project is an open-sourced and publicly available
version of fl, used for the proving part, which is still under heavy
development at the moment.

My part of the work centred on this ``search and replace'' tool. More
specifically, I focused on writing a C++ library, \emph{isomatch}, which is
\vert~&- & \textit{(sub)} \\
\vert~&\times & \textit{(times)} \\
\vert~&\div & \textit{(div)} \\
\vert~&\% & \textit{(mod)} \\ % chktex 35
\vert~&\lsl & \textit{(logical shift left)} \\
\vert~&\lsr & \textit{(logical shift right)} \\
\vert~&\asr & \textit{(arithmetic shift right)} \\

can be seen as the construction of a circuit by assembling smaller integrated
circuits (ICs), themselves built the same way, etc. A group is composed of
sub-circuits, input pins and output pins. Each level can of course contain
``leaf'' gates, like \textit{and} or \textit{delay} gates. This is important,
because it allows the program to work on smaller areas of the circuit (\eg{}
loading in memory only a part of the circuit, etc.).

\subsection{Objective}
\emph{Isomatch} comes along with a small parser for a toy ad-hoc language,
designed to allow one to quickly write a test circuit and run the
\emph{isomatch} functions on it. There was no real apparent need for a better
language, or a standard circuit description language (such as VHDL), since the
user will mostly use \emph{isomatch} through fl, and feed it directly with data
read from fl --- which is able to handle \eg{} VHDL\@.

\subsection{Codebases}
Carl Seger's new version of fl is currently being developed as \textbf{VossII},
and is yet to be open-sourced.

My contribution to the project, \textbf{isomatch}, is free software, and is
available on GitHub:

\begin{center}
\raisebox{-0.4\height}{
\includegraphics[height=2em]{../common/github32.png}}
\hspace{1em}
\url{https://github.com/tobast/circuit-isomatch/}
\end{center}

\subsection{Problems}
More precisely, the problems that \emph{isomatch} must solve are the following.

way, with possibly different names, etc.?

\item\label{prob:match} Given two circuits, \emph{needle} and
\emph{haystack}, find every (non-overlapping) occurrence of
\emph{needle} in \emph{haystack}. An occurrence is a set $S$ of
sub-circuits of \emph{haystack} such that there is a one-to-one mapping
of structurally equivalent circuits of $S$ with circuits of
\emph{needle}, and those circuits are connected the same way in both circuits.
\end{enumerate}
Both problems are hard. The first one is an instance of the graph isomorphism
problem, as the actual question is whether there exists a one-to-one mapping
between sub-circuits of the two groups, such that every mapped circuit is equal
to the other (either directly if it is a leaf gate, or recursively with the
same procedure); and whether this mapping respects connections (edges) between
those circuits. Graph isomorphism is known to be in NP (given a permutation of
the first graph, it is polynomial to check whether the first is equal to the
second \wrt{} the permutation), but not known to be in either P or NP-complete.
Thus, since Babai's work on graph isomorphism~\cite{babai2016graph} is only of
theoretical interest (at the moment), the known algorithms remain in worst-case
exponential time, and require ad-hoc heuristics for specific kinds of graphs to
get maximum efficiency.
The second one is an instance of the subgraph isomorphism problem, which is
known to be NP-complete~\cite{cook1971complexity}. Even though a few algorithms

is nevertheless necessary to implement them the right way, and with the right
heuristics, to get the desired efficiency for the given problem.

\subsection{Code quality}
Another prominent objective was to keep the codebase as clean as possible.
Indeed, this code will probably have to be maintained for quite some time, and
most probably by people other than me. This means that the code and all its
surroundings must be really clean, readable and reusable. I tried to put a lot
of effort into making the code idiomatic and easy to use, through \eg{} the
implementation of iterators over my data structures when needed, idiomatic
C++14, etc.
This also means that the code has to be well-documented: the git history had to
be kept clean and understandable, and a clean documentation can be generated
from the code, using \texttt{doxygen}. The latest documentation is also
compiled as HTML pages here:

\begin{center}
\raisebox{-0.4\height}{
\includegraphics[height=2.3em]{../common/docs.png}}
\hspace{1em}
\url{https://tobast.fr/m1/isomatch}
\end{center}
Since the code is C++, it is also very prone to diverse bugs. While I did not
take the time to integrate unit tests --- which would have been a great
addition ---, I used a sequence of tests that can be run using \lstc{make
test} and that exercises a lot of features of isomatch.

The code is also tested regularly and on a wide variety of cases with
\lstbash{valgrind} to ensure that there are no memory errors ---
use-after-free, unallocated memory, memory leaks, bad pointer
arithmetic,~\ldots In every tested case, strictly no memory is lost, and no
invalid read was reported.
\subsection{Sought efficiency}

The goal of \textit{isomatch} is to be applied to large circuits on-the-fly,
during their conception. Those circuits can (and probably will) be as large as
a full processor, and the software will be operated by a human, working on
their circuit. Thus, \textit{isomatch} must be as fast as possible, since
matching operations will be executed quite often, and often multiple times in a
row. It must then remain fast enough for the human not to lose too much time,
and eventually lose patience.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{General approach}

The global strategy used to solve the problem efficiently can be broken down
into three main parts.

\paragraph{Signatures.} The initial idea to make the computation fast is to
aggregate the inner data of a gate~---~be it a leaf gate or a group~---~in a
kind of hash, a 64-bit unsigned integer. This approach is directly inspired
from what was done in fl, back at Intel. This hash must be easy to compute,
and must be based only on the structure of the graph --- that is, must be
The signature is computed as a simple hash of the element, and is defined for
every type of expression and circuit. It could probably be enhanced with a bit
more work to cover the hash space more uniformly, but no illegitimate collision
(that is, a collision that could be avoided with a better hash function, as
opposed to collisions due to an equal local graph structure) was observed on
the examples tested.
\paragraph{Signature constants.} Signature constants are used all around the
signing process; each is a 5-tuple $\sigconst{} = (a, x_l, x_h, d_l, d_h)$ of
32-bit unsigned numbers. All of $x_l$, $x_h$, $d_l$ and $d_h$ are picked as
prime numbers between $10^8$ and $10^9$ (which just fits in a 32-bit unsigned
integer); while $a$ is a random integer uniformly picked between $2^{16}$ and
$2^{32}$. These constants are generated by a small Python script,
\path{util/primegen/pickPrimes.py} in the repository.

Those constants are used to produce a 64-bit unsigned value out of another
64-bit unsigned value, called $v$ thereafter, through an operator $\sigop$,
computed as follows (with all computations done on 64-bit unsigned integers).
\begin{center}
\begin{algorithmic}
\Function{$\sigop$}{$\sigconst{}, v$}
\State{} $out1 \gets (v + a) \cdot x_l$
\State{} $v_h \gets (v \lsr 32) \xor (out1 \lsr 32)$
\State{} $low \gets out1 \,\%\, d_l$
\State{} $high \gets \left((v_h + a) \cdot x_h \right) \%\, d_h$
\State{} \Return{} $low + 2^{32} \cdot high$
\EndFunction{}
\end{algorithmic}
\end{center}
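In C++, this operator transcribes almost literally; the sketch below uses my
own naming, not necessarily isomatch's actual API, and any sample constants
used with it should satisfy the constraints stated above.

```cpp
#include <cstdint>

// Direct transcription of the operator above. All arithmetic deliberately
// wraps modulo 2^64, matching "computations done on 64-bit unsigned integers".
struct SigConst {
    uint64_t a, xl, xh, dl, dh;
};

uint64_t sigop(const SigConst& c, uint64_t v) {
    uint64_t out1 = (v + c.a) * c.xl;
    uint64_t vh   = (v >> 32) ^ (out1 >> 32);    // lsr 32, then xor
    uint64_t low  = out1 % c.dl;                 // low 32-bit half
    uint64_t high = ((vh + c.a) * c.xh) % c.dh;  // high 32-bit half
    return low + (high << 32);                   // low + 2^32 * high
}
```

Since $d_l$ and $d_h$ are both below $2^{32}$, the two halves never overlap,
so the result uses the full 64-bit output range.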
\paragraph{Expressions.} Each type of expression (or, in the case of
expression with operator, each type of operator) has its signature constant,

capture at all the \emph{structure} of the graph. A piece of information we
can capture without breaking the signature's independence towards the order of
description of the graph, is the set of its neighbours. Yet, we cannot
``label'' the gates without breaking this rule; thus, we represent the set of
neighbours by the set of the \emph{neighbours' signatures}.
At this point, we can define the \emph{signature of order $n$} ($n \in
\natset$) of a circuit $C$ as follows:

The ``IO adjacency'' term is an additional term in the signatures of order
above $0$, indicating which input and output pins of the circuit group
containing the current gate are adjacent to it. Adding this information to the
signature was necessary, since a lot of gates can be signed differently using
this information (see Corner cases in Section~\ref{ssec:corner_cases}).

The default order of signature used in all computations, unless more is useful,
is 2, after a few benchmarks.
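The exact recurrence is not reproduced in this excerpt; the sketch below
assumes the natural reading of the text above: the order-$0$ signature is the
inner signature, and each higher order folds in the neighbours' signatures of
the previous order through a commutative operation (here a plain wrapping sum),
so that the result does not depend on the order in which the circuit was
described. Names are illustrative, not isomatch's actual API.

```cpp
#include <cstdint>
#include <vector>

struct Gate {
    uint64_t innerSig;                     // order-0 signature (inner data only)
    std::vector<const Gate*> neighbours;   // adjacent gates in the group
};

// Order-n signature: order 0 is the inner signature; order n adds the
// neighbours' order-(n-1) signatures. Addition is commutative, so permuting
// the neighbour list leaves the result unchanged.
uint64_t signatureOfOrder(const Gate& g, unsigned n) {
    if (n == 0)
        return g.innerSig;
    uint64_t sig = g.innerSig;
    for (const Gate* nb : g.neighbours)
        sig += signatureOfOrder(*nb, n - 1);
    return sig;
}
```

Because the recursion depth is bounded by $n$, cycles in the circuit graph are
not a problem; structurally symmetric gates get equal signatures at every order.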
\todo{explain range of $n$}

\paragraph{Efficiency.} Every circuit memoizes all it can concerning its
signature: the inner signature, the IO adjacency, the signatures of order $n$
already computed, etc.

or its children are modified. A memoized result is always stored alongside a
timestamp of computation, which invalidates a previous result when needed.
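A minimal sketch of this timestamp-based invalidation follows; the names are
hypothetical, and a global event counter stands in for the timestamp.

```cpp
#include <cstdint>

// A cached value is reused only if it was computed after the circuit's last
// modification; otherwise it is recomputed and the timestamp refreshed.
class Circuit {
public:
    void modify() { lastModified_ = ++clock_; }

    uint64_t signature() const {
        if (!cacheValid_ || cachedAt_ < lastModified_) {
            cachedSig_ = computeSignature();
            cachedAt_ = ++clock_;
            cacheValid_ = true;
        }
        return cachedSig_;
    }

    int computeCount() const { return computeCount_; }  // for illustration

private:
    uint64_t computeSignature() const {  // stands in for the expensive walk
        ++computeCount_;
        return 42;
    }

    uint64_t lastModified_ = 0;
    mutable uint64_t cachedSig_ = 0;
    mutable uint64_t cachedAt_ = 0;
    mutable bool cacheValid_ = false;
    mutable int computeCount_ = 0;
    static uint64_t clock_;  // global event counter standing in for a timestamp
};

uint64_t Circuit::clock_ = 0;
```

Repeated queries between two modifications then cost a single computation.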
One possible path of investigation for future work, if the computation turns
out to be still too slow in real-world cases --- which looks unlikely, unless
fl's substitution is run on a regular basis for a huge number of cases using
\eg{} a crontab for automated testing ---, would be to try to multithread this
computation.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Group equality}\label{sec:group_equality}
Once a permutation is judged worth examining, the group equality is run
recursively on all its matched gates. If this step succeeds, the graph
structure is then checked. If both steps succeed, the permutation is correct
and an isomorphism has been found; if not, we move on to the next permutation.

\todo{Anything more to tell here?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Pattern-match}

We finally need to be able to find every occurrence of a given \emph{needle}
circuit in a bigger \emph{haystack} circuit --- at any level of its group
hierarchy. This problem is basically subgraph isomorphism with some
specificities, and for this purpose I used a classical algorithm for the
subgraph isomorphism problem, \emph{Ullmann's algorithm}.

\subsection{Ullmann's algorithm}
The algorithm was first described by Julian R.\ Ullmann in
1976~\cite{ullmann1976algorithm}.
Another, more recent algorithm to deal with this problem is Luigi P Cordella's
VF2 algorithm~\cite{cordella2004sub}, published in 2004. This algorithm is
mostly Ullmann's algorithm, transcribed in a recursive writing, with the
addition of five heuristics. I originally planned to implement both algorithms
and benchmark both, but had no time to do so in the end; though, Ullmann with
the few additional heuristics applicable in our very specific case turned out
to be fast enough.

Ullmann is a widely used and fast algorithm for this problem. It makes
extensive use of the adjacency-matrix description of the graph, and the initial
matrix. Each $1$ in a cell $(i, j)$ indicates that the $i$-th needle part is a
possible match with the $j$-th haystack part. This matrix is called $perm$
thereafter.

The algorithm, leaving aside the \textsc{refine} function, which is detailed
just after and can be omitted for a (way) slower version of the algorithm, is
described in Figure~\ref{alg:ullmann}.
\begin{figure}[h]
\begin{algorithmic}
The refining process is the actual keystone of the algorithm. It is the
mechanism allowing the algorithm to cut down many exploration branches, by
changing ones to zeroes in the matrix being built.

The idea is that a match between a needle's vertex $i$ and a haystack's vertex
$j$ is only possible if, for each neighbour $k$ of $i$, $j$ has a neighbour
that is still a possible match for $k$.
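This refinement condition can be sketched over boolean adjacency matrices as
follows; the representation is illustrative and may differ from isomatch's
actual data structures. The pruning loop is repeated until a fixpoint is
reached, since removing one candidate can invalidate others.

```cpp
#include <cstddef>
#include <vector>

// perm[i][j]: "needle vertex i may still match haystack vertex j".
// adjN / adjH: adjacency matrices of the needle and the haystack.
// Returns false when some needle vertex is left with no candidate at all.
bool refine(std::vector<std::vector<bool>>& perm,
            const std::vector<std::vector<bool>>& adjN,
            const std::vector<std::vector<bool>>& adjH) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < perm.size(); ++i) {
            for (std::size_t j = 0; j < perm[i].size(); ++j) {
                if (!perm[i][j]) continue;
                // i may match j only if every neighbour k of i still has a
                // candidate among the neighbours of j.
                for (std::size_t k = 0; k < adjN.size(); ++k) {
                    if (!adjN[i][k]) continue;
                    bool hasCandidate = false;
                    for (std::size_t l = 0; l < adjH.size(); ++l) {
                        if (adjH[j][l] && perm[k][l]) {
                            hasCandidate = true;
                            break;
                        }
                    }
                    if (!hasCandidate) {
                        perm[i][j] = false;  // prune this exploration branch
                        changed = true;
                        break;
                    }
                }
            }
        }
    }
    for (const auto& row : perm) {
        bool any = false;
        for (bool b : row) any = any || b;
        if (!any) return false;
    }
    return true;
}
```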
\subsection{Implementation optimisations}

\paragraph{Initial permutation matrix.} The matrix is first filled according to
the signature matches. Note that only signatures of order 0 --- \ie{} the
inner data of a vertex --- can be used here: indeed, we cannot rely on the
context here, since there can be some context in the haystack that is absent
from the needle, and we cannot check for ``context inclusion'' with our
definition of signatures: \emph{all} the context must be exactly the same for
two signatures to match. It is then refined a bit more, by making sure that for
every match, every potentially matching gate has the same ``wire kinds''.
Indeed, a needle gate's wire must have at least the same inbound adjacent
signatures as its matching haystack wire, and the same goes for outbound adjacent
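The initial filling step can be sketched as follows, assuming order-0
signatures are available for each part; the helper is hypothetical, not the
actual isomatch code.

```cpp
#include <cstdint>
#include <vector>

// Cell (i, j) of the permissibility matrix is set exactly when the order-0
// signature (inner data only) of needle part i equals that of haystack part j.
std::vector<std::vector<bool>>
initialPerm(const std::vector<uint64_t>& needleSigs,
            const std::vector<uint64_t>& haystackSigs) {
    std::vector<std::vector<bool>> perm(
        needleSigs.size(), std::vector<bool>(haystackSigs.size(), false));
    for (std::size_t i = 0; i < needleSigs.size(); ++i)
        for (std::size_t j = 0; j < haystackSigs.size(); ++j)
            perm[i][j] = (needleSigs[i] == haystackSigs[j]);
    return perm;
}
```

The fewer ones this matrix starts with, the fewer permutations the search has
to explore, which is why the extra ``wire kinds'' pruning described above pays
off.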
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Performance}

In this section, all measurements were made on a computer with an Intel
i7-3770 CPU (3.40GHz) and 8\,GB of RAM\@. % chktex 8

\subsection{Small processor}
The example I used widely during my project to test my program and periodically
check its efficiency was a small processor designed one year earlier as a
school project~\cite{sysdig_cpu}. The processor implements a large subset of
ARM and was conceived as a few hierarchical modules (ALU, registers, opcode
decoder, \ldots), but was flattened into a plain netlist when generated from
its Python code, so I first had to patch its generator code to make its
hierarchy apparent.

The processor, in the end, has around 2000 leaf gates (but works at word level)
and 240 hierarchy groups. \qtodo{Tell us more!}
\paragraph{Signature.} First, I measured the time required to sign the whole
circuit with different levels of signature (\ie{} the order of signature
computed for every part of the circuit). In practice, we never compute
high-order signatures for a whole circuit, as signatures of subgroups are
always computed by default at order $2$, unless a particular group needs a
more accurate signature.

The measurements were made over 100 consecutive runs of the program (then
averaged to obtain a single-run figure), using the command \texttt{time}.
\begin{center}
\begin{tikzpicture}
\begin{axis}[
title={Signature time of the processor for different levels
of signature},
xlabel={Level of signature},
ylabel={Time (ms)},
xmin=0, xmax=16,
ymin=0, ymax=300,
legend pos=north west,
ymajorgrids=true,
grid style=dashed,
]

\addplot[
color=blue,
mark=square,
]
coordinates {
(2,105.4)
(3,122.6)
(4,140.1)
(5,155.4)
(6,171.2)
(7,183.9)
(8,198.3)
(9,211.2)
(10,224.3)
(11,236.7)
(12,248.5)
(13,259.3)
(14,271.7)
(15,281.4)
};
\legend{-O3}
\end{axis}
\end{tikzpicture}
\end{center}
The computation time is more or less linear in the level of signature
required, which is consistent with the implementation. In practice, only small
portions of a circuit will be signed with a high order, which means that we can
afford really high-order signatures (\eg{} 40 or 50, which already means that
the diameter of the group is 40 or 50) without a real impact on the
computation time.
\paragraph{Equality.} To test circuit group equality, a small piece of
code takes a circuit and scrambles it as much as possible
--- without altering its structure ---, \eg{} by randomly renaming its parts,
by randomly changing the order of the circuits and groups, \ldots The circuit
is then matched against its unaltered counterpart.

For the processor described above, it takes about \textbf{477\,ms} to
prove it equal to its scrambled version, and then the other way around. Yet,
the memoized results (essentially the signatures) are kept for the second run,
considerably speeding it up: the same program proving only one way takes about
\textbf{475\,ms}. \todo{explain why}
\subsection{Corner cases}\label{ssec:corner_cases}

I observed a few configurations on which the algorithm tends to be slower,
and a few other such cases that could be fixed.

\todo{More corner cases}

\todo{Corner case: io pins, io adjacency}
\paragraph{Split/merge trees.} A common pattern that tends to slow down the
algorithm is split/merge trees. Those patterns occur when one wants to merge
$n$ one-bit wires into a single $n$-bit wire, or the other way around.

nodes on the layer below cannot be freely exchanged.

\todo{Figure describing the problem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Conclusion}

\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\printbibliography{}