778 lines
36 KiB
TeX
778 lines
36 KiB
TeX
\documentclass[11pt,a4paper]{article}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage{amsmath}
|
|
\usepackage{amsfonts}
|
|
\usepackage{amssymb}
|
|
\usepackage{graphicx}
|
|
\usepackage{indentfirst}
|
|
\usepackage{enumerate}
|
|
\usepackage{caption}
|
|
\usepackage{algorithmicx}
|
|
%\usepgfplotslibrary{external}
|
|
%\tikzexternalize
|
|
|
|
\usepackage[backend=biber,style=trad-alpha]{biblatex}
|
|
\usepackage[left=2cm,right=2cm,top=2cm,bottom=2cm]{geometry}
|
|
|
|
|
|
% Custom packages
|
|
\usepackage{todo}
|
|
\usepackage{leftrule_theorems}
|
|
\usepackage{my_listings}
|
|
\usepackage{my_hyperref}
|
|
\usepackage{../common/internship}
|
|
\usepackage{../common/math}
|
|
|
|
\usepackage{pgfplots}
|
|
|
|
\bibliography{../common/refs}
|
|
|
|
\title{Pattern-matching and substitution in electronic circuits}
|
|
\author{Théophile Bastian, under supervision of Carl-Johan Seger
|
|
and Mary Sheeran\\
|
|
\small{Chalmers University, Göteborg, Sweden}}
|
|
\date{February~--~June 2017}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
\begin{document}
|
|
\maketitle
|
|
|
|
\begin{abstract}
|
|
\todo{enhance abstract?}
|
|
|
|
The present report describes and summarizes my 1st year of Master's
|
|
degree's internship at the university of Chalmers, Göteborg, Sweden. I
|
|
worked under supervision of Carl-Johan Seger and Mary Sheeran.
|
|
|
|
The project's goal was to contribute to VossII, a hardware proving tool
|
|
cloning fl, developed at Intel by Carl Seger for internal usage only. It
|
|
led me to work on \emph{pattern-matching} on electronic circuits for a
|
|
\emph{search-and-replace} method, allowing one to apply previously proved
|
|
transformations to a circuit.
|
|
|
|
This problem turns out to be more or less the \emph{subgraph isomorphism
|
|
problem}, which is NP-complete, and must nevertheless be solved efficiently
|
|
on processor-sized circuits on this particular case.
|
|
|
|
During my internship, I developed a C++ library to perform this task that
|
|
will be integrated in VossII, based on a few well-known algorithms as well
|
|
as some ad-hoc heuristics and algorithm tweaks to better match the context
|
|
of circuits. One of my objectives all along was also to keep a clean and
|
|
well-documented codebase, as the tool will have to be maintainable by
|
|
others later.
|
|
\end{abstract}
|
|
|
|
\tableofcontents
|
|
|
|
\pagebreak
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Introduction}
|
|
|
|
In the previous years, verification and proved software has gathered an
|
|
increasing interest in the computer science community, as people realised how
|
|
hard bugs are to track down, and the little confidence they had in their own
|
|
code. But hardware bugs are even more tedious to find and fix, and can easily
|
|
lead to disastrous consequences, as those cannot be patched on existing
|
|
hardware. For instance, the well-known Pentium ``\textsc{fdiv}''
|
|
bug~\cite{pratt1995fdiv} that affected a large number of Pentium processors
|
|
lead to wrong results for some floating point divisions. Intel had to offer to
|
|
replace all the defective CPUs, leading to an announced loss of 475 million
|
|
dollars~\cite{nicely_fdiv}. Even recently, the Skylake and Kaby Lake
|
|
hyperthreading bug had to be patched using microcode, losing performance and
|
|
reliability.
|
|
|
|
To avoid such disasters, the industry nowadays uses a wide range of techniques
|
|
to catch bugs as early as possible --- which, hopefully, is before the
|
|
product's release date. Among those are \todo{list + cite}, but also proved
|
|
hardware. On circuits as complex as processors, usually, only sub-components
|
|
are proved correct with respect to a given specification of its behaviour,
|
|
while the circuit is kept in a specified context, \ie{} a set of valid inputs
|
|
and outputs, etc. --- that should, but is not proved to, be respected by the
|
|
other parts of the circuit. Yet, this trade-off between proved correctness and
|
|
engineer's work time already gives a pretty good confidence in the circuit.
|
|
|
|
In this context, Carl Seger was one of the main developers of fl at Intel, a
|
|
functional ml-inspired programming language integrating many features useful to
|
|
get insights of a circuit, testing it and proving it. It mostly features
|
|
symbolic trajectory evaluation based model checking and theorem proving, and is
|
|
intended to be an all-purpose toolbox for the hardware prover.
|
|
|
|
Among other features, it includes a ``search and replace'' feature, which can
|
|
search every occurrence of a given gates pattern in a circuit, and replace it
|
|
by some other gates pattern, proved observationally equivalent beforehand.
|
|
Time has proved this method very efficient to design circuits: this way, one
|
|
can start from an inefficient, yet simple circuit, prove it, and then refine it
|
|
into an equivalent, yet efficient one, through proved transformations. It is
|
|
also possible to go the other way, and start with an optimized circuit, hard to
|
|
understand, and make it easier to understand to work more efficiently.
|
|
|
|
\paragraph{My internship} lies amid a project shared between Carl-Johan Seger
|
|
and Mary Sheeran, aiming to develop tools for proved design of FPGA circuits.
|
|
One of the keystones of this project is an open-sourced and publicly available
|
|
version of fl, used for the proving part, and is still at the moment under
|
|
heavy development.
|
|
|
|
My part of the work resided on this ``search and replace'' tool. More
|
|
specifically, I focused on writing a C++ library, \emph{isomatch}, which is
|
|
interfaced with the C core of fl and provides it with low-level and very fast
|
|
functions for this task.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Problem setting}
|
|
|
|
\subsection{Circuit description}
|
|
|
|
\begin{figure}[!h]
|
|
\begin{align*}
|
|
\textbf{Integer constant } n, m, p, q, \ldots \qquad& \\
|
|
\\
|
|
\textbf{Wire } in0, out0, ctl0, \ldots \qquad& \\
|
|
\\
|
|
\textbf{Vector } \evec{v}{n} & & \textit{($n$ elements of type $v$)} \\
|
|
\\
|
|
\textbf{Circuit } c, d, \ldots ::=~
|
|
&\text{delay} (in0, out0) &\textit{(delay 1 clock tick)} \\
|
|
\vert~&\text{tristate} (in0, out0, ctl0)
|
|
&\textit{(three-state gate)} \\
|
|
\vert~&\text{comb} (\evec{in0}{n}, \evec{out0}{m}, \evec{e}{m})
|
|
&\textit{(combinatorial gate)} \\
|
|
\vert~&\text{assert} (\evec{in0}{n}, \evec{e}{m})
|
|
&\textit{(assertion gate)} \\
|
|
\vert~&\text{group} (\evec{in0}{n}, \evec{out0}{m}, \evec{c}{p})
|
|
&\textit{(circuit hierarchical group)} \\
|
|
\\
|
|
\textbf{Binary operator } \otimes ::=~
|
|
&\wedge & \textit{(and)} \\
|
|
\vert~&\vee & \textit{(or)} \\
|
|
\vert~&\oplus & \textit{(xor)} \\
|
|
\vert~&+ & \textit{(add)} \\
|
|
\vert~&- & \textit{(sub)} \\
|
|
\vert~&\times & \textit{(times)} \\
|
|
\vert~&\div & \textit{(div)} \\
|
|
\vert~&\% & \textit{(mod)} \\ % chktex 35
|
|
\vert~&\lsl & \textit{(logical shift left)} \\
|
|
\vert~&\lsr & \textit{(logical shift right)} \\
|
|
\vert~&\asr & \textit{(arithmetic shift right)} \\
|
|
\\
|
|
\textbf{Unary and constant operator } \otimes_0 ::=~
|
|
&\clsl & \textit{(logical shift left of constant)} \\
|
|
\vert~&\clsr & \textit{(logical shift right of constant)} \\
|
|
\vert~&\casr & \textit{(arithmetic shift right of constant)} \\
|
|
\\
|
|
\textbf{Unary operator } \ominus ::=~
|
|
&\lnot & \textit{(logical negation)} \\
|
|
\\
|
|
\textbf{Expression } e, f, \ldots ::=~
|
|
& x & \textit{(variable)} \\
|
|
\vert~& n & \textit{(integer constant)} \\
|
|
\vert~& e \otimes f & \textit{(binary operator)} \\
|
|
\vert~& e \otimes_0 n & \textit{(unary operator with constant)} \\
|
|
\vert~& \ominus e & \textit{(unary operator)} \\
|
|
\vert~& e_{\vert~n \ldots m} & \textit{(slicing: take a subword)}\\
|
|
\vert~& e~\vert~f & \textit{(merging: concatenate two words)} \\
|
|
\end{align*}
|
|
\caption{AST of circuits used}\label{fig:ast}
|
|
\end{figure}
|
|
|
|
The circuits on which \emph{isomatch} is working are described, and internally
|
|
represented, by the AST in Figure~\ref{fig:ast}.
|
|
|
|
The most important thing in the description of circuits here, is that those
|
|
circuits are organized as a hierarchy of \emph{circuit groups}. This hierarchy
|
|
can be seen as the construction of a circuit by assembling smaller integrated
|
|
circuits (ICs), themselves built the same way, etc. A group is composed of
|
|
sub-circuits, input pins and output pins. Each level can of course contain
|
|
``leaf'' gates, like \textit{and} or \textit{delay} gates. This is important,
|
|
because it allows the program to work on smaller areas of the circuit (\eg{}
|
|
loading in memory only a part of the circuit, etc.).
|
|
|
|
\emph{Isomatch} comes along with a small parser for a toy ad-hoc language,
|
|
designed to allow one to quickly write a test circuit and run the
|
|
\emph{isomatch} functions on it. There was no real apparent need for a better
|
|
language, or a standard circuit description language (such as VSDL), since the
|
|
user will mostly use \emph{isomatch} through fl, and feed it directly with data
|
|
read from fl --- which is able to handle \eg{} VSDL\@.
|
|
|
|
\subsection{Codebases}
|
|
|
|
Carl Seger's new version of fl is currently being developed as \textbf{VossII},
|
|
and is yet to be open-sourced.
|
|
|
|
My contribution to the project, \textbf{isomatch}, is free software, and is
|
|
available on GitHub:
|
|
|
|
\begin{center}
|
|
\raisebox{-0.4\height}{
|
|
\includegraphics[height=2em]{../common/github32.png}}
|
|
\hspace{1em}
|
|
\url{https://github.com/tobast/circuit-isomatch/}
|
|
\end{center}
|
|
|
|
\subsection{Problems}
|
|
|
|
More precisely, the problems that \emph{isomatch} must solve are the following.
|
|
|
|
\begin{enumerate}
|
|
\item\label{prob:equal} Given two circuit groups, are they structurally
|
|
equivalent? That is, are they the same circuit, arranged in a different
|
|
way, with possibly different names, etc.?
|
|
|
|
\item\label{prob:match} Given two circuits, \emph{needle} and
|
|
\emph{haystack}, find every (non-overlapping) occurrence of
|
|
\emph{needle} in \emph{haystack}. An occurrence is a set $S$ of
|
|
sub-circuits of \emph{haystack} such that there is a one-to-one mapping
|
|
of structurally equivalent circuits of $S$ with circuits of
|
|
\emph{needle}, and those circuits are connected the same way in both circuits.
|
|
\end{enumerate}
|
|
|
|
Both problems are hard. The first one is an instance of the graph isomorphism
|
|
problem, as the actual question is whether there exists a one-to-one mapping
|
|
between sub-circuits of the two groups, such that every mapped circuit is equal
|
|
to the other (either directly if it is a leaf gate, or recursively with the
|
|
same procedure); and whether this mapping respects connections (edges) between
|
|
those circuits. Graph isomorphism is known to be in NP (given a permutation of
|
|
the first graph, it is polynomial to check whether the first is equal to the
|
|
second \wrt{} the permutation), but not known to be in either P or NP-complete.
|
|
Thus, since Babai's work on graph isomorphism~\cite{babai2016graph} is only of
|
|
theoretical interest (at the moment), the known algorithms remain in worst-case
|
|
exponential time, and require ad-hoc heuristics for specific kind of graphs to
|
|
get maximum efficiency.
|
|
|
|
The second one is an instance of subgraph isomorphism problem, which is known
|
|
to be NP-complete~\cite{cook1971complexity}. Even though a few algorithms
|
|
(discussed later) are known to be efficient in most cases for this problem, it
|
|
is nevertheless necessary to implement them the right way, and with the right
|
|
heuristics, to get the desired efficiency for the given problem.
|
|
|
|
\subsection{Code quality}
|
|
|
|
Another prominent objective was to keep the codebase as clean as possible.
|
|
Indeed, this code will probably have to be maintained for quite some time, and
|
|
most probably by other people than me. This means that the code and all its
|
|
surroundings must be really clean, readable and reusable. I tried to put a lot
|
|
of effort in making the code idiomatic and easy to use, through \eg{} the
|
|
implementation of iterators over my data structures when needed, idiomatic
|
|
C++14, etc.
|
|
|
|
This also means that the code has to be well-documented: the git history had to
|
|
be kept clean and understandable, and a clean documentation can be generated
|
|
from the code, using \texttt{doxygen}. The latest documentation is also
|
|
compiled as HTML pages here:
|
|
|
|
\begin{center}
|
|
\raisebox{-0.4\height}{
|
|
\includegraphics[height=2.3em]{../common/docs.png}}
|
|
\hspace{1em}
|
|
\url{https://tobast.fr/m1/isomatch}
|
|
\end{center}
|
|
|
|
Since the code is C++, it is also very prone to diverse bugs. While I did not
|
|
took the time to integrate unit tests --- which would have been a great
|
|
addition ---, I used a sequence of test that can be run using \lstc{make
|
|
test}, and tests a lot of features of isomatch.
|
|
|
|
The code is also tested regularly and on a wide variety of cases with
|
|
\lstbash{valgrind} to ensure that there are no memory errors ---
|
|
use-after-free, unallocated memory, memory leaks, bad pointer
|
|
arithmetics,~\ldots In every tested case, strictly no memory is lost, and no
|
|
invalid read was reported.
|
|
|
|
\subsection{Sought efficiency}
|
|
|
|
The goal of \textit{isomatch} is to be applied to large circuits on-the-fly,
|
|
during their conception. Those circuits can (and will probably) be as large as
|
|
a full processor, and the software will be operated by a human, working on
|
|
their circuit. Thus, \textit{isomatch} must be as fast as possible, since
|
|
matching operations will be executed quite often, and often multiple times in a
|
|
row. It must then remain fast enough for the human not to lose too much time,
|
|
and eventually lose patience.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{General approach}
|
|
|
|
The global strategy used to solve efficiently the problem can be broken down to
|
|
three main parts.
|
|
|
|
\paragraph{Signatures.} The initial idea to make the computation fast is to
|
|
aggregate the inner data of a gate~---~be it a leaf gate or a group~---~in a
|
|
kind of hash, a 64 bits unsigned integer. This approach is directly inspired
|
|
from what was done in fl, back at Intel. This hash must be easy to compute,
|
|
and must be based only on the structure of the graph --- that is, must be
|
|
entirely oblivious of the labels given, the order in which the circuit is
|
|
described, the order in which different circuits are plugged on a wire, \ldots.
|
|
The signature equality, moreover, must be sound; that is, two signatures must
|
|
necessarily be equal if the circuits are indeed equal.
|
|
|
|
This makes it possible to rule out quickly whether two circuits are candidates
|
|
for a match or not, and run the costy actual equality algorithm on fewer gates.
|
|
|
|
|
|
\paragraph{Group equality.} The group equality algorithm is a standard
|
|
backtracking algorithm. It tries to build a match between the graphs by
|
|
trying the diverse permutations of elements with the same signature. It can
|
|
also communicate with the signing part, to request a more precise (but slightly
|
|
slower to compute) signature when it has too many permutations to try.
|
|
|
|
This part could be enhanced, but does not slow down the algorithm on the tested
|
|
examples, so I focused on other parts.
|
|
|
|
|
|
\paragraph{Pattern matching.} This part is the one responsible to answer
|
|
queries for occurrences of a sub-circuit in a circuit. It uses extensively the
|
|
signatures to determine whether two circuits could be a match or not before
|
|
spending too much time actually finding matches, but cannot rely on it as
|
|
heavily as group equality, since only the first level of precision is
|
|
applicable here (detailed later).
|
|
|
|
This part mostly consists in an implementation of Ullmann's algorithm for
|
|
subgraph isomorphism~\cite{ullmann1976algorithm}, a well-known algorithm for
|
|
this problem, that uses the specificities of the graph to be a little faster.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Signatures}
|
|
|
|
The signature is computed as a simple hash of the element, and is defined for
|
|
every type of expression and circuit. It could probably be enhanced with a bit
|
|
more work to cover more uniformly the hash space, but no illegitimate collision
|
|
(that is, a collision that could be avoided with a better hash function, as
|
|
opposed to collisions due to an equal local graph structure) was observed on
|
|
the examples tested.
|
|
|
|
\paragraph{Signature constants.} Signature constants are used all around the
|
|
signing process, and is a 5-tuple $\sigconst{} = (a, x_l, x_h, d_l, d_h)$ of 32
|
|
bits unsigned numbers. All of $x_l$, $x_h$, $d_l$ and $d_h$ are picked as prime
|
|
numbers between $10^8$ and $10^9$ (which just fits in a 32 bits unsigned
|
|
integer); while $a$ is a random integer uniformly picked between $2^{16}$ and
|
|
$2^{32}$. These constants are generated by a small python script,
|
|
\path{util/primegen/pickPrimes.py} in the repository.
|
|
|
|
Those constants are used to produce a 64 bits unsigned value out of another 64
|
|
bits unsigned value, called $v$ thereafter, through an operator $\sigop$,
|
|
computed as follows (with all computations done on 64 bits unsigned integers).
|
|
|
|
\vspace{1em}
|
|
\begin{center}
|
|
\begin{algorithmic}
|
|
\Function{$\sigop$}{$\sigconst{}, v$}
|
|
\State{} $out1 \gets (v + a) \cdot x_l$
|
|
\State{} $v_h \gets (v \lsr 32) \xor (out1 \lsr 32)$
|
|
\State{} $low \gets out1 \,\%\, d_l$
|
|
\State{} $high \gets \left((v_h + a) \cdot x_h \right) \%\, d_h$
|
|
\State{} \Return{} $low + 2^{32} \cdot high$
|
|
\EndFunction{}
|
|
\end{algorithmic}
|
|
\end{center}
|
|
|
|
\paragraph{Expressions.} Each type of expression (or, in the case of
|
|
expression with operator, each type of operator) has its signature constant,
|
|
$\sigconst{\text{exprType}}$. The signature of a commutative expression in its
|
|
operands is always commutative, and the signature of a non-commutative
|
|
expression should not be (and is not, except for collisions). The value $v$
|
|
used to sign the expression (in $\sigop(\sigconst{\text{exprType}}, v)$) is
|
|
then the sum (respectively difference) of the signature of its parameters for
|
|
commutative (respectively non-commutative) expressions.
|
|
|
|
\paragraph{Circuits' inner signature.} Every circuit is associated with a value
|
|
describing its \emph{type} (rather than its contents): 8 bits of circuit type
|
|
ID (delay, tristate, \ldots), the number of inputs on the next 8 bits, and the
|
|
number of outputs on 8 more bits. This value is then xored with the inner value
|
|
of the circuit: for a combinatorial gate, the xor of its expressions'
|
|
signatures; for a group, the sum of its children's signatures\footnote{As a
|
|
group is likely to have multiple occurrences of a single identical circuit, it
|
|
would be unwise to xor its children's signatures, even though the usual advice
|
|
is to combine hashes by xoring them.}, \ldots This value constitutes the
|
|
circuit's \emph{inner signature}.
|
|
|
|
\paragraph{Circuits' signature of order $n$.} The inner signature does not
|
|
capture at all the \emph{structure} of the graph. An information we can capture
|
|
without breaking the signature's independence towards the order of description
|
|
of the graph, is the set of its neighbours. Yet, we cannot ``label'' the gates
|
|
without breaking this rule; thus, we represent the set of neighbours by the set
|
|
of the \emph{neighbours' signatures}.
|
|
|
|
At this point, we can define the \emph{signature of order $n$} ($n \in
|
|
\natset$) of a circuit $C$ as follows:
|
|
|
|
\begin{align*}
|
|
\sig_0(C) :&= \text{inner signature of } C\\
|
|
\sig_{n+1}(C) :&= \text{inner signature of }C
|
|
+ \text{IO adjacency}
|
|
+ \hspace{-2em}\sum\limits_{C_i \in \,\text{neighbours of inputs}}
|
|
\hspace{-2em}\sig_n(C_i) \hspace{1em}
|
|
- \hspace{-2em}\sum\limits_{C_o \in \,\text{neighbours of inputs}}
|
|
\hspace{-2em}\sig_n(C_o)
|
|
\end{align*}
|
|
|
|
The ``IO adjacency'' term is an additional term in the signatures of order
|
|
above $0$, indicating what input and output pins of the circuit group
|
|
containing the current gate are adjacent to it. Adding this information to the
|
|
signature was necessary, since a lot of gates can be signed differently using
|
|
this information (see Corner cases in Section~\ref{ssec:corner_cases}).
|
|
|
|
The default order of signature used in all computations, unless more is useful,
|
|
is 2, after a few benchmarks.
|
|
|
|
\paragraph{Efficiency.} Every circuit memoizes all it can concerning its
|
|
signature: the inner signature, the IO adjacency, the signatures of order $n$
|
|
already computed, etc.
|
|
|
|
This memoization, alongside with the exclusive use of elementary operations,
|
|
makes the computation of a signature very fast. The computation is linear in
|
|
the number of gates in a circuit, times the order computed; the computation is
|
|
lazy.
|
|
|
|
To keep those memoized values up to date whenever the structure of the circuit
|
|
is changed (since this is meant to be integrated in a programming language, fl,
|
|
meaning the structure of the circuit will possibly be created, checked for
|
|
signature, altered, then checked again), each circuit keeps track of a
|
|
``timestamp'' of last modification, which is incremented whenever the circuit
|
|
or its children are modified. A memoized data is always stored alongside with a
|
|
timestamp of computation, which invalidates a previous result when needed.
|
|
|
|
One possible path of investigation for future work, if the computation turns
|
|
out to be still too slow in real-world cases --- which looks unlikely, unless
|
|
fl's substitution is run on a regular basis for a huge number of cases using
|
|
\eg{} a crontab for automated testing ---, would be to try to multithread this
|
|
computation.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Group equality}\label{sec:group_equality}
|
|
|
|
Given two circuit group gates, the task of group equality is to determine
|
|
whether the two groups are structurally equivalent, as discussed above.
|
|
|
|
Group equality itself is handled as a simple backtracking algorithm, trying to
|
|
establish a match (an isomorphism, that is, a permutation of the gates of one
|
|
of the groups) between the two groups given.
|
|
|
|
The gates of the two groups are matched by equal signatures, equal number of
|
|
inputs and outputs, based on the signature of default order (that is, 2). A few
|
|
checks are made, \eg{} every matching group must have the same size on both
|
|
sides (if not, then, necessary, the two groups won't match). Then, the worst
|
|
case of number of permutations to check is evaluated.
|
|
|
|
If this number is too high, the signature order will be incremented, and the
|
|
matching groups re-created accordingly, until a satisfyingly low number of
|
|
permutations is reached (or the diameter of the circuit is reached, meaning
|
|
that increasing the order of signature won't have any additional impact). This
|
|
order increase ``on-demand'' proved itself very efficient, effectively lowering
|
|
the number of permutations examined to no more than $4$ in studied cases.
|
|
|
|
Once a permutation is judged worth to be examined, the group equality is run
|
|
recursively on all its matched gates. If this step succeeds, the graph
|
|
structure is then checked. If both steps succeed, the permutation is correct
|
|
and an isomorphism has been found; if not, we move on to the next permutation.
|
|
|
|
\todo{Anything more to tell here?}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Pattern-match}
|
|
|
|
We finally need to be able to find every occurrence of a given \emph{needle}
|
|
circuit in a bigger \emph{haystack} circuit — at any level in its groups
|
|
hierarchy. This problem is basically graph isomorphism with some specificities,
|
|
and for this purpose I used a classical algorithm for the subgraph isomorphism
|
|
problem, \emph{Ullmann}.
|
|
|
|
\subsection{Ullmann's algorithm}
|
|
|
|
One of the classical algorithms to deal with the subgraph isomorphism problem
|
|
was first described by Julian R Ullmann in 1976~\cite{ullmann1976algorithm}.
|
|
Another, more recent algorithm to deal with this problem is Luigi P Cordella's
|
|
VF2 algorithm~\cite{cordella2004sub}, published in 2004. This algorithm is
|
|
mostly Ullmann's algorithm, transcribed in a recursive writing, with the
|
|
addition of five heuristics. I originally planned to implement both algorithms
|
|
and benchmark both, but had no time to do so in the end; though, Ullmann with
|
|
the few additional heuristics applicable in our very specific case turned out
|
|
to be fast enough.
|
|
|
|
Ullmann is a widely used and fast algorithm for this problem. It makes an
|
|
extensive use of adjacency matrix description of the graph, and the initial
|
|
article takes advantage of the representation of those matrices as bitsets to
|
|
make extensive use of bitwise operations.
|
|
|
|
The to-be-built permutation matrix is a $\card{needle} \times \card{haystack}$
|
|
matrix. Each $1$ in a cell $(i, j)$ indicates that the $i$-th needle part is a
|
|
possible match with the $j$-th haystack part. This matrix is called $perm$
|
|
thereafter.
|
|
|
|
The algorithm, left apart the \textsc{refine} function, which is detailed just
|
|
after and can be omitted for a (way) slower version of the algorithm, is
|
|
described in Figure~\ref{alg:ullmann}.
|
|
|
|
\begin{figure}[h]
|
|
\begin{algorithmic}
|
|
\Function{find\_at\_depth}{depth, perm, freeVert}
|
|
\If{no 1s on \lstc{perm[depth]}}
|
|
\State{} \Return{}
|
|
\EndIf{}
|
|
|
|
\State{} Save perm
|
|
|
|
\For{$0 \leq$ chosen $< \card{\text{haystack}}$ such that
|
|
\lstc{perm[depth][chosen]} $ = 1$ and \lstc{freeVert[chosen]}}
|
|
\State{} Put $0$s everywhere on \lstc{perm[depth]}, but on
|
|
\lstc{chosen}
|
|
|
|
\State{} Refine perm
|
|
\If{a row of perm has only $0$s}
|
|
\State{} \Return{}
|
|
\EndIf{}
|
|
|
|
\If{depth $=$ $\card{\text{needle}} - 1$}
|
|
\State{} Store perm as a result
|
|
\Else{}
|
|
\State{} \Call{find\_at\_depth}{depth$+1$, perm, freeVert with
|
|
freeVert[chosen] $= 0$}
|
|
\EndIf{}
|
|
|
|
\State{} Restore perm
|
|
\EndFor{}
|
|
\EndFunction{}
|
|
|
|
\vspace{1em}
|
|
|
|
\Function{find}{perm}
|
|
\State{} \Return{} \Call{find\_at\_depth}{0, perm, [$1, \ldots, 1$]}
|
|
\EndFunction{}
|
|
\end{algorithmic}
|
|
\caption{Ullmann's algorithm (without refining)}\label{alg:ullmann}
|
|
\end{figure}
|
|
|
|
The refining process is the actual keystone of the algorithm. It is the
|
|
mechanism allowing the algorithm to cut down many exploration branches, by
|
|
changing ones to zeroes in the matrix being built.
|
|
|
|
The idea is that a match between a needle's vertex $i$ and a haystack's vertex
|
|
$j$ is only possible if, for each neighbour $k$ of $i$, $j$ has a neighbour
|
|
$k'$ such that the permutation matrix has a one in position $(k, k')$. In other
|
|
words, a match between $i$ and $j$ is only possible if every neighbour $k$ of
|
|
$i$ (in needle) has a possibly matching (\wrt{} $perm$) vertex $k'$ (in
|
|
haystack) which is a neighbour of $j$.
|
|
|
|
This condition is checked on every $1$ in the permutation matrix. If it is not
|
|
met, the cell is nulled. This, though, potentially creates new ones not
|
|
matching the condition: the process must be run again, until no new zeroes
|
|
appear.
|
|
|
|
In the initial article~\cite{ullmann1976algorithm}, Ullmann advocates for
|
|
bitwise tricks to complete this expensive step: indeed, checking the existence
|
|
of such a $k'$ can be done by checking the nullity of the bitwise \textsc{and}
|
|
of the adjacency of $j$ and the permutation matrix row of $k$.
|
|
|
|
The refining function is detailed in Figure~\ref{alg:ullmann_refine}.
|
|
|
|
\todo{Insert explaining figure}
|
|
|
|
\begin{figure}[h]
|
|
\begin{algorithmic}
|
|
\Function{refine}{perm}
|
|
\While{changes during last run}
|
|
\For{each needle vertex $i$}
|
|
\For{each haystack vertex $j$}
|
|
\For{each neighbour $k$ of $i$ in needle}
|
|
\If{\lstc{perm[k] & haystack\_adjacency[j]}
|
|
$= 0$}
|
|
\State{} \lstc{perm[i][j]} $\gets 0$
|
|
\EndIf{}
|
|
\EndFor{}
|
|
\EndFor{}
|
|
\EndFor{}
|
|
\EndWhile{}
|
|
\EndFunction{}
|
|
\end{algorithmic}
|
|
\caption{Ullmann's refining function}\label{alg:ullmann_refine}
|
|
\end{figure}
|
|
|
|
\subsection{Ullmann for \emph{isomatch}}
|
|
|
|
\paragraph{Graph used.} Our circuit is not actually a graph just as-is: indeed,
|
|
a wire can be connected to multiple circuits (multiple gates' inputs, or even
|
|
multiple gates' outputs when using tristate circuits). This could be transposed
|
|
into a graph with $\frac{n(n-1)}{2}$ edges (the complete subgraph) for this
|
|
particular wire. Though, internally, a wire is better represented as a vertex
|
|
itself, with $n$ edges linking it to the connected gates. This representation
|
|
is also used in Ullmann's implementation, leading to a permutation matrix of
|
|
$\left(\card{\text{needle gates}} + \card{\text{needle wires}}\right) \times
|
|
\left(\card{\text{haystack gates}} + \card{\text{haystack wires}}\right)$.
|
|
|
|
\paragraph{Final result.} Once a result (\ie{} a correct permutation) is
|
|
obtained, we further need to check it is actually a solution of our problem.
|
|
Indeed, while the structure is guaranteed by the algorithm to be the same, we
|
|
still need to check that every circuit is equal to its matched one, through the
|
|
procedure described in Section~\ref{sec:group_equality}. So far, only the
|
|
equality of signatures was checked. We only need to check the circuits, as the
|
|
wires are necessarily actually matching.
|
|
|
|
\paragraph{Non-overlapping results.} We want our results to be non-overlapping
|
|
(because we won't be able to perform a search-and-replace if it is not the
|
|
case). Whenever two potential results are conflicting, an arbitrary one of the
|
|
two can be returned (a human user is operating the software and can make a
|
|
narrower search if needed).
|
|
|
|
To match this specification, we must keep track of the circuits that are
|
|
already included in a match. We also cannot include an ancestor of a circuit
|
|
that was included in a match in another match (though this is not possible,
|
|
because the needle can't be included in itself, but signature collisions could
|
|
occur).
|
|
|
|
\subsection{Implementation optimisations}
|
|
|
|
\paragraph{Initial permutation matrix.} The matrix is first filled according to
|
|
the signatures matches. Note that only signatures of order 0 --- \ie{} the
|
|
inner data of a vertex --- can be used here: indeed, we cannot rely on the
|
|
context here, since there can be some context in the haystack that is absent
|
|
from the needle, and we cannot check for ``context inclusion'' with our
|
|
definition of signatures: \emph{all} the context must be exactly the same for
|
|
two signatures to match. It is then refined a bit more, by making sure that for
|
|
every match, every potentially matching gate has the same ``wire kinds''.
|
|
Indeed, a gate needle's wire must have at least the same inbound adjacent
|
|
signatures as its matching haystack wire, and same goes for outbound adjacent
|
|
signatures. Thus, two circuits cannot be matched if this condition is not
|
|
respected for each pair of corresponding wires of those circuits, and their
|
|
corresponding cell in the permutation matrix can be nulled.
|
|
|
|
\paragraph{Pre-check.} The needle will, in most cases, not be found at all in
|
|
a given hierarchy group of the haystack. To avoid wasting computation time, we
|
|
first check that every signature present in the needle is present at least as
|
|
many times in the haystack. This simple check saved a lot of time.
|
|
|
|
\todo{More stuff?}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Performance}
|
|
|
|
In this section, all the measures were made on a computer with an Intel
|
|
i7-3770 CPU (3.40GHz) and 8\,GB of RAM\@. % chktex 8
|
|
|
|
\subsection{Small processor}
|
|
|
|
The example I used widely during my project to test my program and check its
|
|
efficiency periodically was a small processor designed one year earlier as a
|
|
school project~\cite{sysdig_cpu}. The processor implements a large subset of
|
|
ARM, was conceived as a few hierarchized modules (ALU, registers, opcode
|
|
decoder, \ldots) but was flattened as a plain netlist when generated from its
|
|
python code, so I first had to patch its generator code to make its hierarchy
|
|
apparent.
|
|
|
|
The processor, in the end, has around 2000 leaf gates (but works at word level)
|
|
and 240 hierarchy groups. \qtodo{Tell us more!}
|
|
|
|
\paragraph{Signature.} First, the time required to sign the whole circuit with
|
|
different levels of signature (\ie{} the order of signature computed for every
|
|
part of the circuit). In practice, we never compute high order signatures for a
|
|
whole circuit, as signature of subgroups are always computed by default at the
|
|
order $2$, unless this particular group needs a more accurate signature.
|
|
|
|
The measures were made for 100 consecutive runs of the program (then averaged
|
|
for a single run) and measured by the command \texttt{time}.
|
|
|
|
\begin{center}
|
|
\begin{tikzpicture}
|
|
\begin{axis}[
|
|
title={Signature time of the processor for different levels
|
|
of signature},
|
|
xlabel={Level of signature},
|
|
ylabel={Time (ms)},
|
|
xmin=0, xmax=16,
|
|
ymin=0, ymax=300,
|
|
legend pos=north west,
|
|
ymajorgrids=true,
|
|
grid style=dashed,
|
|
]
|
|
|
|
\addplot[
|
|
color=blue,
|
|
mark=square,
|
|
]
|
|
coordinates {
|
|
(2,105.4)
|
|
(3,122.6)
|
|
(4,140.1)
|
|
(5,155.4)
|
|
(6,171.2)
|
|
(7,183.9)
|
|
(8,198.3)
|
|
(9,211.2)
|
|
(10,224.3)
|
|
(11,236.7)
|
|
(12,248.5)
|
|
(13,259.3)
|
|
(14,271.7)
|
|
(15,281.4)
|
|
};
|
|
\legend{-O3}
|
|
\end{axis}
|
|
\end{tikzpicture}
|
|
\end{center}
|
|
|
|
The computation time is more or less linear in in the level of signature
|
|
required, which is coherent with the implementation. In practice, only small
|
|
portions of a circuit will be signed with a high order, which means that we can
|
|
afford really high order signatures (\eg{} 40 or 50, which already means that
|
|
the diameter of the group is 40 or 50) without having a real impact on the
|
|
computation time.
|
|
|
|
|
|
\paragraph{Equality.} To test the circuit group equality, a small piece of
|
|
code takes a circuit, scrambles it as much as possible
|
|
--- without altering its structure ---, \eg{} by renaming randomly its parts,
|
|
by randomly changing the order of the circuits and groups, \ldots The circuit
|
|
is then matched with its unaltered counterpart.
|
|
|
|
For the processor described above, it takes about \textbf{477\,ms} to
|
|
prove it equal to its scrambled version, and then the other way around. Yet,
|
|
the memoized results (essentially the signatures) are kept for the second one,
|
|
considerably speeding it up: the same program proving only one way takes about
|
|
\textbf{475\,ms}. \todo{explain why}
|
|
|
|
|
|
|
|
\subsection{Corner cases}\label{ssec:corner_cases}
|
|
|
|
There were a few observed cases where the algorithm tends to be slower on
|
|
certain configurations, and a few other such cases that could be fixed.
|
|
|
|
\todo{More corner cases}
|
|
|
|
\todo{Corner case: io pins, io adjacency}
|
|
|
|
\paragraph{Split/merge trees.} A common pattern that tends to slow down the
|
|
algorithm is split/merge trees. Those patterns occur when one wants to merge
|
|
$n$ one bit wires into a single $n$ bits wire, or the other way around.
|
|
|
|
These patterns are pretty common, for instance when an opcode is run through a
|
|
MUX tree to perform the requested operation.
|
|
|
|
Though, this pattern generates a lot of collisions in signatures. Indeed, for a
|
|
tree of depth \eg{} 8, a node just below the root will need a signature of
|
|
order 7 to have a different signature than another one at the same depth. With
|
|
a signature of order up to 6, only other gates from the tree will be included
|
|
in the signature when going down in the tree; the exact same gates will be
|
|
included above the tree's root. Thus, nothing will differentiate one gate from
|
|
another while the boundary of the tree is not reached (assuming the gates below
|
|
the tree's leaves are not all the same; if so, more levels will be needed).
|
|
|
|
As the notion of ``left child'' and ``right child'' cannot be used (since it
|
|
would rely on the order or description of the graph), there seems to be no good
|
|
way to discriminate those two nodes. Furthermore, the nodes are not totally
|
|
interchangeable: indeed, when checking for an equality between two such trees,
|
|
it does not matter which node is the left one; but once this is fixed, the
|
|
nodes on the layer below cannot be freely exchanged.
|
|
|
|
\todo{Figure describing the problem}
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section*{Conclusion}
|
|
|
|
\todo{}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
\printbibliography{}
|
|
|
|
\end{document}
|