\documentclass[11pt,a4paper]{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{indentfirst}
\usepackage{enumerate}
\usepackage{caption}
\usepackage{algorithmicx}
%\usepgfplotslibrary{external}
%\tikzexternalize

\usepackage[backend=biber,style=trad-alpha]{biblatex}
\usepackage[left=2cm,right=2cm,top=2cm,bottom=2cm]{geometry}

% Custom packages
\usepackage{todo}
\usepackage{leftrule_theorems}
\usepackage{my_listings}
\usepackage{my_hyperref}
\usepackage{../common/internship}
\usepackage{../common/math}

\usepackage{pgfplots}

\bibliography{../common/refs}

\title{Pattern-matching and substitution in electronic circuits}
\author{Théophile Bastian, under supervision of Carl-Johan Seger
and Mary Sheeran\\
\small{Chalmers University, Göteborg, Sweden}}
\date{February~--~June 2017}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle

\begin{abstract}
    This report describes and summarizes my first-year Master's internship at
    Chalmers University, Göteborg, Sweden. I worked under the supervision of
    Carl-Johan Seger and Mary Sheeran.

    The project's goal was to contribute to VossII, a hardware proving tool
    cloning fl, developed at Intel by Carl Seger for internal use only. It
    led me to work on \emph{pattern-matching} on electronic circuits for a
    \emph{search-and-replace} method, allowing one to apply previously proved
    transformations to a circuit.

    This problem turns out to be more or less the \emph{subgraph isomorphism
    problem}, which is NP-complete, yet it must nevertheless be solved
    efficiently on processor-sized circuits in this particular case.

    During my internship, I developed a C++ library performing this task,
    which will be integrated into VossII\@. It is based on a few well-known
    algorithms, as well as some ad-hoc heuristics and algorithmic tweaks
    better suited to the context of circuits. One of my objectives throughout
    was also to keep a clean and well-documented codebase, as the tool will
    have to be maintained by others later on.
\end{abstract}

\tableofcontents

\pagebreak

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

In recent years, verification and proved software have gathered increasing
interest in the computer science community, as people have realised how hard
bugs are to track down, and how little confidence they could place in their
own code. Hardware bugs, however, are even more tedious to find and fix, and
can easily lead to disastrous consequences, as they cannot be patched on
existing hardware. For instance, the well-known Pentium ``\textsc{fdiv}''
bug~\cite{pratt1995fdiv} that affected a large number of Pentium processors
led to wrong results for some floating point divisions. Intel had to offer to
replace all the defective CPUs, leading to an announced loss of 475 million
dollars~\cite{nicely_fdiv}. More recently, the Skylake and Kaby Lake
hyperthreading bug had to be patched using microcode, at the cost of
performance and reliability.

To avoid such disasters, the industry nowadays uses a wide range of techniques
to catch bugs as early as possible --- which, hopefully, is before the
product's release date. These techniques of course include a lot of testing on
simulated hardware or FPGAs (since an actual processor is extremely expensive
to burn). A lot of testing is run routinely on the current version of the
hardware, to catch bugs and notify the designers, since it remains the easiest
way to test the behaviour of a circuit. Symbolic trajectory
evaluation~\cite{hazelhurst1997symbolic} also has its place in the domain,
allowing one to run a circuit for a few cycles (before it becomes too
expensive) with symbolic values, \ie{} variables instead of zeroes, ones and
$X$s (for ``not a value''). This kind of testing is far more powerful than
plain testing, since its results are more precise; yet it is also too
expensive to run over a significant number of cycles, and therefore many bugs
are impossible to catch this way.

The previous methods are cheap strategies, well suited to running the first
tests on a circuit, but they give only little confidence in its correctness
--- they only prove that among all the cases that were tested, all yielded a
correct behaviour. These reasons led to the development of proved hardware in
the industry. On circuits as complex as processors, usually, only
sub-components are proved correct with respect to a given specification of
their behaviour (usually source code that should behave as the processor is
expected to behave, itself written with respect to the documentation draft of
the circuit). These proofs are typically valid only while the circuit is kept
in a specified context, \ie{} a set of valid inputs, voltages, etc. --- which
should be, but is not proved to be, respected by the other parts of the
circuit. Yet, this trade-off between proved correctness and engineering time
already gives a pretty good confidence in the circuit.

In this context, Carl Seger was one of the main developers of fl at
Intel~\cite{seger1993vos}~\cite{seger2005industrially}~\cite{seger2006design},
a functional ML-inspired programming language integrating many features useful
for getting insights into a circuit, testing it and proving it. It mostly
features model checking based on symbolic trajectory evaluation, as well as
theorem proving, and is intended to be an all-purpose toolbox for the hardware
prover.

Among other features, it includes a ``search and replace'' feature, which can
search for every occurrence of a given gate pattern in a circuit, and replace
it with some other gate pattern, proved observationally equivalent beforehand.
Time has proved this method very efficient for designing circuits: this way,
one can start from an inefficient yet simple circuit, prove it, and then
refine it into an equivalent, efficient one through proved transformations. It
is also possible to go the other way, and start with an optimized circuit that
is hard to understand, and turn it into one easier to understand in order to
work on it more efficiently.

\paragraph{My internship} is part of a project shared between Carl-Johan Seger
and Mary Sheeran, aiming to develop tools for proved design of FPGA circuits.
One of the keystones of this project is an open-sourced and publicly available
version of fl, used for the proving part, which is still under heavy
development at the moment.

My part of the work revolved around this ``search and replace'' tool. More
specifically, I focused on writing a C++ library, \emph{isomatch}, which is
interfaced with the C core of fl and provides it with low-level and very fast
functions for this task.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Problem setting}

\subsection{Circuit description}
\begin{figure}[!h]
    \begin{align*}
        \textbf{Integer constant } n, m, p, q, \ldots \qquad& \\
        \\
        \textbf{Wire } in0, out0, ctl0, \ldots \qquad& \\
        \\
        \textbf{Vector } \evec{v}{n} & & \textit{($n$ elements of type $v$)} \\
        \\
        \textbf{Circuit } c, d, \ldots ::=~
            &\text{delay} (in0, out0) &\textit{(delay 1 clock tick)} \\
        \vert~&\text{tristate} (in0, out0, ctl0)
            &\textit{(three-state gate)} \\
        \vert~&\text{comb} (\evec{in0}{n}, \evec{out0}{m}, \evec{e}{m})
            &\textit{(combinatorial gate)} \\
        \vert~&\text{assert} (\evec{in0}{n}, \evec{e}{m})
            &\textit{(assertion gate)} \\
        \vert~&\text{group} (\evec{in0}{n}, \evec{out0}{m}, \evec{c}{p})
            &\textit{(circuit hierarchical group)} \\
        \\
        \textbf{Binary operator } \otimes ::=~
            &\wedge & \textit{(and)} \\
        \vert~&\vee & \textit{(or)} \\
        \vert~&\oplus & \textit{(xor)} \\
        \vert~&+ & \textit{(add)} \\
        \vert~&- & \textit{(sub)} \\
        \vert~&\times & \textit{(times)} \\
        \vert~&\div & \textit{(div)} \\
        \vert~&\% & \textit{(mod)} \\ % chktex 35
        \vert~&\lsl & \textit{(logical shift left)} \\
        \vert~&\lsr & \textit{(logical shift right)} \\
        \vert~&\asr & \textit{(arithmetic shift right)} \\
        \\
        \textbf{Unary and constant operator } \otimes_0 ::=~
            &\clsl & \textit{(logical shift left of constant)} \\
        \vert~&\clsr & \textit{(logical shift right of constant)} \\
        \vert~&\casr & \textit{(arithmetic shift right of constant)} \\
        \\
        \textbf{Unary operator } \ominus ::=~
            &\lnot & \textit{(logical negation)} \\
        \\
        \textbf{Expression } e, f, \ldots ::=~
            & x & \textit{(variable)} \\
        \vert~& n & \textit{(integer constant)} \\
        \vert~& e \otimes f & \textit{(binary operator)} \\
        \vert~& e \otimes_0 n & \textit{(unary operator with constant)} \\
        \vert~& \ominus e & \textit{(unary operator)} \\
        \vert~& e_{\vert~n \ldots m} & \textit{(slicing: take a subword)}\\
        \vert~& e~\vert~f & \textit{(merging: concatenate two words)} \\
    \end{align*}
    \caption{AST of circuits used}\label{fig:ast}
\end{figure}
The circuits on which \emph{isomatch} operates are described, and internally
represented, by the AST in Figure~\ref{fig:ast}.

The most important thing in this description is that circuits are organized as
a hierarchy of \emph{circuit groups}. This hierarchy can be seen as the
construction of a circuit by assembling smaller integrated circuits (ICs),
themselves built the same way, etc. A group is composed of sub-circuits, input
pins and output pins. Each level can of course contain ``leaf'' gates, like
\textit{and} or \textit{delay} gates. This is important, because it allows the
program to work on smaller areas of the circuit (\eg{} loading only a part of
the circuit into memory, etc.).

\emph{Isomatch} comes along with a small parser for a toy ad-hoc language,
designed to allow one to quickly write a test circuit and run the
\emph{isomatch} functions on it. There was no real need for a better language,
or a standard circuit description language (such as
VHDL~\cite{navabi1997vhdl}), since the user will mostly use \emph{isomatch}
through fl, and feed it directly with data read from fl --- which is able to
handle \eg{} VHDL\@.
\subsection{Codebases}

Carl Seger's new version of fl is currently being developed as \textbf{VossII},
and is yet to be open-sourced.

My contribution to the project, \textbf{isomatch}, is free software, and is
available on GitHub:

\begin{center}
    \raisebox{-0.4\height}{
        \includegraphics[height=2em]{../common/github32.png}}
    \hspace{1em}
    \url{https://github.com/tobast/circuit-isomatch/}
\end{center}

\subsection{Problems}

More precisely, the problems that \emph{isomatch} must solve are the following.
\begin{enumerate}
    \item\label{prob:equal} Given two circuit groups, are they structurally
        equivalent? That is, are they the same circuit, arranged in a
        different way, with possibly different names, etc.?

    \item\label{prob:match} Given two circuits, \emph{needle} and
        \emph{haystack}, find every (non-overlapping) occurrence of
        \emph{needle} in \emph{haystack}. An occurrence is a set $S$ of
        sub-circuits of \emph{haystack} such that there is a one-to-one
        mapping of structurally equivalent circuits of $S$ with circuits of
        \emph{needle}, and those circuits are connected the same way in both
        circuits.
\end{enumerate}
Both problems are hard. The first one is an instance of the graph isomorphism
problem, as the actual question is whether there exists a one-to-one mapping
between sub-circuits of the two groups, such that every mapped circuit is
equal to the other (either directly if it is a leaf gate, or recursively with
the same procedure), and whether this mapping respects connections (edges)
between those circuits. Graph isomorphism is known to be in NP (given a
permutation of the first graph, it is polynomial to check whether the first is
equal to the second \wrt{} the permutation), but it is not known to be either
in P or NP-complete. Thus, since Babai's work on graph
isomorphism~\cite{babai2016graph} is only of theoretical interest (at the
moment), the known algorithms remain worst-case exponential in time, and
require ad-hoc heuristics for specific kinds of graphs to reach maximum
efficiency.

The second one is an instance of the subgraph isomorphism problem, which is
known to be NP-complete~\cite{cook1971complexity}. Even though a few
algorithms (discussed later) are known to be efficient in most cases for this
problem, it is nevertheless necessary to implement them the right way, and
with the right heuristics, to get the desired efficiency on the given problem.
\subsection{Code quality}

Another prominent objective was to keep the codebase as clean as possible.
Indeed, this code will probably have to be maintained for quite some time, and
most probably by people other than me. This means that the code and all its
surroundings must be really clean, readable and reusable. I put a lot of
effort into making the code idiomatic and easy to use, through \eg{} the
implementation of iterators over my data structures when needed, idiomatic
C++14, etc.

This also means that the code has to be well-documented: the git history had
to be kept clean and understandable, and a clean documentation can be
generated from the code, using \texttt{doxygen}. The latest documentation is
also compiled as HTML pages here:

\begin{center}
    \raisebox{-0.4\height}{
        \includegraphics[height=2.3em]{../common/docs.png}}
    \hspace{1em}
    \url{https://tobast.fr/m1/isomatch}
\end{center}

Since the code is C++, it is also very prone to diverse bugs. While I did not
take the time to integrate unit tests --- which would have been a great
addition ---, I wrote a sequence of tests that can be run using \lstc{make
test}, and which exercises a lot of features of isomatch.

The code is also tested regularly and on a wide variety of cases with
\lstbash{valgrind} to ensure that there are no memory errors ---
use-after-free, unallocated memory, memory leaks, bad pointer
arithmetic,~\ldots In every tested case, strictly no memory is lost, and no
invalid read was reported.
\subsection{Sought efficiency}

The goal of \textit{isomatch} is to be applied to large circuits on-the-fly,
during their conception. Those circuits can (and probably will) be as large as
a full processor, and the software will be operated by a human, working on
their circuit. Thus, \textit{isomatch} must be as fast as possible, since
matching operations will be executed quite often, and often multiple times in
a row. It must remain fast enough for the human operator not to lose too much
time, and eventually their patience.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{General approach}

\subsection{Theoretical solution}

The global strategy used to solve the problem efficiently can be broken down
into three main parts.

\paragraph{Signatures.} The initial idea to make the computation fast is to
aggregate the inner data of a gate~---~be it a leaf gate or a group~---~in a
kind of hash, a 64-bit unsigned integer. This approach is directly inspired
by what was done in fl, back at Intel. This hash must be easy to compute, and
must be based only on the structure of the graph --- that is, it must be
entirely oblivious of the labels given, the order in which the circuit is
described, the order in which different circuits are plugged on a wire, \ldots
The signature equality, moreover, must be sound; that is, two signatures must
necessarily be equal if the circuits are indeed equal.

This makes it possible to quickly rule out whether two circuits are candidates
for a match or not, and to run the costly actual equality algorithm on fewer
gates.

\paragraph{Group equality.} The group equality algorithm is a standard
backtracking algorithm. It tries to build a match between the graphs by
trying the various permutations of elements with the same signature. It can
also communicate with the signing part, to request a more precise (but
slightly slower to compute) signature when it has too many permutations to
try.

This part could be enhanced, but does not slow down the algorithm on the
tested examples, so I focused on other parts.

\paragraph{Pattern matching.} This part is the one responsible for answering
queries for occurrences of a sub-circuit in a circuit. It makes extensive use
of the signatures to determine whether two circuits could be a match or not
before spending too much time actually finding matches, but cannot rely on
them as heavily as group equality does, since only the first level of
precision is applicable here (as detailed later).

This part mostly consists of an implementation of Ullmann's algorithm for
subgraph isomorphism~\cite{ullmann1976algorithm}, a well-known algorithm for
this problem, which uses the specificities of the graph to be a little faster.
\subsection{Workflow}

At first, to get the algorithms, heuristics and methods right, I designed a
prototype in OCaml. This prototype was not intended to be --- and was not ---
optimized, but it allowed me to find places where the program took an
unacceptable amount of time. For instance, I left the prototype, which I
thought fast enough, computing a group equality for a whole night, before
finding out in the morning that it was actually not fast enough at all. This
first version was also written in a strongly typed language, with a lot of
static guarantees. It gave me enough confidence in my methods and strategies
to move on to a much faster and optimized C++ version, the current version of
\emph{isomatch}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Signatures}\label{sec:signatures}

The signature is computed as a simple hash of the element, and is defined for
every type of expression and circuit. It could probably be enhanced with a bit
more work to cover the hash space more uniformly, but no illegitimate
collision (that is, a collision that could be avoided with a better hash
function, as opposed to collisions due to an identical local graph structure)
was observed on the examples tested.

\paragraph{Signature constants.} Signature constants are used throughout the
signing process. A signature constant is a 5-tuple
$\sigconst{} = (a, x_l, x_h, d_l, d_h)$ of 32-bit unsigned numbers. All of
$x_l$, $x_h$, $d_l$ and $d_h$ are picked as prime numbers between $10^8$ and
$10^9$ (which just fits in a 32-bit unsigned integer), while $a$ is a random
integer uniformly picked between $2^{16}$ and $2^{32}$. These constants are
generated by a small Python script, \path{util/primegen/pickPrimes.py} in the
repository.
Those constants are used to produce a 64-bit unsigned value out of another
64-bit unsigned value, called $v$ thereafter, through an operator $\sigop$,
computed as follows (with all computations done on 64-bit unsigned integers).

\vspace{1em}
\begin{center}
    \begin{algorithmic}
        \Function{$\sigop$}{$\sigconst{}, v$}
            \State{} $out1 \gets (v + a) \cdot x_l$
            \State{} $v_h \gets (v \lsr 32) \xor (out1 \lsr 32)$
            \State{} $low \gets out1 \,\%\, d_l$
            \State{} $high \gets \left((v_h + a) \cdot x_h \right) \%\, d_h$
            \State{} \Return{} $low + 2^{32} \cdot high$
        \EndFunction{}
    \end{algorithmic}
\end{center}
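To make the operator concrete, here is a minimal C++ sketch of $\sigop$; the
\lstcpp{SigConst} structure and its field names are illustrative, not the
actual \emph{isomatch} API\@. All arithmetic wraps around modulo $2^{64}$, and
the result fits in 64 bits since $low < d_l < 2^{32}$ and $high < d_h <
2^{32}$.

\begin{lstlisting}[language=C++]
#include <cstdint>

// Illustrative signature constant: a 5-tuple of 32-bit unsigned values.
struct SigConst {
    uint32_t a, xl, xh, dl, dh;
};

// sigop: mixes a 64-bit value v with a signature constant, following the
// pseudocode above. All arithmetic is on 64-bit unsigned integers, so the
// multiplications silently wrap around modulo 2^64.
uint64_t sigop(const SigConst& c, uint64_t v) {
    uint64_t out1 = (v + c.a) * c.xl;
    uint64_t vh   = (v >> 32) ^ (out1 >> 32);
    uint64_t low  = out1 % c.dl;                  // < dl < 2^32
    uint64_t high = ((vh + c.a) * c.xh) % c.dh;   // < dh < 2^32
    return low + (high << 32);
}
\end{lstlisting}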
\paragraph{Expressions.} Each type of expression (or, in the case of
expressions with an operator, each type of operator) has its own signature
constant, $\sigconst{\text{exprType}}$. The signature of a commutative
expression must be commutative in its operands, while the signature of a
non-commutative expression should not be (and is not, save for collisions).
The value $v$ used to sign the expression (in
$\sigop(\sigconst{\text{exprType}}, v)$) is then the sum (respectively the
difference) of the signatures of its operands for commutative (respectively
non-commutative) expressions.
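As an illustration, reusing the \lstcpp{sigop} and \lstcpp{SigConst} sketch
above, the value fed to $\sigop$ for a binary expression could be computed as
follows (hypothetical helper, not the actual \emph{isomatch} code):

\begin{lstlisting}[language=C++]
// Fold the operands' signatures into the value v passed to sigop:
// a sum for commutative operators (order-blind), a difference for
// non-commutative ones (order-sensitive). Wrap-around is harmless here.
uint64_t signBinaryExpr(const SigConst& opConst, bool commutative,
                        uint64_t lhsSig, uint64_t rhsSig) {
    uint64_t v = commutative ? lhsSig + rhsSig    // e.g. and, or, xor, add
                             : lhsSig - rhsSig;   // e.g. sub, div, shifts
    return sigop(opConst, v);
}
\end{lstlisting}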
\paragraph{Circuits' inner signature.} Every circuit is associated with a
value describing its \emph{type} (rather than its contents): 8 bits of circuit
type ID (delay, tristate, \ldots), the number of inputs on the next 8 bits,
and the number of outputs on 8 more bits. This value is then XORed with the
inner value of the circuit: for a combinatorial gate, the XOR of its
expressions' signatures; for a group, the sum of its children's
signatures\footnote{As a group is likely to have multiple occurrences of a
single identical circuit, it would be unwise to XOR its children's signatures,
even though the usual advice is to combine hashes by XORing them.}, \ldots
This value constitutes the circuit's \emph{inner signature}.
\paragraph{Circuits' signature of order $n$.} The inner signature does not
capture the \emph{structure} of the graph at all. One piece of information we
can capture without breaking the signature's independence from the order of
description of the graph is the set of its neighbours. Yet, we cannot
``label'' the gates without breaking this rule; thus, we represent the set of
neighbours by the set of the \emph{neighbours' signatures}.

At this point, we can define the \emph{signature of order $n$} ($n \in
\natset$) of a circuit $C$ as follows:

\begin{align*}
    \sig_0(C) :&= \text{inner signature of } C\\
    \sig_{n+1}(C) :&= \text{inner signature of }C
        + \text{IO adjacency}
        + \hspace{-2em}\sum\limits_{C_i \in \,\text{neighbours of inputs}}
            \hspace{-2em}\sig_n(C_i) \hspace{1em}
        - \hspace{-2em}\sum\limits_{C_o \in \,\text{neighbours of outputs}}
            \hspace{-2em}\sig_n(C_o)
\end{align*}

The ``IO adjacency'' term is an additional term in the signatures of order
above $0$, indicating which input and output pins of the circuit group
containing the current gate are adjacent to it. Adding this information to the
signature was necessary, since many gates can only be told apart by their
signatures thanks to this information (see the corner cases in
Section~\ref{ssec:corner_cases}).

The default signature order used in all computations, unless a higher one is
useful, is 2, chosen after a few benchmarks.
\paragraph{Efficiency.} Every circuit memoizes everything it can concerning
its signature: the inner signature, the IO adjacency, the signatures of order
$n$ already computed, etc.

This memoization, together with the exclusive use of elementary operations,
makes the computation of a signature very fast. The computation is linear in
the number of gates in a circuit, times the order computed, and is performed
lazily.

To keep those memoized values up to date whenever the structure of the circuit
is changed (since this is meant to be integrated into a programming language,
fl, a standard workflow will typically be to create a circuit, check its
signature, alter it, then check it again), each circuit keeps track of a
``timestamp'' of last modification, which is incremented whenever the circuit
or its children are modified. A memoized value is always stored alongside the
timestamp at which it was computed, which invalidates a previous result when
needed.
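A minimal sketch of this memoization scheme is given below; the class and
member names are hypothetical, and the parts depending on the concrete gate
type are stubbed out. It only illustrates the recursion of $\sig_n$ and the
timestamp check; in the real tool, modifying a circuit also affects the
timestamps of the groups containing it.

\begin{lstlisting}[language=C++]
#include <cstdint>
#include <unordered_map>
#include <vector>

class Circuit {
public:
    // Signature of the requested order, recomputed only if the cached
    // value is missing or older than the last modification.
    uint64_t sig(unsigned order) {
        auto it = memo.find(order);
        if (it != memo.end() && it->second.computedAt == lastModified)
            return it->second.value;                // still valid

        uint64_t res = innerSignature();
        if (order > 0) {
            res += ioAdjacencyTerm();
            for (Circuit* c : inputNeighbours())  res += c->sig(order - 1);
            for (Circuit* c : outputNeighbours()) res -= c->sig(order - 1);
        }
        memo[order] = {res, lastModified};
        return res;
    }

    // Called whenever this circuit (or one of its children) is altered.
    void notifyModified() { ++lastModified; }

private:
    struct Cached { uint64_t value; uint64_t computedAt; };
    std::unordered_map<unsigned, Cached> memo;
    uint64_t lastModified = 0;

    // Stubs: these depend on the concrete gate type and the wiring.
    uint64_t innerSignature() { return 0; }
    uint64_t ioAdjacencyTerm() { return 0; }
    std::vector<Circuit*> inputNeighbours()  { return {}; }
    std::vector<Circuit*> outputNeighbours() { return {}; }
};
\end{lstlisting}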
One possible path of investigation for future work, if the computation turns
out to still be too slow in real-world cases --- which looks unlikely, unless
fl's substitution is run on a regular basis for a huge number of cases, using
\eg{} a crontab for automated testing --- would be to try to multithread this
computation.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Group equality}\label{sec:group_equality}

Given two circuit group gates, the task of group equality is to determine
whether the two groups are structurally equivalent, as discussed above.

Group equality itself is handled as a simple backtracking algorithm, trying to
establish a match (an isomorphism, that is, a permutation of the gates of one
of the groups) between the two groups given.

The gates of the two groups are grouped into matching classes by equal numbers
of inputs and outputs and equal signatures --- based on the signature of
default order (that is, 2). A few checks are made, \eg{} every matching class
must have the same size on both sides (if not, then, necessarily, the two
groups won't match). Then, the worst-case number of permutations to check is
evaluated.

If this number is too high, the signature order is incremented, and the
matching classes re-created accordingly, until a satisfyingly low number of
permutations is reached (or the diameter of the circuit is reached, meaning
that increasing the signature order won't have any additional impact). This
``on-demand'' order increase proved very efficient, effectively lowering the
number of permutations examined to no more than $4$ in the studied cases.

Once a permutation is judged worth examining, the group equality is run
recursively on all its matched gates. If this step succeeds, the graph
structure is then checked. If both steps succeed, the permutation is correct
and an isomorphism has been found; if not, we move on to the next permutation.
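The class-size check and the permutation-count estimate that drives this
on-demand order increase can be sketched as follows (simplified, hypothetical
names; the actual code works on \lstcpp{CircuitTree} objects and also takes
the numbers of inputs and outputs into account when forming the classes):

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Bucket the children of each group by signature, check that every class
// has the same size on both sides, and estimate the worst-case number of
// permutations (product of the factorials of the class sizes). If this
// estimate is too high, the caller raises the signature order and retries.
double worstCasePermutations(const std::vector<uint64_t>& lhsSigs,
                             const std::vector<uint64_t>& rhsSigs) {
    std::map<uint64_t, size_t> lhsCls, rhsCls;
    for (uint64_t s : lhsSigs) ++lhsCls[s];
    for (uint64_t s : rhsSigs) ++rhsCls[s];
    if (lhsCls != rhsCls)        // some class is missing or has a different
        return 0;                // size: the two groups cannot match

    double perms = 1;
    for (const auto& cls : lhsCls)
        for (size_t k = 2; k <= cls.second; ++k)
            perms *= static_cast<double>(k);   // multiply by (class size)!
    return perms;
}
\end{lstlisting}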
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Pattern-match}

We finally need to be able to find every occurrence of a given \emph{needle}
circuit in a bigger \emph{haystack} circuit --- at any level of its group
hierarchy. This problem is basically subgraph isomorphism with some
specificities, and for this purpose I used a classical algorithm for the
subgraph isomorphism problem, Ullmann's algorithm.
\subsection{Ullmann's algorithm}

One of the classical algorithms dealing with the subgraph isomorphism problem
was first described by Julian R.\ Ullmann in 1976~\cite{ullmann1976algorithm}.
Another, more recent algorithm for this problem is Luigi P.\ Cordella's VF2
algorithm~\cite{cordella2004sub}, published in 2004. This algorithm is mostly
Ullmann's algorithm, transcribed into a recursive form, with the addition of
five heuristics. I originally planned to implement both algorithms and
benchmark them, but had no time to do so in the end; Ullmann's algorithm, with
the few additional heuristics applicable to our very specific case, turned out
to be fast enough.

Ullmann's is a widely used and fast algorithm for this problem. It relies
heavily on the adjacency-matrix description of the graphs, and the original
article takes advantage of the representation of those matrices as bitsets to
make extensive use of bitwise operations.

The to-be-built permutation matrix is a $\card{needle} \times \card{haystack}$
matrix. Each $1$ in a cell $(i, j)$ indicates that the $i$-th needle part is a
possible match with the $j$-th haystack part. This matrix is called $perm$
thereafter.

The algorithm, leaving aside the \textsc{refine} function --- which is
detailed just after, and can be omitted for a (much) slower version of the
algorithm --- is described in Figure~\ref{alg:ullmann}.
\begin{figure}[h]
    \begin{algorithmic}
        \Function{find\_at\_depth}{depth, perm, freeVert}
            \If{no 1s on \lstc{perm[depth]}}
                \State{} \Return{}
            \EndIf{}

            \State{} Save perm

            \For{$0 \leq$ chosen $< \card{\text{haystack}}$ such that
                    \lstc{perm[depth][chosen]} $ = 1$ and \lstc{freeVert[chosen]}}
                \State{} Put $0$s everywhere on \lstc{perm[depth]}, but on
                    \lstc{chosen}

                \State{} Refine perm
                \If{a row of perm has only $0$s}
                    \State{} \Return{}
                \EndIf{}

                \If{depth $=$ $\card{\text{needle}} - 1$}
                    \State{} Store perm as a result
                \Else{}
                    \State{} \Call{find\_at\_depth}{depth$+1$, perm, freeVert with
                        freeVert[chosen] $= 0$}
                \EndIf{}

                \State{} Restore perm
            \EndFor{}
        \EndFunction{}

        \vspace{1em}

        \Function{find}{perm}
            \State{} \Return{} \Call{find\_at\_depth}{0, perm, [$1, \ldots, 1$]}
        \EndFunction{}
    \end{algorithmic}
    \caption{Ullmann's algorithm (without refining)}\label{alg:ullmann}
\end{figure}
The refining process is the actual keystone of the algorithm. It is the
mechanism allowing the algorithm to cut down many exploration branches, by
changing ones to zeroes in the matrix being built.

The idea is that a match between a needle's vertex $i$ and a haystack's vertex
$j$ is only possible if, for each neighbour $k$ of $i$, $j$ has a neighbour
$k'$ such that the permutation matrix has a one in position $(k, k')$. In
other words, a match between $i$ and $j$ is only possible if every neighbour
$k$ of $i$ (in the needle) has a possibly matching (\wrt{} $perm$) vertex $k'$
(in the haystack) which is a neighbour of $j$.

For instance, for the needle from Figure~\ref{fig:refine_needle}, we can try
to check the ones corresponding to the vertex $a$ (the coloured one). Refining
it while matching it with Figure~\ref{fig:refine_haystack_ok} will leave a $1$
on the match $a-a'$, since every neighbour of $a$ can be matched with a
neighbour of $a'$ (which, hopefully, corresponds to a $1$ in the matrix): $b$
matches $y$ and $c$ matches $z$, for instance. This is not the case with the
haystack from Figure~\ref{fig:refine_haystack_bad}: if the process has gone
far enough already, there should be no corresponding vertex for either $b$ or
$c$, since there are no two such vertices with an edge linking them. If there
is indeed no match at this point for either $b$ or $c$, the $1$ in the cell
matching $a - a'$ will be turned into a $0$.

This condition is checked on every $1$ in the permutation matrix. If it is not
met, the cell is nulled. This, though, may cause other $1$s to stop satisfying
the condition: the process must be run again, until no new zeroes appear.
In the initial article~\cite{ullmann1976algorithm}, Ullmann advocates bitwise
tricks to perform this expensive step efficiently: indeed, checking the
existence of such a $k'$ can be done by checking the nullity of the bitwise
\textsc{and} of the adjacency row of $j$ and the permutation matrix row of
$k$.
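Assuming the matrix rows are stored as packed 64-bit words (a bitset-like
representation, one bit per haystack vertex), this check boils down to a short
loop of \textsc{and}s; the function below is only an illustration of that
inner test.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <cstdint>
#include <vector>

using BitRow = std::vector<uint64_t>;   // one bit per haystack vertex

// True iff needle vertex k still has a possible match (w.r.t. perm) among
// the haystack neighbours of j, i.e. (perm[k] & haystack_adjacency[j]) != 0.
bool hasCompatibleNeighbour(const BitRow& permRowK, const BitRow& adjRowJ) {
    for (size_t w = 0; w < permRowK.size(); ++w)
        if (permRowK[w] & adjRowJ[w])
            return true;
    return false;
}
\end{lstlisting}

Inside \textsc{refine}, \lstc{perm[i][j]} is cleared as soon as one neighbour
$k$ of $i$ fails this test.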
The refining function is detailed in Figure~\ref{alg:ullmann_refine}.
\begin{figure}[h]
    \centering
    \begin{minipage}{0.3\textwidth}
        \centering
        \includegraphics[width=0.5\textwidth]{img/refine_needle.png}
        \caption{Needle}\label{fig:refine_needle}
    \end{minipage}
    \begin{minipage}{0.3\textwidth}
        \centering
        \includegraphics[width=0.5\textwidth]{img/refine_haystack_ok.png}
        \caption{Matching haystack}\label{fig:refine_haystack_ok}
    \end{minipage}
    \begin{minipage}{0.3\textwidth}
        \centering
        \includegraphics[width=0.5\textwidth]{img/refine_haystack_bad.png}
        \caption{Non-matching haystack}\label{fig:refine_haystack_bad}
    \end{minipage}
    \caption{Example: Ullmann refining process}
\end{figure}
\begin{figure}[h]
    \begin{algorithmic}
        \Function{refine}{perm}
            \While{changes during last run}
                \For{each needle vertex $i$}
                    \For{each haystack vertex $j$}
                        \If{\lstc{perm[i][j]} $=$ 1}
                            \For{each neighbour $k$ of $i$ in needle}
                                \If{\lstc{perm[k] & haystack\_adjacency[j]}
                                        $= 0$}
                                    \State{} \lstc{perm[i][j]} $\gets 0$
                                \EndIf{}
                            \EndFor{}
                        \EndIf{}
                    \EndFor{}
                \EndFor{}
            \EndWhile{}
        \EndFunction{}
    \end{algorithmic}
    \caption{Ullmann's refining function}\label{alg:ullmann_refine}
\end{figure}
\subsection{Ullmann for \emph{isomatch}}

\paragraph{Graph used.} Our circuit is not actually a graph just as-is:
indeed, a wire can be connected to multiple circuits (multiple gates' inputs,
or even multiple gates' outputs when using tristate circuits). This could be
transposed into a graph with $\frac{n(n-1)}{2}$ edges (the complete subgraph)
for this particular wire. However, internally, a wire is better represented as
a vertex itself, with $n$ edges linking it to the connected gates. This
representation is also used in the Ullmann implementation, leading to a
permutation matrix of size
$\left(\card{\text{needle gates}} + \card{\text{needle wires}}\right) \times
\left(\card{\text{haystack gates}} + \card{\text{haystack wires}}\right)$.

\paragraph{Final result.} Once a result (\ie{} a correct permutation) is
obtained, we further need to check that it is actually a solution of our
problem. Indeed, while the structure is guaranteed by the algorithm to be the
same, we still need to check that every circuit is equal to its matched one,
through the procedure described in Section~\ref{sec:group_equality}: so far,
only the equality of signatures has been checked. We only need to check the
circuits, as the wires necessarily match.

\paragraph{Non-overlapping results.} We want our results to be
non-overlapping (otherwise, we would not be able to perform a
search-and-replace). Whenever two potential results conflict, an arbitrary one
of the two can be returned (a human user is operating the software and can
make a narrower search if needed).

To meet this specification, we must keep track of the circuits that are
already included in a match. We also cannot include, in another match, an
ancestor of a circuit that was already included in a match (this should not be
possible, because the needle can't be included in itself, but signature
collisions could make it happen).
\subsection{Implementation optimisations}

\paragraph{Pre-check.} The needle will, in most cases, not be found at all in
a given hierarchy group of the haystack. To avoid wasting computation time, we
first check that every signature present in the needle is present at least as
many times in the haystack. This simple check saves a lot of time.
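A sketch of this pre-check, written over plain vectors of signatures
(illustrative, not the actual \emph{isomatch} interface):

\begin{lstlisting}[language=C++]
#include <cstdint>
#include <unordered_map>
#include <vector>

// Every signature occurring in the needle must occur at least as many
// times in the haystack group, otherwise no occurrence can exist there.
bool preCheckPasses(const std::vector<uint64_t>& needleSigs,
                    const std::vector<uint64_t>& haystackSigs) {
    std::unordered_map<uint64_t, long> remaining;
    for (uint64_t s : haystackSigs) ++remaining[s];
    for (uint64_t s : needleSigs)
        if (--remaining[s] < 0)      // the needle needs more copies of this
            return false;            // signature than the haystack provides
    return true;
}
\end{lstlisting}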
\paragraph{Initial permutation matrix.} The matrix is first filled according
to the signatures' matches. Note that only signatures of order 0 --- \ie{} the
inner data of a vertex --- can be used here: indeed, we cannot rely on the
context, since there can be some context in the haystack that is absent from
the needle, and we cannot check for ``context inclusion'' with our definition
of signatures: \emph{all} of the context must be exactly the same for two
signatures to match. The matrix is then refined a bit more, by making sure
that every potentially matching gate has the same ``wire kinds''. Indeed, a
needle gate's wire must have at least the same inbound adjacent signatures as
its matching haystack wire, and the same goes for outbound adjacent
signatures. Thus, two circuits cannot be matched if this condition is not
respected for each pair of corresponding wires of those circuits, and their
corresponding cell in the permutation matrix can be nulled.
\paragraph{Conversion to adjacency matrix.} The internal structures and graphs
are represented as classes inheriting from \lstcpp{CircuitTree}, connected to
various \lstcpp{WireId}s. Thus, there is no adjacency matrix readily
available, and the various vertices have no predefined IDs; we therefore need
to assign an ID to every vertex, \ie{} every gate and wire.
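The ID assignment itself is straightforward; a minimal sketch could look like
this (hypothetical helper, shown with generic keys rather than the real
\lstcpp{CircuitTree*} / \lstcpp{WireId} types):

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <unordered_map>
#include <vector>

// Assigns consecutive indices to vertices (gates and wires) as they are
// first encountered, so that they can index the adjacency and permutation
// matrices. Key must be hashable (pointers and integer IDs are).
template <typename Key>
class VertexIds {
public:
    size_t idOf(const Key& vertex) {
        auto ins = ids.emplace(vertex, order.size());
        if (ins.second)                  // first time we see this vertex
            order.push_back(vertex);
        return ins.first->second;
    }
    const Key& vertexOf(size_t id) const { return order[id]; }
    size_t count() const { return order.size(); }

private:
    std::unordered_map<Key, size_t> ids;   // vertex -> matrix index
    std::vector<Key> order;                // matrix index -> vertex
};
\end{lstlisting}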
\paragraph{Order of rows and columns.} In his original paper, Ullmann
recommends indexing the vertices in order of decreasing degree (\ie, the
vertices with the highest degree are topmost/leftmost in the matrix). This
amplifies the effect of the refinement procedure, since vertices with a higher
degree are connected to more vertices, and thus have a higher chance of having
a neighbour with no matching neighbour in the haystack. This allows the
algorithm to cut whole branches of the search tree very early, and speeds it
up a lot. Yet, Ullmann also states that in specific cases, a different order
might be more appropriate.

My idea was that it might be interesting to put all the wires first, and then
all the gates, or the other way around. For that, I ran a few benchmarks. The
measures were made on a system with a nearly constant load during the
experiments. The machine had an i7-6700 processor (3.6GHz reported % chktex 8
frequency). All the measures are averaged over 100 runs, and were made on the
usual test set.
\vspace{1em}
\begin{tabular}{l r r}
    \textbf{Ordering method} & \textbf{Run time (ms)} & \textbf{Loss (\%)} \\
    Wires by degree decreasing, then gates as they come & 48.8 & --- \\
    As they come, gates then wires & 49.1 & 0.6\% \\
    By degree decreasing, wires then gates & 49.3 & 1.0\% \\
    As they come, wires then gates & 49.3 & 1.0\% \\
    Gates as they come, then wires by degree decreasing & 49.5 & 1.4\% \\
    By degree decreasing, all mixed & 49.5 & 1.4\% \\
\end{tabular}
\vspace{1em}
The time loss is nearly insignificant, and can be explained by constant costs:
when we want to sort the vertices, the vector must be copied and then sorted,
which can be more expensive than just taking its elements as they come, if the
gain is not high enough.

Nevertheless, I chose to implement the fastest option with respect to this
benchmark. Since the gap is nearly insignificant, this choice cannot really be
drastically harmful in the end.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Performance}

In this section, all the measures were made on a computer with an Intel
i7-3770 CPU (3.40GHz) and 8\,GB of RAM\@. % chktex 8

\subsection{Small processor}

The example I used widely during my project to test my program and check its
efficiency periodically was a small processor designed one year earlier as a
school project~\cite{sysdig_cpu}. The processor implements a large subset of
ARM\@. It does not feature a multiplier, nor a divider circuit, but supports
instructions with conditions (ARM-flavoured: a whole range of conditional
prefixes can be plugged onto an assembly instruction). It was conceived as a
few recursively hierarchized modules, but was flattened into a plain netlist
when generated from its Python code, so I first had to patch its generator
code to make its hierarchy apparent.

The circuit is composed, at its root level, of a few modules: a memory unit
(access to RAM), a flags unit (handling the ALU's flags), two operand units
(decoding the operands and applying a barrel shifter if needed), an opcode
decoding unit (decoding the program's opcodes), a registers unit (containing
the registers) and a result selector (selecting the output between the ALU's
and the registers' output, based on the opcode).

The processor, in the end, has around 2000 leaf gates (but works at word
level) and 240 hierarchy groups.
\paragraph{Signature.} First, I measured the time required to sign the whole
circuit with different levels of signature (\ie{} the order of signature
computed for every part of the circuit). In practice, we never compute
high-order signatures for a whole circuit, as signatures of subgroups are
always computed by default at order $2$, unless a particular group needs a
more accurate signature.

The measures were made over 100 consecutive runs of the program (then averaged
for a single run) using the command \texttt{time}. The computation time needed
for the different signature levels is plotted in
Figure~\ref{fig:bench_sig_level}.
\begin{figure}
    \centering
    \begin{tikzpicture}
        \begin{axis}[
                xlabel={Level of signature},
                ylabel={Time (ms)},
                xmin=0, xmax=16,
                ymin=0, ymax=300,
                legend pos=north west,
                ymajorgrids=true,
                grid style=dashed,
            ]

            \addplot[
                color=blue,
                mark=square,
            ]
            coordinates {
                (2,105.4)
                (3,122.6)
                (4,140.1)
                (5,155.4)
                (6,171.2)
                (7,183.9)
                (8,198.3)
                (9,211.2)
                (10,224.3)
                (11,236.7)
                (12,248.5)
                (13,259.3)
                (14,271.7)
                (15,281.4)
            };
            \legend{-O3}
        \end{axis}
    \end{tikzpicture}
    \caption{Signature time of the processor for different levels of
        signature}\label{fig:bench_sig_level}
\end{figure}
The computation time is more or less linear in the level of signature
required, which is consistent with the implementation. In practice, only small
portions of a circuit will be signed with a high order, which means that we
can afford really high-order signatures (\eg{} 40 or 50, which already means
that the diameter of the group is 40 or 50) without a real impact on the
computation time.

This linearity means that we can increase the signature order without too much
impact, as we do when computing a group equality.
\paragraph{Equality.} To test circuit group equality, a small piece of code
takes a circuit and scrambles it as much as possible --- without altering its
structure --- \eg{} by randomly renaming its parts, or by randomly changing
the order of the circuits and groups, \ldots{} The circuit is then matched
against its unaltered counterpart.

For the processor described above, it takes about \textbf{313\,ms} to prove it
equal to its scrambled version, and then the other way around. Yet, the
memoized results (essentially the signatures) are kept for the second
direction, considerably speeding it up: the same program proving only one way
takes about \textbf{310\,ms}.

Some signatures of order higher than two need to be computed, and there is a
constant time needed to scramble the circuit, etc., which means that the
actual equality-match time is ridiculously small compared to the signature
computation time.
\paragraph{Match.} The subcircuit match feature was tested by trying to find
every occurrence of a pattern that can be easily counted using tools like
\lstbash{grep}. For this purpose, the ad-hoc implementation of a MUX gate was
used: two tristate gates and a NOT gate, as in Figure~\ref{fig:mux_of_tri}.
\begin{figure}[h]
    \centering
    \includegraphics[height=6cm]{img/mux_of_tri.png}
    \caption{MUX gate made out of tristate and not gates}\label{fig:mux_of_tri}
\end{figure}

This group appears 73 times in the processor. To match them all (none are
overlapping), it takes \textbf{113\,ms}.
\subsection{Corner cases}\label{ssec:corner_cases}

There were a few observed configurations on which the algorithm tends to be
slower, and a few other such cases that could be fixed.

\paragraph{I/O pins.} In Section~\ref{sec:signatures}, we introduced a term
named \emph{IO adjacency} in the signatures of order higher than $0$. This is
because some sub-circuits can be told apart by their signatures only through
this information, and the absence of this term slowed the program down
considerably before its introduction. This is the case, for instance, in
circuits like the one in Figure~\ref{fig:io_adj_term}.
\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{img/io_adj_term.png}
    \caption{A case where the I/O adjacency term is
        necessary}\label{fig:io_adj_term}
\end{figure}
Here, if the adjacent input and output pins of Group are not taken into
account in the signatures of the gates (A, B, C, D, E), there is apparently no
way to differentiate A from B, or C from D. The example can of course be
pushed further to far more dramatic cases, for instance with a (binary) tree
of circuits in the same shape as above.
\paragraph{Split/merge trees.} A common pattern that tends to slow down the
algorithm is split/merge trees. Those patterns occur when one wants to merge
$n$ one-bit wires into a single $n$-bit wire, or the other way around.

These patterns are pretty common, for instance when an opcode is run through a
MUX tree to perform the requested operation.

However, this pattern generates a lot of collisions in signatures. Indeed, for
a tree of depth \eg{} 8, a node just below the root will need a signature of
order 7 to have a different signature from another node at the same depth.
With a signature of order up to 6, only other gates from the tree will be
included in the signature when going down the tree, and the exact same gates
will be included above, up to the tree's root. Thus, nothing differentiates
one gate from another as long as the boundary of the tree is not reached
(assuming the gates below the tree's leaves are not all the same; if they are,
even more levels will be needed).

As the notions of ``left child'' and ``right child'' cannot be used (since
they would rely on the order of description of the graph), there seems to be
no good way to discriminate those two nodes. Furthermore, the nodes are not
totally interchangeable: indeed, when checking for an equality between two
such trees, it does not matter which node is the left one; but once this is
fixed, the nodes on the layer below cannot be freely exchanged.
For instance, in Figure~\ref{fig:split_tree}, the orange borders are the
boundaries of what can be taken into account for the signatures of order $1$
of the gates marked with a red dot. Thus, those signatures are exactly the
same.
\begin{figure}[hb!]
    \centering
    \includegraphics[width=0.8\textwidth]{img/tree_local.png}
    \caption{Case of a split (or merge) tree}\label{fig:split_tree}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Conclusion}

At this point, \textit{isomatch} seems to be fast enough to be plugged into
VossII, and is being integrated at the moment. On all the cases tested ---
with tests that tried to be as complete as possible by exercising each
independent feature --- it returned a correct result. Even though there are a
handful of ways to enhance it, make it faster, etc., it is usable in its
current state.

This internship led me to develop new strategies to bypass corner cases where
the heuristics were inefficient or the methods inadequate, \ldots{} But this
project also made me practice C++ again, which I had left behind for some
time, and forced me to keep the code as clean as possible, challenging me on
small details that were easy to implement, but hard to implement in an
understandable and bug-proof way.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\printbibliography{}

\end{document}