More progress

intro: cite techniques
talk about ml 1st version
benchs on order of indexation in Ullmann
Théophile Bastian 2017-08-28 18:49:08 +02:00
parent ca21ecf0ff
commit 9f3120f0aa
2 changed files with 85 additions and 9 deletions


@@ -61,7 +61,7 @@
}
@inproceedings{seger2006integrating,
  title={Integrating design and verification-from simple idea to practical system (abstract only)},
  author={Seger, Carl},
  booktitle={Formal Methods and Models for Co-Design, 2006. MEMOCODE'06. Proceedings. Fourth ACM and IEEE International Conference on},
  pages={161--162},
@@ -70,7 +70,7 @@
}
@inproceedings{seger2006design,
  title={The design of a floating point execution unit using the Integrated Design and Verification (IDV) system (abstract only)},
  author={Seger, Carl},
  booktitle={Int. Workshop on Designing Correct Circuits},
  year={2006}


@@ -84,17 +84,35 @@ reliability.
To avoid such disasters, the industry nowadays uses a wide range of techniques
to catch bugs as early as possible --- which, hopefully, is before the
product's release date. These techniques include, of course, a lot of testing
on simulated hardware or FPGAs, since actually manufacturing a processor is
extremely expensive. Test suites are run routinely on the current version of
the hardware to catch bugs and notify the designers, since this remains the
easiest way to check the behaviour of a circuit. Symbolic trajectory
evaluation also has its place in the domain: it runs a circuit for a few
cycles (before it becomes too expensive) on symbolic values, \ie{} variables
instead of zeroes, ones and $X$s (for ``not a value''). This kind of testing
is far more powerful than plain testing, since its results are more precise;
yet it is also too expensive to run over a significant number of cycles, and
therefore a lot of bugs cannot be caught this way.
The previous methods are cheap ways to test a circuit, but they give only
little confidence in its correctness --- they only show that, among all the
cases that were tested, every one yielded a correct behaviour. These
limitations led the industry to develop proved hardware. On circuits as
complex as processors, usually, only sub-components are proved correct with
respect to a given specification of their behaviour (usually source code that
should behave as the processor is expected to behave, itself written against
the documentation draft of the circuit). These proofs are typically valid only
while the circuit is kept in a specified context, \ie{} a set of valid inputs
and outputs, etc. --- that should, but is not proved to, be respected by the
other parts of the circuit. Yet, this trade-off between proved correctness and
engineer's work time already gives a pretty good confidence in the circuit.
In this context, Carl Seger was one of the main developers of fl at
Intel~\cite{seger1993vos,seger2005industrially,seger2006design}, a functional
ML-inspired programming language integrating many features useful to get
insight into a circuit, test it and prove it. It mostly features symbolic
trajectory evaluation based model checking and theorem proving, and is
intended to be an all-purpose toolbox for the hardware prover.
@@ -291,6 +309,8 @@ and eventually lose patience.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{General approach}
\subsection{Theoretical solution}
The global strategy used to solve the problem efficiently can be broken down
into three main parts.
@@ -329,6 +349,19 @@ This part mostly consists in an implementation of Ullmann's algorithm for
subgraph isomorphism~\cite{ullmann1976algorithm}, a well-known algorithm for
this problem, which uses the specific structure of the graph to be a little
faster.
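For reference, the core of such a search is a backtracking enumeration of a
partial mapping from needle vertices to haystack vertices. The sketch below is
a minimal, hypothetical illustration on plain adjacency matrices; it omits
Ullmann's refinement procedure, which is what actually prunes the search tree,
and it is not the code used in \emph{isomatch}.

\begin{lstlisting}[language=C++]
// Minimal backtracking core of a subgraph-isomorphism search (sketch only).
// needle and haystack are boolean adjacency matrices; mapping[i] is the
// haystack vertex currently assigned to needle vertex i.
#include <vector>
using Matrix = std::vector<std::vector<bool>>;

bool consistent(const Matrix& needle, const Matrix& haystack,
                const std::vector<int>& mapping, int depth) {
    // Every edge between needle vertex `depth` and an already-mapped vertex
    // must also exist between their images in the haystack.
    for (int prev = 0; prev < depth; ++prev)
        if (needle[depth][prev] && !haystack[mapping[depth]][mapping[prev]])
            return false;
    return true;
}

bool search(const Matrix& needle, const Matrix& haystack,
            std::vector<int>& mapping, std::vector<bool>& used, size_t depth) {
    if (depth == needle.size())
        return true;                        // all needle vertices are mapped
    for (size_t cand = 0; cand < haystack.size(); ++cand) {
        if (used[cand])
            continue;
        mapping[depth] = cand;
        if (consistent(needle, haystack, mapping, depth)) {
            used[cand] = true;
            if (search(needle, haystack, mapping, used, depth + 1))
                return true;
            used[cand] = false;             // backtrack
        }
    }
    return false;
}
\end{lstlisting}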
\subsection{Workflow}
As a first step, to get the algorithms, heuristics and methods right, I
designed a prototype in OCaml. This prototype was neither intended to be, nor
turned out to be, optimized, but it allowed me to find places where the
program took an unacceptable amount of time. For instance, I once left the
prototype, which I thought fast enough, computing a group equality for a whole
night, only to find out in the morning that it was not fast enough at all.
This first version was also written in a strongly typed language, with a lot
of static guarantees. It gave me enough confidence in my methods and
strategies to move on to a much faster, optimized C++ version, the current
version of \emph{isomatch}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Signatures}\label{sec:signatures}
@@ -670,7 +703,50 @@ a given hierarchy group of the haystack. To avoid wasting computation time, we
first check that every signature present in the needle is present at least as
many times in the haystack. This simple check saved a lot of time.
\paragraph{Conversion to adjacency matrix.} The internal structures and graphs
are represented as classes inheriting from \lstcpp{CircuitTree}, connected to
various \lstcpp{WireId}s. Thus, there is no adjacency matrix readily
available, and the various vertices have no predefined IDs: we first need to
assign an ID to every vertex, \ie{} every gate and wire.
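As an illustration, a minimal sketch of this conversion could look as follows.
The containers \lstcpp{gates} and \lstcpp{wires} and the accessor
\lstcpp{connectedWires()} are hypothetical names, not \emph{isomatch}'s actual
interface, and \lstcpp{WireId} is assumed to be usable as a map key.

\begin{lstlisting}[language=C++]
// Hypothetical sketch: assign consecutive IDs to gates, then to wires, and
// build the adjacency matrix of the resulting gate--wire graph.
#include <map>
#include <vector>

std::map<CircuitTree*, int> gateId;
std::map<WireId, int> wireId;       // assumes WireId is ordered (map key)

int nextId = 0;
for (CircuitTree* gate : gates)
    gateId[gate] = nextId++;
for (const WireId& wire : wires)
    wireId[wire] = nextId++;

// adj[i][j] is true iff vertex i (a gate or a wire) is connected to vertex j.
std::vector<std::vector<bool>> adj(nextId, std::vector<bool>(nextId, false));
for (CircuitTree* gate : gates)
    for (const WireId& wire : gate->connectedWires()) {  // hypothetical accessor
        adj[gateId[gate]][wireId[wire]] = true;
        adj[wireId[wire]][gateId[gate]] = true;
    }
\end{lstlisting}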
\paragraph{Order of rows and columns.} In his original paper, Ullmann
recommends indexing the vertices in order of decreasing degree (\ie{} the
vertices with the highest degree are topmost/leftmost in the matrix). This
amplifies the effect of the refinement procedure: vertices with higher degree
are connected to more vertices, and thus have a higher chance of having a
neighbour with no matching neighbour in the haystack. This allows the
algorithm to cut whole branches of the search tree very early, and speeds it
up a lot. Yet, Ullmann also states that in specific cases, a different order
might be more appropriate.
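Concretely, such an ordering can be obtained by sorting the vertex IDs by
decreasing degree before laying out the matrix, as in this small hypothetical
sketch (the \lstcpp{degree} vector is an assumption, not an actual
\emph{isomatch} structure):

\begin{lstlisting}[language=C++]
// Sort vertex IDs by decreasing degree; degree[v] is the number of
// neighbours of vertex v. order[k] is the vertex placed at row/column k.
#include <algorithm>
#include <numeric>
#include <vector>

std::vector<int> order(degree.size());
std::iota(order.begin(), order.end(), 0);           // 0, 1, ..., n-1
std::sort(order.begin(), order.end(),
          [&](int a, int b) { return degree[a] > degree[b]; });
\end{lstlisting}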
My idea was that it might be interesting to put all the wires first and then
all the gates, or the other way around. To settle this, I ran a few
benchmarks. The measurements were made on a system with a nearly-constant load
during the experiments. The machine had an i7-6700 processor (3.6GHz reported % chktex 8
frequency). Each measurement is averaged over 100 runs and was taken on the
usual test set.
\vspace{1em}
\begin{tabular}{l r r}
\textbf{Ordering method} & \textbf{Run time (ms)} & \textbf{Loss (\%)} \\
Wires by degree decreasing, then gates as they come & 48.8 & --- \\
As they come, gates then wires & 49.1 & 0.6\% \\
Wires by degree decreasing, then gates by degree decreasing & 49.3
& 1.0\% \\
As they come, wires then gates & 49.3 & 1.0\% \\
Gates as they come, then wires by degree decreasing & 49.5 & 1.4\% \\
By degree decreasing, all mixed & 49.5 & 1.4\% \\
\end{tabular}
\vspace{1em}
The time loss is nearly insignificant, and can be explained by constant costs:
when we want to sort the vertices, the vector must be copied and then sorted,
which can be more expensive than taking its elements as they come if the gain
is not high enough.

Nevertheless, I chose to implement the fastest option with respect to this
benchmark. Since the gap is nearly insignificant, the choice cannot be
drastically harmful in the end.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Performance}