More progress
intro: cite techniques, talk about ML; first version of benchmarks on order of indexation in Ullmann
parent ca21ecf0ff, commit 9f3120f0aa
2 changed files with 85 additions and 9 deletions
@@ -61,7 +61,7 @@
}

@inproceedings{seger2006integrating,
  title={Integrating design and verification-from simple idea to practical system (abstract only)},
  author={Seger, Carl},
  booktitle={Formal Methods and Models for Co-Design, 2006. MEMOCODE'06. Proceedings. Fourth ACM and IEEE International Conference on},
  pages={161--162},
@@ -70,7 +70,7 @@
}

@inproceedings{seger2006design,
  title={The design of a floating point execution unit using the Integrated Design and Verification (IDV) system (abstract only)},
  author={Seger, Carl},
  booktitle={Int. Workshop on Designing Correct Circuits},
  year={2006}

@@ -84,17 +84,35 @@ reliability.

To avoid such disasters, the industry nowadays uses a wide range of techniques
to catch bugs as early as possible --- which, hopefully, is before the
product's release date. These techniques include, of course, a lot of testing
on simulated hardware or FPGAs (since actually manufacturing a processor is
extremely expensive). A lot of testing is routinely run on the current version
of the hardware to catch bugs and notify the designers, since it remains the
easiest way to probe the behaviour of a circuit. Symbolic trajectory
evaluation also has its place in the domain: it allows one to run a circuit
for a few cycles (before it becomes too expensive) on symbolic values, \ie{}
variables instead of zeroes, ones and $X$s (for ``not a value''). This kind of
testing is far more powerful than plain testing, since its results are more
general --- a single symbolic run covers many concrete cases at once; yet it
is also too expensive to run over a significant number of cycles, and
therefore a lot of bugs cannot be caught this way.

The previous methods are cheap and convenient ways to test a circuit, but they
give only limited confidence in its correctness --- they merely show that all
the cases that were actually tested yielded a correct behaviour. These reasons
led to the development of proved hardware in the industry. On circuits as
complex as processors, usually, only sub-components are proved correct with
respect to a given specification of their behaviour (usually source code
describing how the processor is expected to behave, itself written against
the documentation draft of the circuit). These proofs are typically valid only
while the circuit is kept in a specified context, \ie{} a set of valid inputs,
outputs, etc. --- which should, but is not proved to, be respected by the
other parts of the circuit. Yet, this trade-off between proved correctness and
the engineers' work time already gives good confidence in the circuit.

In this context, Carl Seger was one of the main developers of fl at
Intel~\cite{seger1993vos,seger2005industrially,seger2006design}, a functional
ML-inspired programming language integrating many features useful for gaining
insight into a circuit, testing it and proving it. It mostly features symbolic
trajectory evaluation based model checking and theorem proving, and is
intended to be an all-purpose toolbox for the hardware prover.

@@ -291,6 +309,8 @@ and eventually lose patience.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{General approach}

\subsection{Theoretical solution}

The global strategy used to solve the problem efficiently can be broken down
into three main parts.

@@ -329,6 +349,19 @@ This part mostly consists in an implementation of Ullmann's algorithm for
subgraph isomorphism~\cite{ullmann1976algorithm}, a well-known algorithm for
this problem, adapted to take advantage of the specificities of our graphs to
run a little faster.

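The sketch below illustrates the core of Ullmann's procedure --- a candidate
matrix refined and then specialised row by row --- under simplified, assumed
data structures (plain boolean adjacency matrices). It is only an
illustration, not \emph{isomatch}'s actual code, which also restricts the
initial candidates using gate types and signatures.

\begin{lstlisting}[language=C++]
// Minimal sketch of Ullmann's subgraph isomorphism search.
// cand[i][j] == true when needle vertex i may still map to haystack vertex j.
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<bool>>;

// Refinement: i may map to j only if every neighbour of i still has at
// least one candidate among the neighbours of j. Returns false if some
// needle vertex is left without any candidate.
bool refine(const Matrix& needle, const Matrix& haystack, Matrix& cand) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < needle.size(); ++i)
            for (std::size_t j = 0; j < haystack.size(); ++j) {
                if (!cand[i][j]) continue;
                for (std::size_t x = 0; x < needle.size(); ++x) {
                    if (!needle[i][x]) continue;
                    bool ok = false;
                    for (std::size_t y = 0; y < haystack.size() && !ok; ++y)
                        ok = haystack[j][y] && cand[x][y];
                    if (!ok) { cand[i][j] = false; changed = true; break; }
                }
            }
    }
    for (const auto& row : cand) {
        bool any = false;
        for (bool b : row) any = any || b;
        if (!any) return false;
    }
    return true;
}

// Try to assign needle vertices one by one (depth = current row of cand).
bool search(const Matrix& needle, const Matrix& haystack, Matrix cand,
            std::vector<bool>& used, std::size_t depth) {
    if (!refine(needle, haystack, cand)) return false;
    if (depth == needle.size()) return true;    // every needle vertex mapped
    for (std::size_t j = 0; j < haystack.size(); ++j) {
        if (!cand[depth][j] || used[j]) continue;
        Matrix next = cand;                      // tentatively map depth -> j
        for (std::size_t k = 0; k < haystack.size(); ++k)
            next[depth][k] = (k == j);
        used[j] = true;
        if (search(needle, haystack, next, used, depth + 1)) return true;
        used[j] = false;
    }
    return false;
}
\end{lstlisting}
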
\subsection{Workflow}

At first, to get the algorithms, heuristics and methods right, I designed a
prototype in OCaml. This prototype was not intended to be --- and was not ---
optimized, but it allowed me to find the places where the program took an
unacceptable amount of time. For instance, I once left the prototype, which I
thought fast enough, computing a group equality for a whole night, only to
find out in the morning that it was not fast enough at all. This first version
was also written in a strongly typed language, with a lot of static
guarantees. It gave me enough confidence in my methods and strategies to move
on to a much faster, optimized C++ version, the current version of
\emph{isomatch}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Signatures}\label{sec:signatures}

@@ -670,7 +703,50 @@ a given hierarchy group of the haystack. To avoid wasting computation time, we
first check that every signature present in the needle is present at least as
many times in the haystack. This simple check saved a lot of time.

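A minimal sketch of this pre-check, assuming signatures are plain 64-bit
values (the function name and types here are illustrative, not
\emph{isomatch}'s actual interface):

\begin{lstlisting}[language=C++]
// Every signature of the needle must appear at least as many times in the
// haystack; otherwise no match can possibly exist in this group.
#include <cstdint>
#include <unordered_map>
#include <vector>

bool signatureCountsCompatible(const std::vector<std::uint64_t>& needleSigs,
                               const std::vector<std::uint64_t>& haystackSigs) {
    std::unordered_map<std::uint64_t, long> remaining;
    for (std::uint64_t sig : haystackSigs) ++remaining[sig];
    for (std::uint64_t sig : needleSigs)
        if (--remaining[sig] < 0) return false; // needle needs more than available
    return true;
}
\end{lstlisting}
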
\todo{More stuff?}

\paragraph{Conversion to adjacency matrix.} The internal structures and graphs
are represented as inherited classes of \lstcpp{CircuitTree}, connected to
various \lstcpp{WireId}s. Thus, there is no adjacency matrix readily
available, and the various vertices have no predefined IDs. We therefore need
to assign an ID to every vertex, \ie{} every gate and wire.

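A possible way to do this is sketched below, assuming flat lists of gates,
wires and connections are available (this simplified interface is assumed for
the example only; the actual code walks the \lstcpp{CircuitTree} hierarchy):

\begin{lstlisting}[language=C++]
// Assign a fresh ID to every gate and wire, then fill the boolean
// adjacency matrix: a gate is adjacent to each wire it is connected to.
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

class CircuitTree;   // gates
class WireId;        // wires

using Matrix = std::vector<std::vector<bool>>;

Matrix toAdjacency(const std::vector<CircuitTree*>& gates,
                   const std::vector<WireId*>& wires,
                   const std::vector<std::pair<CircuitTree*, WireId*>>& connections) {
    std::unordered_map<const void*, std::size_t> id;    // vertex -> ID
    std::size_t next = 0;
    for (CircuitTree* g : gates) id[g] = next++;        // gates first...
    for (WireId* w : wires)      id[w] = next++;        // ...then wires
    Matrix adj(next, std::vector<bool>(next, false));
    for (const auto& [gate, wire] : connections) {      // undirected edges
        adj[id[gate]][id[wire]] = true;
        adj[id[wire]][id[gate]] = true;
    }
    return adj;
}
\end{lstlisting}
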
\paragraph{Order of rows and columns.} In his original paper, Ullmann
recommends indexing the vertices in order of decreasing degree (\ie{} the
vertices with the highest degree are topmost/leftmost in the matrix). This
amplifies the effect of the refinement procedure: vertices with a higher
degree are connected to more vertices, and thus have a higher chance of having
a neighbour with no matching neighbour in the haystack. This allows the
algorithm to cut whole branches of the search tree very early, and speeds it
up a lot. Yet, Ullmann also states that in specific cases, a different order
might be more appropriate.

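A degree-decreasing ordering can be obtained with a simple sort over the
vertex IDs, as sketched below (an illustrative helper, not tied to
\emph{isomatch}'s code):

\begin{lstlisting}[language=C++]
// Returns a permutation of the vertices: order[k] is the vertex whose
// row/column ends up at position k, highest degree first.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

std::vector<std::size_t> byDecreasingDegree(const std::vector<std::size_t>& degree) {
    std::vector<std::size_t> order(degree.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&degree](std::size_t a, std::size_t b) {
                         return degree[a] > degree[b];
                     });
    return order;
}
\end{lstlisting}
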
My idea was that it might be interesting to put all the wires first and then
all the gates, or the other way around. To decide, I ran a few benchmarks. The
measurements were made on a system with a nearly constant load during the
experiments, on a machine with an i7-6700 processor (3.6GHz reported % chktex 8
frequency). All measurements are averaged over 100 runs on the usual test set.

\vspace{1em}
\begin{tabular}{l r r}
    \textbf{Ordering method} & \textbf{Run time (ms)} & \textbf{Loss (\%)} \\
    Wires by degree decreasing, then gates as they come & 48.8 & --- \\
    As they come, gates then wires & 49.1 & 0.6\% \\
    Wires by degree decreasing, then gates by degree decreasing & 49.3 & 1.0\% \\
    As they come, wires then gates & 49.3 & 1.0\% \\
    Gates as they come, then wires by degree decreasing & 49.5 & 1.4\% \\
    By degree decreasing, all mixed & 49.5 & 1.4\% \\
\end{tabular}
\vspace{1em}

The time loss is nearly insignificant, and can be explained by constant costs:
when we want to sort the vertices, the vector must be copied and then sorted,
which can be more expensive than just taking its elements as they come if the
gain is not high enough.

Nevertheless, I chose to implement the fastest option with respect to this
benchmark. Since the gap is nearly insignificant, this choice cannot really be
drastically harmful in the end.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Performance}