Add heuristics analysis
parent c74ec873eb
commit b2cf0a77df
1 changed files with 99 additions and 0 deletions
99
HEURISTICS.md
Normal file
@@ -0,0 +1,99 @@
# Heuristics used for synthesis

This file lists the major heuristics used for synthesis.

## Initial row

The initial row is always assumed to be the following, where `u` means
undefined and `c-8` means saved at CFA-8:

    CFA     rbp   ra
    rsp+8   u     c-8
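
As an illustration only, the initial row could be modelled as below. The `Row` type, its field names, and `render` are hypothetical and not taken from the synthesis tool. The values reflect the x86-64 convention that, on function entry, the return address sits at `(%rsp)`, hence CFA = %rsp+8 and ra at CFA-8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    """One unwinding table row (hypothetical model, for illustration)."""
    cfa_reg: str          # register the CFA is indexed on: "rsp" or "rbp"
    cfa_offset: int       # CFA = cfa_reg + cfa_offset
    rbp: Optional[int]    # None = undefined, else offset from the CFA
    ra: int               # return address location, as an offset from the CFA


# Row assumed at the first instruction of every function.
INITIAL_ROW = Row(cfa_reg="rsp", cfa_offset=8, rbp=None, ra=-8)


def render(row: Row) -> str:
    """Format a row in the same notation as the table above."""
    rbp = "u" if row.rbp is None else f"c{row.rbp:+d}"
    return f"{row.cfa_reg}{row.cfa_offset:+d} {rbp} c{row.ra:+d}"
```

Calling `render(INITIAL_ROW)` yields the second line of the table, with single spaces between columns.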

## With or without %rbp?

When synthesizing an FDE, there is sometimes a choice between using %rbp or
not. For instance, the original program may use %rbp for something entirely
different from keeping a base pointer, without this being obvious: the
synthesis must then avoid using %rbp.

Synthesis therefore applies up to two passes to the function. The first pass
tracks %rbp to generate a correct table, but is denied the use of %rbp as the
base register for the CFA. If this first pass fails by losing track of its CFA
at some point, we fall back to a second pass that does the same, but switches
its CFA indexing to %rbp if possible.

This method works in practice because

* if the first pass succeeds, then a correct CFA indexing was found;
* if not, the original compiler could not generate a correct CFA indexing
  without %rbp either and was forced to use %rbp as a base pointer (except in
  corner cases, e.g. clang sometimes generates code for which no correct
  unwinding data is possible in pre-abort error handling paths).
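
The two-pass fallback described above can be sketched as follows. This is a minimal sketch, not the tool's actual code: `LostTrack`, `synthesize_fde`, and the `run_pass(function, allow_rbp)` callback are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, List


class LostTrack(Exception):
    """Raised by a pass that can no longer express the CFA (hypothetical)."""


def synthesize_fde(function: Any,
                   run_pass: Callable[[Any, bool], List[Any]]) -> List[Any]:
    """Run the %rbp-free pass first; fall back to the %rbp-enabled pass."""
    try:
        # Pass 1: %rbp is tracked, but may not be used to index the CFA.
        return run_pass(function, False)
    except LostTrack:
        # Pass 2: same analysis, now allowed to switch the CFA to %rbp.
        return run_pass(function, True)
```

A failure of the second pass is deliberately left to propagate to the caller, since no further fallback is described.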

## Lossy merge

When two or more code branches merge at some point, we require that the
unwinding data propagated by all of the branches can be merged into
consistent data.

Most of the time, *consistent* means strictly equivalent, but this can be
weakened by allowing rows with %rbp undefined on one side and defined on the
other to be merged. The merged data is then assumed to have %rbp undefined,
which accepts some information loss.
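
A possible reading of this merge rule, as a sketch: the `Row` type, the string encoding of its fields, and the `Inconsistent` exception are assumptions made for the example, not the tool's data structures.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Row:
    cfa: str              # e.g. "rsp+16"
    rbp: Optional[str]    # None = %rbp undefined, else e.g. "c-16"
    ra: str = "c-8"


class Inconsistent(Exception):
    """The rows coming from two branches cannot be reconciled."""


def merge(a: Row, b: Row) -> Row:
    """Merge two rows reaching the same program point."""
    if a == b:
        return a  # strict equivalence: the common case
    same_elsewhere = (a.cfa, a.ra) == (b.cfa, b.ra)
    if same_elsewhere and (a.rbp is None or b.rbp is None):
        # Lossy case: %rbp is defined on one side only, so forget it.
        return replace(a, rbp=None)
    raise Inconsistent(f"{a} vs {b}")
```

Two rows that both define %rbp, but at different locations, are still rejected: only the defined/undefined mismatch is tolerated.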

We actually process the control flow graph of a subroutine by walking it
depth-first. When a block is first encountered, the propagated row is saved
as the initial data for this block. When it is encountered again from another
predecessor, the propagated row is merged with the stored one if possible;
otherwise the synthesis aborts with an inconsistency. This merge operation is
thus algorithmically free if the data first stored in the block has %rbp
undefined: it suffices to erase the %rbp information from the newly propagated
unwinding data. The other way around, changing the data already present, with
which subsequent computations have already been made, would require
recomputing a lot of data. We thus *only allow it* if the block is a leaf
block in the control flow graph of the subroutine.
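
The walk and its leaf-only restriction might look like the sketch below. Everything here is hypothetical scaffolding: the CFG is a plain successor dictionary, a row is reduced to a `(CFA, %rbp)` pair, and `transfer` stands in for the per-block analysis.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str]]   # (CFA, %rbp); None means %rbp undefined


class Inconsistent(Exception):
    """Rows reaching a block from two predecessors cannot be merged."""


def propagate(cfg: Dict[str, List[str]], entry: str, entry_row: Row,
              transfer: Callable[[str, Row], Row]) -> Dict[str, Row]:
    """Walk the CFG depth-first, recording the row at each block entry."""
    initial: Dict[str, Row] = {}
    stack = [(entry, entry_row)]
    while stack:
        block, row = stack.pop()
        if block not in initial:
            # First visit: store the row and push it through the block.
            initial[block] = row
            out = transfer(block, row)
            stack.extend((succ, out) for succ in cfg[block])
            continue
        stored = initial[block]
        if stored == row:
            continue
        if stored[0] != row[0]:
            raise Inconsistent(block)
        if stored[1] is None:
            continue  # free case: just drop %rbp from the incoming row
        if row[1] is None and not cfg[block]:
            # Leaf block: nothing downstream depends on the stored row.
            initial[block] = (stored[0], None)
            continue
        raise Inconsistent(block)
    return initial
```

With an explicit stack, the visiting order depends on the order of the successor lists, so whether a given merge hits the free case or the leaf-only case depends on which predecessor is walked first.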

This restriction on when the merge applies works well in practice because
gcc does not generate such lossy merges, and clang generates them only for the
exit block of a function, just before `retq`.

## CFA state tracking

### When CFA is an offset of %rsp

If the CFA is an offset of %rsp, it must be kept up to date whenever %rsp
changes. In the BAP IR, every such change generates some instruction
`%rsp <- EXPR`.

* If the expression is just `%rsp <- %rsp + offset`, the CFA is updated with
  this offset (most cases).
* If not, the analysis loses track and aborts. This case did not occur during
  our testing while the CFA was indexed by %rsp.
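
The two cases above can be illustrated with a small sketch. It matches on a textual form of the right-hand side, which is a stand-in chosen for this example and not the BAP IR API; `update_cfa_offset` and `LostTrack` are hypothetical names. Since the CFA itself does not move, adding `offset` to %rsp must subtract `offset` from the CFA's %rsp-relative offset.

```python
import re

# Right-hand side of `%rsp <- EXPR` in the only shape we know how to track.
_RSP_PLUS_CONST = re.compile(r"^%rsp\s*([+-])\s*(\d+)$")


class LostTrack(Exception):
    """The new value of %rsp is not a constant offset from the old one."""


def update_cfa_offset(cfa_offset: int, expr: str) -> int:
    """Return k' such that CFA = %rsp + k' after `%rsp <- expr`."""
    match = _RSP_PLUS_CONST.match(expr.strip())
    if match is None:
        raise LostTrack(expr)
    sign, value = match.groups()
    delta = int(value) if sign == "+" else -int(value)
    # CFA = old_rsp + k = (new_rsp - delta) + k
    return cfa_offset - delta
```

For instance, a `push` lowers %rsp by 8, so starting from CFA = %rsp+8 the call `update_cfa_offset(8, "%rsp - 8")` gives 16.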

### When CFA is an offset of %rbp

If the CFA is an offset of %rbp, nothing special is required to track the CFA.

### Switching between the modes: %rsp to %rbp indexing

If the CFA is currently an offset of %rsp, an indexing mode change is detected
when %rsp is saved to %rbp. If the synthesis is currently allowed to use %rbp
indexing (see *With or without %rbp?*), the indexing mode is then switched. If
not, the current CFA indexing is kept.
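
As a sketch of this switch, assuming the triggering instruction is a copy of %rsp into %rbp (the usual `mov %rsp, %rbp` prologue): the `(register, offset)` pair and the function name are hypothetical. Right after the copy both registers hold the same value, so the offset carries over unchanged.

```python
from typing import Tuple

Cfa = Tuple[str, int]   # (base register, offset): CFA = base + offset


def on_rbp_assigned(cfa: Cfa, source_reg: str, allow_rbp: bool) -> Cfa:
    """Update the CFA description when `%rbp <- source_reg` is seen."""
    base, offset = cfa
    if base == "rsp" and source_reg == "rsp" and allow_rbp:
        # %rbp == %rsp at this point, so CFA = %rbp + offset as well.
        return ("rbp", offset)
    # Not allowed to use %rbp, or not a copy of %rsp: keep current indexing.
    return cfa
```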

### Switching between the modes: %rbp to %rsp indexing

The only event that triggers a revert to %rsp-based indexing is %rbp being
overwritten while %rbp indexing is in use.

It is non-trivial to decide which %rsp offset should be used when switching
back. So far, we have only encountered switches back to %rsp at the very end of
functions, when %rbp is popped from the stack. We thus assume that upon
restore, CFA=%rsp+8. This only works in practice because, in the observed
cases, compilers tend to stick to %rbp indexing once they decide to use it in
a function.
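
The revert rule is small enough to sketch directly; the names below are hypothetical. The hard-coded 8 encodes the assumption stated above: right after the final `pop %rbp`, %rsp points at the return address, which sits 8 bytes below the CFA.

```python
from typing import Tuple

Cfa = Tuple[str, int]   # (base register, offset): CFA = base + offset

# Assumed %rsp offset once %rbp has been popped at the end of a function.
OFFSET_AFTER_RESTORE = 8


def on_rbp_overwritten(cfa: Cfa) -> Cfa:
    """Update the CFA description when %rbp is overwritten."""
    base, _ = cfa
    if base == "rbp":
        # %rbp can no longer index the CFA: revert to %rsp indexing.
        return ("rsp", OFFSET_AFTER_RESTORE)
    return cfa  # already %rsp-indexed: nothing to do
```

This would produce a wrong row if %rbp were overwritten in the middle of a function, which is exactly the case the heuristic assumes does not occur.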

## %rbp state tracking

Tracking the state of %rbp (or any other callee-saved register) can be done by
tracking the program points at which

* %rbp is undefined and an instruction saves %rbp to the stack,
* %rbp is defined and an instruction overwrites %rbp with the data initially
  saved on the stack.
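
These two transitions form a two-state machine, sketched below under assumptions: the event encoding (`"save"` with a CFA-relative slot, `"restore"`) and the function name are invented for the example, and detecting those events in real code is not shown.

```python
from typing import Optional, Tuple

# ("save", slot): %rbp is stored at CFA+slot; ("restore", None): it is
# reloaded from that slot. Any other event leaves the state untouched.
Event = Tuple[str, Optional[int]]


def step_rbp(state: Optional[int], event: Event) -> Optional[int]:
    """Advance the %rbp column: None is undefined, an int is a CFA offset."""
    kind, slot = event
    if state is None and kind == "save":
        return slot    # caller's %rbp is now recoverable at CFA+slot
    if state is not None and kind == "restore":
        return None    # the register holds the caller's value again
    return state
```

Feeding it the events of a typical function, `("save", -16)` at `push %rbp` and `("restore", None)` at `pop %rbp`, moves the column from undefined to `c-16` and back to undefined.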