Add heuristics analysis
This commit is contained in:
parent
c74ec873eb
commit
b2cf0a77df
1 changed files with 99 additions and 0 deletions
99
HEURISTICS.md
Normal file
99
HEURISTICS.md
Normal file
|
@ -0,0 +1,99 @@
|
|||
# Heuristics used for synthesis
|
||||
|
||||
This file lists the major heuristics used for synthesis.
|
||||
|
||||
## Initial row
|
||||
|
||||
Initial row is always assumed as
|
||||
CFA rbp ra
|
||||
rsp+8 u c-8
|
||||
|
||||
## With or without %rbp?
|
||||
|
||||
When synthesizing a FDE, there is sometimes a choice between using %rbp or
|
||||
not. For instance, it is possible that the original program uses %rbp for
|
||||
something entirely different than keeping a base pointer, without it being
|
||||
obvious: the synthesis must then avoid using %rbp.
|
||||
|
||||
When synthesizing a FDE, two passes are applied on the function: a first pass
|
||||
that tracks %rbp to generate a correct table, but is denied using %rbp as an
|
||||
indexing mean for CFA. If this first pass fails by losing track of its CFA at
|
||||
some point, we fall back to a second phase that does the same, but switches its
|
||||
CFA indexing to %rbp if possible.
|
||||
|
||||
This method works in practice because
|
||||
* if the first pass succeeded, then a correct CFA indexing was found,
|
||||
* if not, the original compiler could not generate a correct CFA indexing
|
||||
either and was forced to use %rbp as a base pointer (except corner cases,
|
||||
eg. clang sometimes generate code without possible correct unwinding data in
|
||||
pre-abort error handling paths)
|
||||
|
||||
## Lossy merge
|
||||
|
||||
When two or more code branches merge at some point, we require that the
|
||||
unwinding data propagated by all of the branches can be merged into
|
||||
consistent data.
|
||||
|
||||
Most of the time, *consistent* means strictly equivalent, but it can be
|
||||
weakened by allowing rows with %rbp undefined on one side and defined on the
|
||||
other to be merged — thus assuming the merged data is %rbp undefined, allowing
|
||||
a information loss.
|
||||
|
||||
We actually process the control flow graph of a subroutine by walking it
|
||||
depth-first. When first encountering a new block, the propagated row is saved
|
||||
as the initial data for this block. When we encounter it again from another
|
||||
predecessor, the propagated row is merged if possible, or aborts with
|
||||
inconsistency. This merge operation is thus algorithmically free if the data
|
||||
first stored in the block is %rbp undefined — it is possible to just erase the
|
||||
data on the newly merged unwinding data. The other way around, changing the
|
||||
data already present, with which subsequent computations have already been
|
||||
made, would require recomputing a lot of data. We thus *only allow it* if the
|
||||
block is a leaf block in the control flow graph of the subroutine.
|
||||
|
||||
This restriction in the application conditions works well in practice because
|
||||
gcc does not generate such lossy merges, and clang generates those only for the
|
||||
exit block of a function — just before `retq`.
|
||||
|
||||
## CFA state tracking
|
||||
|
||||
### When CFA is an offset of %rsp
|
||||
|
||||
If the CFA is an offset of %rsp, it must be kept up to date when %rsp changes.
|
||||
In the BAP IR, every such change will generate some instruction `%rsp <- EXPR`.
|
||||
|
||||
* If the expression is just `%rsp <- %rsp + offset`, the CFA is updated with
|
||||
this offset (most cases).
|
||||
* If not, the analysis loses track and aborts. This case did not occur during
|
||||
our testing while the CFA was indexed by %rsp.
|
||||
|
||||
### When CFA is offset of %rbp
|
||||
|
||||
If the CFA is an offset of %rbp, nothing special is required to track the CFA.
|
||||
|
||||
### Switching between the modes: %rsp to %rbp indexing
|
||||
|
||||
If the CFA is currently an offset of %rsp, an indexing mode change is detected
|
||||
when %rip is saved to %rbp. If the synthesis is currently allowed to use %rbp
|
||||
indexing (see *With or without %rbp?*), the indexing mode is then switched. If
|
||||
not, the current CFA indexing is kept.
|
||||
|
||||
### Switching between the modes: %rbp to %rsp indexing
|
||||
|
||||
The only event that triggers a revert to %rsp-based indexing is when %rbp gets
|
||||
overwritten with something while %rbp indexing.
|
||||
|
||||
It is non-trivial to decide which %rsp offset should be used when switching
|
||||
back. So far, we have only encountered switches back to %rsp at the very end of
|
||||
functions — when %rbp was popped from the stack. Thus, we thus assume that upon
|
||||
restore, CFA=%rsp+8. This only works in practice since in the observed cases,
|
||||
compilers tend to stick to %rbp indexing when they decide to use it in a
|
||||
function.
|
||||
|
||||
## %rbp state tracking
|
||||
|
||||
Tracking the state of %rbp (or any other callee-saved register) can be done by
|
||||
tracking the program points at which
|
||||
|
||||
* %rbp is undefined and an instruction saves %rbp to the stack,
|
||||
* %rbp is defined and an instruction overwrites %rbp with the data initially
|
||||
saved on the stack
|
Loading…
Reference in a new issue