mirror of https://github.com/tobast/libunwind-eh_elf.git synced 2025-01-08 18:33:42 +01:00

Update it some more.

(Logical change 1.147)
This commit is contained in:
mostang.com!davidm 2003-12-21 05:53:57 +00:00
parent 8365c7bc60
commit 3d24b59dba


@ -10,55 +10,63 @@
\section{Introduction}
For \Prog{libunwind} to do its work, it needs to be able to
reconstruct the \emph{frame state} of each frame in a call-chain. The
frame state consists of some frame registers (such as the
instruction-pointer and the stack-pointer) and the locations at which
the current value of every callee-saved (``preserved'') register resides.
For \Prog{libunwind} to do its job, it needs to be able to reconstruct
the \emph{frame state} of each frame in a call-chain. The frame state
describes the subset of the machine-state that consists of the
\emph{frame registers} (typically the instruction-pointer and the
stack-pointer) and all callee-saved registers (preserved registers).
The frame state describes each register either by providing its
current value (for frame registers) or by providing the location at
which the current value is stored (callee-saved registers).
The purpose of the dynamic unwind-info is therefore to provide
\Prog{libunwind} the minimal information it needs about each
dynamically generated procedure such that it can reconstruct the
procedure's frame state.
For statically generated code, the compiler normally takes care of
emitting \emph{unwind-info} which provides the minimum amount of
information needed to reconstruct the frame-state for each instruction
in a procedure. For dynamically generated code, the runtime code
generator must use the dynamic unwind-info interface provided by
\Prog{libunwind} to supply the equivalent information. This manual
page describes the format of this information in detail.
For the purpose of the following discussion, a \emph{procedure} is any
contiguous piece of code. Normally, each procedure directly
corresponds to a function in the source-language but this is not
strictly required. For example, a runtime code-generator could
translate a given function into two separate (discontiguous)
procedures: one for frequently-executed (hot) code and one for
rarely-executed (cold) code. Similarly, simple source-language
functions (usually leaf functions) may get translated into code for
which the default unwind-conventions apply and for such code, no
dynamic unwind info needs to be registered.
For the purpose of this discussion, a \emph{procedure} is defined to
be an arbitrary piece of \emph{contiguous} code. Normally, each
procedure directly corresponds to a function in the source-language
but this is not strictly required. For example, a runtime
code-generator could translate a given function into two separate
(discontiguous) procedures: one for frequently-executed (hot) code and
one for rarely-executed (cold) code. Similarly, simple
source-language functions (usually leaf functions) may get translated
into code for which the default unwind-conventions apply and for such
code, it is not strictly necessary to register dynamic unwind-info.
Within a procedure, the code can be thought of as being divided into a
sequence of \emph{regions}. Each region logically consists of an
optional \emph{prologue}, a \emph{body}, and an optional
\emph{epilogue}. If present, the prologue sets up the frame state for
the body, which does the actual work of the procedure. For example,
the prologue may need to allocate a stack-frame and save some
callee-saved registers before the body can start executing.
Correspondingly, the epilogue, if present, restores the previous frame
state and thereby undoes the effect of the prologue. Regions are
nested in the sense that the frame state at the end of a region serves
as the entry-state of the next region. At the end of several nested
regions, there may be a single epilogue which undoes the effect of all
the prologues in the nested regions.
A procedure logically consists of a sequence of \emph{regions}.
Regions are nested in the sense that the frame state at the end of one
region is, by default, assumed to be the frame state for the next
region. Each region is thought of as being divided into a
\emph{prologue}, a \emph{body}, and an \emph{epilogue}. Each of them
can be empty. If non-empty, the prologue sets up the frame state for
the body. For example, the prologue may need to allocate some space
on the stack and save certain callee-saved registers. The body
performs the actual work of the procedure but does not change the
frame state in any way. If non-empty, the epilogue restores the
previous frame state and as such it undoes or cancels the effect of
the prologue. In fact, a single epilogue may undo the effect of the
prologues of several (nested) regions.
Even though logically we think of the prologue, body, and epilogue as
separate entities, optimizing code-generators will generally
interleave instructions from all three entities to achieve higher
performance. In fact, as far as the dynamic unwind-info is concerned,
there is no distinction at all between prologue and body. Similarly,
the exact set of instructions that make up an epilogue is also
irrelevant. The only point in the epilogue that needs to be described
explicitly is the point at which the stack-pointer gets restored. The
reason this point needs to be described is that once the stack-pointer
is restored, all values saved in the deallocated portion of the stack
become invalid.  All other locations that store the values of
callee-saved registers are assumed to remain valid through the end
of the region.
We should point out that even though the prologue, body, and epilogue
are logically separate entities, optimizing code-generators will
generally interleave instructions from all three entities. For this
reason, the dynamic unwind-info interface of \Prog{libunwind} makes no
distinction whatsoever between prologue and body. Similarly, the
exact set of instructions that make up an epilogue is also irrelevant.
The only point in the epilogue that needs to be described explicitly
by the dynamic unwind-info is the point at which the stack-pointer
gets restored. The reason this point needs to be described is that
once the stack-pointer is restored, all values saved in the
deallocated portion of the stack frame become invalid and hence
\Prog{libunwind} needs to know about it. The portion of the frame
state not saved on the stack is assumed to remain valid through the end
of the region. For this reason, there is usually no need to describe
instructions which restore the contents of callee-saved registers.
Within a region, each instruction that affects the frame state in some
fashion needs to be described with an operation descriptor. For this
@ -75,40 +83,286 @@ in the stack frame.
\section{Procedures}
unw\_dyn\_info\_t
unw\_dyn\_proc\_info\_t
unw\_dyn\_table\_info\_t
unw\_dyn\_remote\_table\_info\_t
A runtime code-generator registers the dynamic unwind-info of a
procedure by setting up a structure of type \Type{unw\_dyn\_info\_t}
and calling \Func{\_U\_dyn\_register}(), passing the address of the
structure as the sole argument. The members of the
\Type{unw\_dyn\_info\_t} structure are described below:
\begin{itemize}
\item[\Type{void~*}next] Private to \Prog{libunwind}. Must not be used
by the application.
\item[\Type{void~*}prev] Private to \Prog{libunwind}. Must not be used
by the application.
\item[\Type{unw\_word\_t} \Var{start\_ip}] The start-address of the
instructions of the procedure (remember: procedures are defined to be
contiguous pieces of code, so a single code-range is sufficient).
\item[\Type{unw\_word\_t} \Var{end\_ip}] The end-address of the
instructions of the procedure (non-inclusive, that is,
\Var{end\_ip}-\Var{start\_ip} is the size of the procedure in
bytes).
\item[\Type{unw\_word\_t} \Var{gp}] The global-pointer value in use
for this procedure. The exact meaning of the global-pointer is
architecture-specific and on some architectures, it is not used at
all.
\item[\Type{int32\_t} \Var{format}] The format of the unwind-info.
This member can be one of \Const{UNW\_INFO\_FORMAT\_DYNAMIC},
\Const{UNW\_INFO\_FORMAT\_TABLE}, or
\Const{UNW\_INFO\_FORMAT\_REMOTE\_TABLE}.
\item[\Type{union} \Var{u}] This union contains one sub-member
structure for every possible unwind-info format:
\begin{description}
\item[\Type{unw\_dyn\_proc\_info\_t} \Var{pi}] This member is used
for format \Const{UNW\_INFO\_FORMAT\_DYNAMIC}.
\item[\Type{unw\_dyn\_table\_info\_t} \Var{ti}] This member is used
for format \Const{UNW\_INFO\_FORMAT\_TABLE}.
\item[\Type{unw\_dyn\_remote\_table\_info\_t} \Var{rti}] This member
is used for format \Const{UNW\_INFO\_FORMAT\_REMOTE\_TABLE}.
\end{description}\
The format of these sub-members is described in detail below.
\end{itemize}
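The address-range members described above can be illustrated with a
minimal sketch. The types and names below are local stand-ins invented
for this example (they only mirror part of the member list above), the
numeric format value is a placeholder, and the word size is assumed to
be 64 bits. Real code would include the \Prog{libunwind} header and
pass the filled-in structure to \Func{\_U\_dyn\_register}() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the libunwind declarations. */
typedef uint64_t demo_word_t;            /* assumption: 64-bit target word */
enum { DEMO_FORMAT_DYNAMIC = 0 };        /* placeholder value */

typedef struct demo_dyn_info {
    void *next, *prev;       /* private to the unwinder; never touched here */
    demo_word_t start_ip;    /* first byte of the procedure */
    demo_word_t end_ip;      /* one past the last byte (non-inclusive) */
    demo_word_t gp;          /* global pointer; unused on some targets */
    int32_t format;          /* which member of the format union applies */
    /* the format-specific union is omitted in this sketch */
} demo_dyn_info_t;

/* Describe a contiguous code buffer of size_in_bytes bytes. */
static void demo_describe_range(demo_dyn_info_t *di, const void *code,
                                size_t size_in_bytes)
{
    assert(di != NULL);
    memset(di, 0, sizeof *di);
    di->start_ip = (demo_word_t)(uintptr_t)code;
    di->end_ip = di->start_ip + size_in_bytes;   /* non-inclusive end */
    di->format = DEMO_FORMAT_DYNAMIC;
}

/* Because end_ip is non-inclusive, the range test is half-open. */
static int demo_covers(const demo_dyn_info_t *di, demo_word_t ip)
{
    return ip >= di->start_ip && ip < di->end_ip;
}
```

Since \Var{end\_ip} is non-inclusive, an instruction-pointer equal to
\Var{end\_ip} is not considered part of the procedure.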
\section{Regions}
\subsection{Proc-info format}
unw\_dyn\_region\_info\_t:
- insn_count can be negative to indicate that the region is
at the end of the procedure; in such a case, the negated
insn_count value specifies the length of the final region
in number of instructions. There must be at most one region
with a negative insn_count and only the last region in a
procedure's region list may be negative. Furthermore, both
di->start\_ip and di->end\_ip must be valid.
This is the preferred dynamic unwind-info format and it is generally
the one used by full-blown runtime code-generators. In this format,
the details of a procedure are described by a structure of type
\Type{unw\_dyn\_proc\_info\_t}. This structure contains the following
members:
\begin{description}
\section{Operations}
\item[\Type{unw\_word\_t} \Var{name\_ptr}] The address of a
(human-readable) name of the procedure or 0 if no such name is
available. If non-zero, the string stored at this address must be
ASCII NUL terminated. For source languages that use name-mangling
(such as C++ or Java) the string stored at this address should be
the \emph{demangled} version of the name.
\item[\Type{unw\_word\_t} \Var{handler}] The address of the
personality-routine for this procedure. Personality-routines are
used in conjunction with exception handling. See the C++ ABI draft
(http://www.codesourcery.com/cxx-abi/) for an overview and a
description of the personality routine. If the procedure has no
personality routine, \Var{handler} must be set to 0.
\item[\Type{uint32\_t} \Var{flags}] A bitmask of flags. At the
moment, no flags have been defined and this member must be
set to 0.
\item[\Type{unw\_dyn\_region\_info\_t~*}\Var{regions}] A NULL-terminated
linked list of region-descriptors. See section ``Region
descriptors'' below for more details.
\end{description}
\subsection{Table-info format}
This format is generally used when the dynamically generated code was
derived from static code and the unwind-info for the dynamic and the
static versions is identical. For example, this format can be useful
when loading statically-generated code into an address-space in a
non-standard fashion (i.e., through some means other than
\Func{dlopen}()). In this format, the details of a group of procedures
are described by a structure of type \Type{unw\_dyn\_table\_info\_t}.
This structure contains the following members:
\begin{description}
\item[\Type{unw\_word\_t} \Var{name\_ptr}] The address of a
(human-readable) name of the procedure or 0 if no such name is
available. If non-zero, the string stored at this address must be
ASCII NUL terminated. For source languages that use name-mangling
(such as C++ or Java) the string stored at this address should be
the \emph{demangled} version of the name.
\item[\Type{unw\_word\_t} \Var{segbase}] The segment-base value
that needs to be added to the segment-relative values stored in the
unwind-info. The exact meaning of this value is
architecture-specific.
\item[\Type{unw\_word\_t} \Var{table\_len}] The length of the
unwind-info (\Var{table\_data}) counted in units of words
(\Type{unw\_word\_t}).
\item[\Type{unw\_word\_t} \Var{table\_data}] A pointer to the actual
data encoding the unwind-info. The exact format is
architecture-specific (see architecture-specific sections below).
\end{description}
\subsection{Remote table-info format}
The remote table-info format has the same basic purpose as the regular
table-info format. The only difference is that when \Prog{libunwind}
uses the unwind-info, it will keep the table data in the target
address-space (which may be remote). Consequently, the type of the
\Var{table\_data} member is \Type{unw\_word\_t} rather than a pointer.
This implies that \Prog{libunwind} will have to access the table-data
via the address-space's \Func{access\_mem}() call-back, rather than
through a direct memory reference.
From the point of view of a runtime code-generator, the remote
table-info format offers no advantage and it is expected that such
generators will describe their procedures either with the proc-info
format or the normal table-info format. The main reason that the
remote table-info format exists is to enable the
address-space-specific \Func{find\_proc\_info}() callback (see
\SeeAlso{unw\_create\_addr\_space}(3)) to return unwind tables whose
data remains in remote memory. This can speed up unwinding (e.g., for
a debugger) because it reduces the amount of data that needs to be
loaded from remote memory.
\section{Region descriptors}
A region descriptor is a variable length structure that describes how
each instruction in the region affects the frame state. Of course,
most instructions in a region usually do not change the frame state and
for those, nothing needs to be recorded in the region descriptor. A
region descriptor is a structure of type
\Type{unw\_dyn\_region\_info\_t} and has the following members:
\begin{description}
\item[\Type{unw\_dyn\_region\_info\_t~*}\Var{next}] A pointer to the
next region. If this is the last region, \Var{next} is \Const{NULL}.
\item[\Type{int32\_t} \Var{insn\_count}] The length of the region in
instructions. Each instruction is assumed to have a fixed size (see
architecture-specific sections for details). The value of
\Var{insn\_count} may be negative in the last region of a procedure
(i.e., it may be negative only if \Var{next} is \Const{NULL}). A
negative value indicates that the region covers the last \emph{N}
instructions of the procedure, where \emph{N} is the absolute value
of \Var{insn\_count}.
\item[\Type{uint32\_t} \Var{op\_count}] The (allocated) length of
the \Var{op} array.
\item[\Type{unw\_dyn\_op\_t} \Var{op}] An array of dynamic unwind
directives. See Section ``Dynamic unwind directives'' for a
description of the directives.
\end{description}
A region descriptor with an \Var{insn\_count} of zero is an
\emph{empty region} and such regions are perfectly legal. In fact,
empty regions can be useful to establish a particular frame state
before the start of another region.
A single region list can be shared across multiple procedures provided
those procedures share a common prologue and epilogue (their bodies
may differ, of course). Normally, such procedures consist of a canned
prologue, the body, and a canned epilogue. This could be described by
two regions: one covering the prologue and one covering the epilogue.
Since the body length is variable, the latter region would need to
specify a negative value in \Var{insn\_count} such that
\Prog{libunwind} knows that the region covers the end of the procedure
(up to the address specified by \Var{end\_ip}).
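The effect of a negative \Var{insn\_count} can be sketched as follows.
This is only one plausible way to resolve region positions, not a
description of how \Prog{libunwind} does it internally. The structure
and function names are hypothetical, and the procedure length is
assumed to be known already in units of instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced region descriptor (directives omitted). */
typedef struct demo_region {
    struct demo_region *next;
    int32_t insn_count;
} demo_region_t;

/* Return the procedure-relative index of the first instruction of
   `target`, or -1 if the list is malformed or `target` is not in it. */
static int64_t demo_region_start(const demo_region_t *list,
                                 const demo_region_t *target,
                                 int64_t proc_insns)
{
    int64_t pos = 0;

    assert(proc_insns >= 0);
    for (const demo_region_t *r = list; r != NULL; r = r->next) {
        if (r->insn_count < 0) {
            if (r->next != NULL)
                return -1;              /* only the last region may be negative */
            pos = proc_insns + r->insn_count;   /* last N instructions */
        }
        if (r == target)
            return pos;
        pos += r->insn_count;
    }
    return -1;
}
```

With a four-instruction prologue region followed by a region whose
\Var{insn\_count} is $-2$, the same list places the final region at
instruction 18 in a 20-instruction procedure and at instruction 48 in a
50-instruction procedure, which is what makes sharing the list possible.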
The region descriptor is a variable length structure to make it
possible to allocate all the necessary memory with a single
memory-allocation request. To facilitate the allocation of region
descriptors, \Prog{libunwind} provides a helper routine with the
following synopsis:
\noindent
\Type{size\_t} \Func{\_U\_dyn\_region\_info\_size}(\Type{int} \Var{op\_count});
This routine returns the number of bytes needed to hold a region
descriptor with space for \Var{op\_count} unwind directives. Note
that the length of the \Var{op} array does not have to match exactly
with the number of directives in a region. Instead, it is sufficient
if the \Var{op} array contains at least as many entries as there are
directives, since the end of the directives can always be indicated
with the \Const{UNW\_DYN\_STOP} directive.
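The single-allocation idea can be sketched with a C99 flexible array
member. Everything below is a stand-in written for this example: the
structure layouts only mirror the member lists in this manual page, the
word size is assumed to be 64 bits, and the two helper functions are
hypothetical rather than part of \Prog{libunwind}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t demo_word_t;            /* assumption: 64-bit target word */

/* Hypothetical stand-in for a dynamic unwind directive. */
typedef struct demo_op {
    int8_t tag;
    int8_t qp;
    int16_t reg;
    int32_t when;
    demo_word_t val;
} demo_op_t;

/* Hypothetical stand-in for a variable-length region descriptor. */
typedef struct demo_region_info {
    struct demo_region_info *next;
    int32_t insn_count;
    uint32_t op_count;
    demo_op_t op[];                      /* op_count entries follow */
} demo_region_info_t;

/* Bytes needed for a descriptor with room for op_count directives. */
static size_t demo_region_size(int op_count)
{
    assert(op_count >= 0);
    return offsetof(demo_region_info_t, op)
           + (size_t)op_count * sizeof(demo_op_t);
}

/* Allocate header and directive array with one request. The memory is
   zero-filled, so every unused entry has tag 0, the value that the
   text guarantees for the STOP directive. */
static demo_region_info_t *demo_region_alloc(int op_count, int32_t insn_count)
{
    demo_region_info_t *r = calloc(1, demo_region_size(op_count));

    if (r == NULL)
        return NULL;
    r->next = NULL;
    r->insn_count = insn_count;
    r->op_count = (uint32_t)op_count;
    return r;
}
```

Zero-filling the allocation means that a code generator which ends up
emitting fewer directives than it reserved does not have to write an
explicit terminator into the unused tail of the array.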
\section{Dynamic unwind directives}
A dynamic unwind directive describes how the frame state changes
at a particular point within a region. The description is in
the form of a structure of type \Type{unw\_dyn\_op\_t}. This
structure has the following members:
\begin{description}
\item[\Type{int8\_t} \Var{tag}] The operation tag. Must be one
of the \Type{unw\_dyn\_operation\_t} values described below.
\item[\Type{int8\_t} \Var{qp}] The qualifying predicate that controls
whether or not this directive is active. This is useful for
predicated architectures such as IA-64 or ARM, where the contents of
another (callee-saved) register determines whether or not an
instruction is executed (takes effect). If the directive is always
active, this member should be set to the manifest constant
\Const{\_U\_QP\_TRUE} (this constant is defined for all
architectures, predicated or not).
\item[\Type{int16\_t} \Var{reg}] The number of the register affected
by the instruction.
\item[\Type{int32\_t} \Var{when}] The region-relative number of
the instruction to which this directive applies. For example,
a value of 0 means that the effect described by this directive
has taken place once the first instruction in the region has
executed.
\item[\Type{unw\_word\_t} \Var{val}] The value to be applied by the
operation tag. The exact meaning of this value varies by tag. See
Section ``Operation tags'' below.
\end{description}
It is perfectly legitimate to specify multiple dynamic unwind
directives with the same \Var{when} value, if a particular instruction
has a complex effect on the frame state.
Empty regions by definition contain no actual instructions and as such
the directives are not tied to a particular instruction. By
convention, the \Var{when} member should be set to 0, however.
There is no need for the dynamic unwind directives to appear
in order of increasing \Var{when} values. If the directives happen to
be sorted in that order, it may result in slightly faster execution,
but a runtime code-generator should not go to extra lengths just to
ensure that the directives are sorted.
IMPLEMENTATION NOTE: should \Prog{libunwind} implementations for
certain architectures prefer the list of unwind directives to be
sorted, it is recommended that such implementations first check
whether the list happens to be sorted already and, if not, sort the
directives explicitly before the first use. With this approach, the
overhead of explicit sorting is only paid when there is a real benefit
and if the runtime code-generator happens to generate sorted lists
naturally, the performance penalty is limited to a simple O(N) check.
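The check-then-sort approach from the implementation note could look
roughly like the sketch below. The directive structure is a local
stand-in, the function names are hypothetical, and the choice of a
stable insertion sort is an assumption made here so that directives
with equal \Var{when} values keep their relative order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_STOP = 0 };                  /* STOP is guaranteed to be 0 */

/* Hypothetical stand-in for a dynamic unwind directive. */
typedef struct demo_op {
    int8_t tag;
    int8_t qp;
    int16_t reg;
    int32_t when;
    uint64_t val;
} demo_op_t;

/* Number of directives before the first STOP (or the array end). */
static size_t demo_active_ops(const demo_op_t *op, size_t op_count)
{
    size_t n = 0;

    while (n < op_count && op[n].tag != DEMO_STOP)
        n++;
    return n;
}

/* O(N) check: are the first n directives ordered by increasing when? */
static int demo_is_sorted(const demo_op_t *op, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (op[i - 1].when > op[i].when)
            return 0;
    return 1;
}

/* Sort only if the check fails. Returns 1 if a sort was performed. */
static int demo_sort_if_needed(demo_op_t *op, size_t op_count)
{
    assert(op != NULL);
    size_t n = demo_active_ops(op, op_count);

    if (demo_is_sorted(op, n))
        return 0;
    for (size_t i = 1; i < n; i++) {     /* stable insertion sort */
        demo_op_t key = op[i];
        size_t j = i;

        while (j > 0 && op[j - 1].when > key.when) {
            op[j] = op[j - 1];
            j--;
        }
        op[j] = key;
    }
    return 1;
}
```

A second call on the same array returns 0 after the linear check, so
the sorting cost is paid at most once per region.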
\subsection{Operation tags}
The possible operation tags are defined by enumeration type
\Type{unw\_dyn\_operation\_t} which defines the following
values:
\begin{description}
\item[\Const{UNW\_DYN\_STOP}] Marks the end of the dynamic unwind
directive list. All remaining entries in the \Var{op} array of the
region-descriptor are ignored. This tag is guaranteed to have a
value of 0.
\item[\Const{UNW\_DYN\_SAVE\_REG}] Marks an instruction which saves
register \Var{reg} to register \Var{val}.
\item[\Const{UNW\_DYN\_SPILL\_FP\_REL}] Marks an instruction which
spills register \Var{reg} to a frame-pointer-relative location. The
frame-pointer-relative offset is given by the value stored in member
\Var{val}. See the architecture-specific sections for a description
of the stack frame layout.
\item[\Const{UNW\_DYN\_SPILL\_SP\_REL}] Marks an instruction which
spills register \Var{reg} to a stack-pointer-relative location. The
stack-pointer-relative offset is given by the value stored in member
\Var{val}. See the architecture-specific sections for a description
of the stack frame layout.
\item[\Const{UNW\_DYN\_ADD}] Marks an instruction which adds
the constant value \Var{val} to register \Var{reg}. To subtract
a constant value, store the two's-complement of the value in
\Var{val}. The set of registers that can be specified for this tag
is described in the architecture-specific sections below.
\item[\Const{UNW\_DYN\_POP\_FRAMES}]
\item[\Const{UNW\_DYN\_LABEL\_STATE}]
\item[\Const{UNW\_DYN\_COPY\_STATE}]
\item[\Const{UNW\_DYN\_ALIAS}]
\end{description}
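The two's-complement convention for \Const{UNW\_DYN\_ADD} can be
illustrated with a small sketch. The structure, the tag value, and the
helper below are hypothetical stand-ins written for this example (only
the value 0 for the STOP tag is taken from the text), and a 64-bit word
is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_word_t;            /* assumption: 64-bit target word */

enum {
    DEMO_STOP = 0,                       /* guaranteed by the text */
    DEMO_ADD = 4                         /* placeholder value for this sketch */
};

/* Hypothetical stand-in for a dynamic unwind directive. */
typedef struct demo_op {
    int8_t tag;
    int8_t qp;
    int16_t reg;
    int32_t when;
    demo_word_t val;
} demo_op_t;

/* Record "reg += delta" at instruction `when`. A negative delta is
   stored as its two's complement: converting a signed value to an
   unsigned 64-bit type is defined as reduction modulo 2^64. */
static void demo_op_add(demo_op_t *op, int8_t qp, int32_t when,
                        int16_t reg, int64_t delta)
{
    assert(op != NULL);
    op->tag = DEMO_ADD;
    op->qp = qp;
    op->reg = reg;
    op->when = when;
    op->val = (demo_word_t)delta;
}
```

For example, a stack adjustment of $-32$ bytes is stored as the word
\texttt{0xffffffffffffffe0}, and adding that word to a register with
wrap-around arithmetic subtracts 32 from it.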
unw\_dyn\_operation\_t
unw\_dyn\_op\_t
\_U\_QP\_TRUE
unw\_dyn\_info\_format\_t
- instructions don't have to be sorted in increasing order of ``when''
values: In general, if you can generate the sorted order easily
(e.g., without an explicit sorting step), I'd recommend doing so
because in that case, should some version of libunwind ever require
sorted order, libunwind can verify in O(N) that the list is sorted
already. In the particular case of the ia64-version of libunwind, a
sorted order won't help, since it always scans the instructions up
to UNW_DYN_STOP.
\_U\_dyn\_region\_info\_size(opcount);
\_U\_dyn\_op\_save\_reg();
\_U\_dyn\_op\_spill\_fp\_rel();
\_U\_dyn\_op\_spill\_sp\_rel();
@ -119,6 +373,17 @@ unw\_dyn\_info\_format\_t
\_U\_dyn\_op\_alias();
\_U\_dyn\_op\_stop();
\section{IA-64 specifics}
- meaning of segbase member in table-info/table-remote-info format
- format of table\_data in table-info/table-remote-info format
- instruction size: each bundle is counted as 3 instructions, regardless
of template (MLX)
- describe stack-frame layout, especially with regards to sp-relative
and fp-relative addressing
- UNW\_DYN\_ADD can only add to ``sp'' (always a negative value); use
POP\_FRAMES otherwise
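The instruction-counting rule noted above can be sketched as follows.
An IA-64 bundle occupies 16 bytes and holds three instruction slots.
The helper names are hypothetical, and the mapping from a bundle index
and slot number to a \Var{when} value is an assumption that simply
follows from counting every bundle as three instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DEMO_IA64_BUNDLE_BYTES = 16,         /* one bundle is 128 bits */
    DEMO_IA64_SLOTS_PER_BUNDLE = 3       /* counted as 3 even for MLX */
};

/* Region length in instructions for code_bytes bytes of bundles. */
static int32_t demo_ia64_insn_count(size_t code_bytes)
{
    assert(code_bytes % DEMO_IA64_BUNDLE_BYTES == 0);
    return (int32_t)(code_bytes / DEMO_IA64_BUNDLE_BYTES
                     * DEMO_IA64_SLOTS_PER_BUNDLE);
}

/* Assumed region-relative instruction number of a given slot. */
static int32_t demo_ia64_when(size_t bundle_index, int slot)
{
    assert(slot >= 0 && slot < DEMO_IA64_SLOTS_PER_BUNDLE);
    return (int32_t)(bundle_index * DEMO_IA64_SLOTS_PER_BUNDLE + (size_t)slot);
}
```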
\section{See Also}
\SeeAlso{libunwind(3)},