\title[] {Reliable and Fast DWARF-based Stack Unwinding}
\author[\slidecountline]{\alert{\textbf{Théophile Bastian}}\\
\textbf{Stephen Kell} \\
\textbf{Francesco Zappa Nardelli}}
\institute{ENS Paris, University of Kent, Inria}
\textbf{Webpage} (incl. slides)
ONR VerticA \\
Google Research Fellowship
\section{DWARF and stack unwinding data}
$ ./a.out
Segmentation fault.
|\pause|(gdb) backtrace
#0 |0x54625| in fct_b
#1 |\color{blue}0x54663| in fct_a
#2 |\color{red}0x54674| in main
\textbf{\Large How does it work?!}
\subsection{Stack frames and unwinding}
How do we get the RA\@?\\Easy, \reg{rbp}!
\onslide<2>{What if we only have \reg{rsp}?}
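As a minimal sketch of the easy case, assume every function keeps \reg{rbp} as a frame pointer, so that the saved caller \reg{rbp} sits at \texttt{[rbp]} and the return address at \texttt{[rbp + 8]}. The \texttt{memory} dictionary and the stack addresses below are hypothetical stand-ins for a real process stack; only the two return addresses come from the backtrace shown earlier.

```python
def walk_frame_pointers(memory, rbp, max_depth=64):
    """Collect return addresses by following the saved-rbp chain.

    `memory` is a hypothetical dict mapping stack addresses to 64-bit
    values; a real unwinder would read the stack of the process instead.
    """
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_depth:
        # With a frame pointer, [rbp + 8] holds the return address
        # and [rbp] holds the caller's saved rbp.
        return_addresses.append(memory[rbp + 8])
        rbp = memory[rbp]
    return return_addresses


# Toy stack with two frames (addresses are made up for illustration).
stack = {
    0x7000: 0x7020,   # saved rbp of the caller
    0x7008: 0x54663,  # return address into fct_a
    0x7020: 0x0,      # outermost frame: no caller rbp (assumed convention)
    0x7028: 0x54674,  # return address into main
}
frames = walk_frame_pointers(stack, 0x7000)
```

When the compiler omits the frame pointer, this chain does not exist, which is why unwinding tables are needed.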
\subsection{DWARF tables}
\begin{frame}{DWARF unwinding data}
~PC & CFA & rbx & rbp & r12 & r13 & r14 & r15 & ra \\
0084950 & rsp+8 & u & u & u & u & u & u & c-8 \\
0084952 & rsp+16 & u & u & u & u & u & c-16 & c-8 \\
0084954 & rsp+24 & u & u & u & u & c-24 & c-16 & c-8 \\
0084956 & rsp+32 & u & u & u & c-32 & c-24 & c-16 & c-8 \\
0084958 & rsp+40 & u & u & c-40 & c-32 & c-24 & c-16 & c-8 \\
0084959 & rsp+48 & u & c-48 & c-40 & c-32 & c-24 & c-16 & c-8 \\
\rowcolor{Aquamarine} 008495a & rsp+56 & c-56 & c-48 & c-40 & c-32 & c-24 & c-16 & c-8 \\
0084962 & rsp+64 & c-56 & c-48 & c-40 & c-32 & c-24 & c-16 & c-8 \\
0084a19 & rsp+56 & c-56 & c-48 & c-40 & c-32 & c-24 & c-16 & c-8 \\
0084a1d & rsp+48 & c-56 & c-48 & c-40 & c-32 & c-24 & c-16 & c-8 \\
0084a1e & rsp+40 & c-56 & c-48 & c-40 & c-32 & c-24 & c-16 & c-8 \\
\textbf{For each instruction\ldots}\\
(identified by its program counter)
\textbf{\ldots{}an expression to compute its return address
location on the stack}
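The lookup described above can be sketched as follows, under the assumption that the table is already fully expanded and sorted by PC. Only the CFA column is kept, since the \texttt{ra} column is \texttt{c-8} in every row of the table; the function name \texttt{return\_address\_slot} is hypothetical.

```python
from bisect import bisect_right

# (first PC covered by the row, CFA offset from rsp), copied from the table.
ROWS = [
    (0x84950, 8), (0x84952, 16), (0x84954, 24), (0x84956, 32),
    (0x84958, 40), (0x84959, 48), (0x8495A, 56), (0x84962, 64),
    (0x84A19, 56), (0x84A1D, 48), (0x84A1E, 40),
]
ROW_STARTS = [start for start, _ in ROWS]


def return_address_slot(pc, rsp):
    """Return the stack address that holds the return address at `pc`."""
    # Find the last row whose first PC is <= pc.
    index = bisect_right(ROW_STARTS, pc) - 1
    if index < 0:
        raise LookupError("pc precedes the first row of the table")
    cfa = rsp + ROWS[index][1]
    return cfa - 8  # the ra column is c-8 in every row
```

For instance, at the highlighted row \texttt{008495a} the CFA is \texttt{rsp+56}, so the return address is stored at \texttt{rsp+48}.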
\begin{frame}[t, fragile]{The real DWARF}
30 24 34 FDE pc=004020..004040
DW_CFA_def_cfa_offset: 16
DW_CFA_advance_loc: 6 to 0000000000004026
DW_CFA_def_cfa_offset: 24
DW_CFA_advance_loc: 10 to 0000000000004030
DW_CFA_def_cfa_expression (DW_OP_breg7 (rsp): 8; DW_OP_breg16 (rip): 0; DW_OP_lit15; DW_OP_and; DW_OP_lit11; DW_OP_ge; DW_OP_lit3; DW_OP_shl; DW_OP_plus)
\item[\textbf{$\longrightarrow$}] \textbf{\alert{constructed} on-demand
by a \alert{Turing-complete stack machine}!}
Complex \,\& \,slow
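To illustrate what ``constructed on-demand'' means, here is a minimal sketch that expands the three instruction kinds appearing in the listing above into table rows. It is not a DWARF interpreter: the expression is kept as an opaque placeholder rather than evaluated, the initial \texttt{rsp+8} rule is an assumption about the CIE, and the function name \texttt{expand\_cfa\_program} is hypothetical.

```python
def expand_cfa_program(start_pc, initial_cfa, program):
    """Expand a list of (opcode, operand) pairs into (pc, CFA rule) rows."""
    rows = []
    pc, cfa = start_pc, initial_cfa
    for opcode, operand in program:
        if opcode == "DW_CFA_def_cfa_offset":
            cfa = f"rsp+{operand}"
        elif opcode == "DW_CFA_def_cfa_expression":
            # A real unwinder runs the DWARF stack machine here;
            # this sketch only records that an expression is in effect.
            cfa = "<expression>"
        elif opcode == "DW_CFA_advance_loc":
            rows.append((pc, cfa))  # the current rule is final for this pc
            pc += operand
        else:
            raise NotImplementedError(opcode)
    rows.append((pc, cfa))
    return rows


# The FDE from the listing, starting at pc=0x4020.
program = [
    ("DW_CFA_def_cfa_offset", 16),
    ("DW_CFA_advance_loc", 6),
    ("DW_CFA_def_cfa_offset", 24),
    ("DW_CFA_advance_loc", 10),
    ("DW_CFA_def_cfa_expression", None),
]
rows = expand_cfa_program(0x4020, "rsp+8", program)
```

Even this toy version has to replay the program from the start of the FDE to answer a query for a single PC, which hints at why the real mechanism is slow.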
\textbf{Pervasive:}\\ relied upon by debuggers, profilers, C++
exceptions \\
\textbf{$\leadsto$ not only for debuggers!}
``Sorry, but last time was too f\dots painful. The whole (and
only) point of unwinders is to make debugging easy
when a bug occurs. But \alert{the dwarf unwinder had bugs}
itself, or \alert{our dwarf information had bugs}, and in either
case it actually turned several trivial bugs into a \alert{total
undebuggable hell}.''
``If you can \alert{mathematically prove that the unwinder is
correct} --- even in the presence of bogus and actively
incorrect unwinding information --- and never ever
follows a bad pointer, \alert{I'll reconsider}.''
\hfill ---~Linus Torvalds, 2012
\alert{This is where we still are!}
\section{Correct-by-construction unwinding tables: synthesis}
\tblrowval{\hspace{-2ex}<{\bf foo}>:}{}{\textbf{CFA}}{\textbf{ra}}
\rowonly<4>{\tblhl{}} \tblrowval{push}{\%r15}{rsp+8}{c-8}
\rowonly<5>{\tblhl{}} \tblrowval{push}{\%r14}{rsp+16}{c-8}
\rowonly<6>{\tblhl{}} \tblrowval{mov}{\$0x3,\%eax}{rsp+24}{c-8}
\rowonly<7>{\tblhl{}} \tblrowval{push}{\%r13}{rsp+24}{c-8}
\alert{\bf Assumptions:}
\item the assembly was generated by a compiler
\item which also generated unwinding data
\item and I have a reliable DWARF interpreter
\only<4>{Upon function call, \alert{ra = *(\reg{rsp})}}
\only<5>{\texttt{push} decreases \reg{rsp} by 8: %
\alert{ra = *(\reg{rsp} + 8)}}
\only<6>{and again: %
\alert{ra = *(\reg{rsp} + 16)}}
\only<7>{This \texttt{mov} leaves \reg{rsp} untouched: %
\alert{ra = *(\reg{rsp} + 16)}}
\only<8>{The unwinding table captures an \alert{abstract execution}
of the code\ldots}
\only<9>{\ldots and thus is \alert{redundant with the binary}.}
\section{Unwinding data synthesis from binaries}
\begin{frame}{Synthesis strategy}
\item Upon entering a function, we know
\[ \cfa = \reg{rsp} + 8
\qquad \ra = \cfa - 8 \]
\item The semantics of each instruction specifies \alert{how it changes
the \cfa}.
\item Heuristic to decide whether we index with \reg{rbp} or \reg{rsp}.
\item By performing symbolic execution, we can \alert{synthesize the
unwinding table} line by line.
\item Control flow: forward data-flow analysis
\item The fixpoints are immediate, cf.\ the article.
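The straight-line part of this strategy can be sketched as below, assuming a tiny hypothetical instruction model in which only \texttt{push}, \texttt{pop}, and \texttt{add}/\texttt{sub} with an immediate on \texttt{\%rsp} move the stack pointer. Control flow and the \reg{rbp} heuristic are left out, and this is not the BAP-based implementation.

```python
def synthesize_cfa_rows(instructions):
    """Return one (mnemonic, operands, CFA, ra) row per instruction.

    Each row describes the state *before* the instruction executes.
    """
    offset = 8  # on entry, the CFA is rsp + 8 and ra is stored at CFA - 8
    rows = []
    for mnemonic, operands in instructions:
        rows.append((mnemonic, operands, f"rsp+{offset}", "c-8"))
        if mnemonic == "push":
            offset += 8  # rsp decreases by 8, so the CFA is 8 further away
        elif mnemonic == "pop":
            offset -= 8
        elif mnemonic in ("sub", "add") and operands.endswith(",%rsp"):
            immediate = int(operands.split(",")[0].lstrip("$"), 0)
            offset += immediate if mnemonic == "sub" else -immediate
        # Any other instruction is assumed to leave rsp untouched.
    return rows


# The function foo from the earlier slide.
foo = [
    ("push", "%r15"),
    ("push", "%r14"),
    ("mov", "$0x3,%eax"),
    ("push", "%r13"),
]
table = synthesize_cfa_rows(foo)
cfa_column = [row[2] for row in table]
```

On \texttt{foo}, the synthesized CFA column matches the one the compiler emitted: \texttt{rsp+8}, \texttt{rsp+16}, \texttt{rsp+24}, \texttt{rsp+24}.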
Implemented on top of CMU's \prog{BAP}
Demo time!
\section{Unwinding data compilation}
\subsection{Compilation ahead-of-time}
\node (dwarf) at (0, 0) {
\node (table) at (0.5\textwidth, -0.23\textheight) {
~PC & CFA & rbx & rbp & ra \\
0084950 & rsp+8 & u & u & c-8 \\
0084952 & rsp+16 & u & u & c-8 \\
0084954 & rsp+24 & u & u & c-8 \\
0084956 & rsp+32 & u & u & c-8 \\
\node (csrc) at (0, -0.6\textheight) {
\node (ehelf) at (0.55\textwidth, -0.75\textheight) {
ELF file:
\only<2->{\path [->] (dwarf) -| node {runtime} (table);}
\path [->] (dwarf) edge node {ahead of time} (csrc);
\path [->] (csrc) -| node {gcc, AoT} (ehelf);
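The ahead-of-time path in the diagram can be sketched as a small generator that turns an expanded table into C source, which a compiler such as gcc would then build into a shared object. The function names \texttt{emit\_c\_lookup} and \texttt{cfa\_for\_pc} are hypothetical, and the emitted code is only an illustration of the idea, not the output of the actual tool.

```python
def emit_c_lookup(function_name, rows):
    """Emit C source for a function mapping (pc, rsp) to the CFA.

    `rows` is a list of (first PC of the row, CFA offset from rsp),
    sorted by increasing PC.
    """
    lines = [
        "#include <stdint.h>",
        "",
        f"uintptr_t {function_name}(uintptr_t pc, uintptr_t rsp) {{",
    ]
    # Test the highest PC first so the first matching row is the right one.
    for start_pc, offset in reversed(rows):
        lines.append(f"    if (pc >= 0x{start_pc:x}) return rsp + {offset};")
    lines.append("    return 0;  /* pc precedes the table */")
    lines.append("}")
    return "\n".join(lines)


# The four rows shown in the table above.
rows = [(0x84950, 8), (0x84952, 16), (0x84954, 24), (0x84956, 32)]
source = emit_c_lookup("cfa_for_pc", rows)
```

The table lookup becomes plain compiled code, so no DWARF interpretation is needed at unwinding time.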
\item \alert{libunwind}: \textit{de facto} standard library for
\item \texttt{libunwind-eh\_elf}: alternative implementation using
\item[$\leadsto$] Same API, almost \alert{``relink-and-play''} for existing projects!
\item \alert{Speedup}: $\times$15 (\prog{gzip}) to $\times$25 (\prog{hackbench}) vs.
\item libunwind: state of the art, aggressive caching.
\item \alert{Space overhead}: $\times$2.6 to $\times$3 vs.\ DWARF
What's next?
\item{} Synthesis + compare = verification of unwinding data!
\item{} Integrate synthesis into compilers \& debuggers\\
$\rightarrow$ support for inline assembly, fallback method, \ldots
\item{} Integrate into \prog{perf} for a faster analysis
\item{} Probably many more cool things to do!
Come and chat if interested! \texttt{:)}