diff --git a/manuscrit/20_foundations/20_code_analyzers.tex b/manuscrit/20_foundations/20_code_analyzers.tex
index 05ae062..5ea3a03 100644
--- a/manuscrit/20_foundations/20_code_analyzers.tex
+++ b/manuscrit/20_foundations/20_code_analyzers.tex
@@ -117,8 +117,49 @@ In the case of assembled binaries, as all analyzers were run on Linux, executables or object files are ELF files. Some analyzers work on sections of the file defined by user-provided offsets in the binary, while others require the presence of \iaca{} markers around the code portion or portions to be
-analyzed. Those markers, introduced by \iaca{}, consist in the following
-assembly snippets: \todo{}
+analyzed. Those markers, introduced by \iaca{} as C-level preprocessor
+macros, consist of the following x86 assembly snippets:
+
+\hfill\begin{minipage}{0.35\textwidth}
+  \begin{lstlisting}[language={[x86masm]Assembler}]
+mov ebx, 111
+db 0x64, 0x67, 0x90
+\end{lstlisting}
+\textit{\iaca{} start marker}
+\end{minipage}\hfill\begin{minipage}{0.35\textwidth}
+  \begin{lstlisting}[language={[x86masm]Assembler}]
+mov ebx, 222
+db 0x64, 0x67, 0x90
+\end{lstlisting}
+\textit{\iaca{} end marker}
+\end{minipage}
+
+\medskip
+
+On Linux and most other UNIX-like operating systems, the standard format for
+assembled binaries ---~either object files (\lstc{.o}) or executables~--- is
+ELF~\cite{elf_tis}. Such files are organized into sections, the assembled
+instructions themselves being found in the \texttt{.text} section ---~the rest
+holding metadata, program data (strings, icons, \ldots), debugging information,
+etc. When an ELF executable is loaded into memory, its sections are grouped
+into \emph{segments}, each of which may be \emph{mapped} to a portion of the address space.
+For instance, if the \texttt{.text} section is 1024 bytes long, starts at offset
+4096 of the ELF file and is mapped at virtual address \texttt{0x454000}, then
+dereferencing address \texttt{0x454010} from within the program reads the byte
+at offset 16 (\texttt{0x10}) within the \texttt{.text} section, that is, the
+byte at offset $4096 + 16 = 4112$ in the ELF file.
+
+The ELF file also defines \emph{symbols}, acting as references, or pointers, to
+specific locations or chunks of the file. This mechanism is used, among other
+things, to refer to the program's functions. For instance, a symbol
+\texttt{main} may point to the first byte of the \lstc{main} function, and may
+also record the function's size in bytes.
+
+Both mechanisms can be used to delimit the portion of an ELF file to be
+analyzed without requiring \iaca{} markers or the like: either an offset and
+size within the \texttt{.text} section are provided (both can be found with
+tools such as \lstc{objdump}), or a symbol name is provided when an entire
+function is to be analyzed.

 \subsection{Examples with \llvmmca}

diff --git a/manuscrit/biblio/misc.bib b/manuscrit/biblio/misc.bib
index 1139896..13770b3 100644
--- a/manuscrit/biblio/misc.bib
+++ b/manuscrit/biblio/misc.bib
@@ -158,3 +158,10 @@
 archivePrefix={arXiv},
 primaryClass={cs.PF}
 }
+
+@misc{elf_tis,
+  title={{Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification, Version 1.2}},
+  author={{TIS Committee}},
+  year={1995},
+  month={May}
+}
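The three ways of locating the code to analyze described in the section above (a virtual address translated to a file offset, a symbol's address and size, and a region delimited by IACA markers) can be sketched in a few lines of Python. This is a hypothetical illustration, not code taken from any of the analyzers. The names SectionMapping, vaddr_to_file_offset, symbol_file_range and iaca_region are made up for this example. The SectionMapping fields merely mirror the sh_addr, sh_offset and sh_size fields of an ELF section header. The marker byte patterns also rest on an assumption: that mov ebx, imm32 is assembled to its short form, the 0xBB opcode followed by a little-endian 32-bit immediate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical minimal view of an ELF section header; the field names are
# illustrative and only mirror sh_addr, sh_offset and sh_size.
@dataclass(frozen=True)
class SectionMapping:
    vaddr: int        # virtual address at which the section is mapped
    file_offset: int  # offset of the section's first byte in the ELF file
    size: int         # section size, in bytes


def vaddr_to_file_offset(section: SectionMapping, vaddr: int) -> int:
    """Translate a virtual address inside `section` into an ELF file offset."""
    delta = vaddr - section.vaddr
    if not 0 <= delta < section.size:
        raise ValueError(f"address {vaddr:#x} lies outside the section")
    return section.file_offset + delta


def symbol_file_range(section: SectionMapping, sym_vaddr: int,
                      sym_size: int) -> Tuple[int, int]:
    """Return the [start, end) file offsets covered by a symbol (e.g. a function)."""
    if sym_size <= 0:
        raise ValueError("symbol size must be positive")
    start = vaddr_to_file_offset(section, sym_vaddr)
    # Also check that the symbol's last byte lies within the section.
    vaddr_to_file_offset(section, sym_vaddr + sym_size - 1)
    return start, start + sym_size


# Assumed encodings of the IACA markers: mov ebx, imm32 as 0xBB + imm32
# (little-endian), followed by the 0x64 0x67 0x90 byte sequence.
IACA_START = bytes([0xBB]) + (111).to_bytes(4, "little") + bytes([0x64, 0x67, 0x90])
IACA_END = bytes([0xBB]) + (222).to_bytes(4, "little") + bytes([0x64, 0x67, 0x90])


def iaca_region(code: bytes) -> Optional[Tuple[int, int]]:
    """Return the [start, end) offsets of the bytes between the IACA markers."""
    start = code.find(IACA_START)
    if start < 0:
        return None
    start += len(IACA_START)  # skip the start marker itself
    end = code.find(IACA_END, start)
    if end < 0:
        return None
    return start, end


# The .text example from the text: 1024 bytes at file offset 4096,
# mapped at virtual address 0x454000.
text = SectionMapping(vaddr=0x454000, file_offset=4096, size=1024)
print(vaddr_to_file_offset(text, 0x454010))  # 4112
```

The translation only subtracts the section's base address and adds its file offset. This works because a mapped section keeps its layout in memory. A real tool would read these three values from the section header table instead of hard-coding them.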