Bench: evaluate gzip (2019-06-10 12:06:13 +02:00)
Benching: evaluate hackbench clearly, improve tools (2019-06-10 12:04:52 +02:00)
compiler: generate rows for CIE (2019-06-09 03:32:54 +02:00)
13 changed files with 505 additions and 48 deletions
--- a/benching/README.md
+++ b/benching/README.md
@@ -0,0 +1,92 @@
+# Benching `eh_elfs`
+
+## Benchmark setup
+
+Pick some name for your `eh_elfs` directory. We will call it `$EH_ELF_DIR`.
+
+### Generate the `eh_elfs`
+
+```bash
+../../generate_eh_elf.py --deps -o "$EH_ELF_DIR" \
+  --keep-holes -O2 --global-switch --enable-deref-arg "$BENCHED_BINARY"
+```
+
+### Record a `perf` session
+
+```bash
+perf record --call-graph dwarf,4096 "$BENCHED_BINARY" [args]
+```
+
+### Set up the environment
+
+```bash
+source ../../env/apply [vanilla | vanilla-nocache | *eh_elf] [dbg | *release]
+```
+
+The first value selects the version of libunwind you will be running, the
+second selects whether you want to run in debug or release mode (use release to
+get readings, debug to check for errors).
+
+You can reset your environment to its previous state by running `deactivate`.
+
+If you pick the `eh_elf` flavour, you will also have to run
+
+```bash
+export LD_LIBRARY_PATH="$EH_ELF_DIR:$LD_LIBRARY_PATH"
+```
+
+## Extract results
+
+### Base readings
+
+**In release mode** (faster), run
+
+```bash
+perf report 2>&1 >/dev/null
+```
+
+in both an `eh_elf` shell and a `vanilla` shell, then compare the average times.
+
+### Getting debug output
+
+```bash
+UNW_DEBUG_LEVEL=5 perf report 2>&1 >/dev/null
+```
+
+### Total number of calls to `unw_step`
+
+```bash
+UNW_DEBUG_LEVEL=5 perf report 2>&1 >/dev/null | grep -c "step:.* returning"
+```
+
+### Total number of vanilla errors
+
+With the `vanilla` context,
+
+```bash
+UNW_DEBUG_LEVEL=5 perf report 2>&1 >/dev/null | grep -c "step:.* returning -"
+```
+
+### Total number of fallbacks to original DWARF
+
+With the `eh_elf` context,
+
+```bash
+UNW_DEBUG_LEVEL=5 perf report 2>&1 >/dev/null | grep -c "step:.* falling back"
+```
+
+### Total number of fallbacks to original DWARF that actually used DWARF
+
+With the `eh_elf` context,
+
+```bash
+UNW_DEBUG_LEVEL=5 perf report 2>&1 >/dev/null | grep -c "step:.* fallback with"
+```
+
+### Get succeeded fallback locations
+
+```bash
+UNW_DEBUG_LEVEL=5 perf report 2>&1 >/dev/null \
+  | grep "step: .* fallback with" -B 15 \
+  | grep "In memory map" | sort | uniq -c
+```
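The four `grep -c` counts above can also be collected in a single pass over the debug output. The following is a minimal sketch, not part of the commits: it reuses the patterns from the README verbatim, while the function name `count_step_events` and the category names are made up for illustration.

```python
import re

# Patterns copied from the grep commands in benching/README.md.
# The dictionary keys are illustrative names, not libunwind terminology.
PATTERNS = {
    "steps": re.compile(r"step:.* returning"),
    "errors": re.compile(r"step:.* returning -"),
    "fallbacks": re.compile(r"step:.* falling back"),
    "dwarf_fallbacks": re.compile(r"step:.* fallback with"),
}


def count_step_events(lines):
    """Count, for each pattern, how many lines match it (like `grep -c`)."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

As with `grep`, a line matching `returning -` is also counted under `steps`. To use it on a real session, one would feed it the stderr stream of `UNW_DEBUG_LEVEL=5 perf report 2>&1 >/dev/null`, for instance through `count_step_events(sys.stdin)`.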
--- a/benching/gzip/.gitignore
+++ b/benching/gzip/.gitignore
@@ -0,0 +1,3 @@
+gzip
+gzip-1.10
+perf.data
--- a/benching/gzip/EVALUATION.md
+++ b/benching/gzip/EVALUATION.md
@@ -0,0 +1,49 @@
+# gzip - evaluation
+
+Artifacts saved in `evaluation_artifacts`.
+
+## Performance
+
+Using the command line
+
+```bash
+for i in $(seq 1 100); do
+  perf report 2>&1 >/dev/null | tail -n 1 \
+    | python ../hackbench/to_report_fmt.py \
+    | sed 's/^.* & .* & \([0-9]*\) & .*$/\1/g'
+done
+```
+
+we save a sequence of 100 performance readings to some file.
+
+Samples:
+* `eh_elf`:  331134 unw/exec
+* `vanilla`: 331144 unw/exec
+
+Average time/unw:
+* `eh_elf`:    83 ns
+* `vanilla`: 1304 ns
+
+Standard deviation:
+* `eh_elf`:   2 ns
+* `vanilla`: 24 ns
+
+Average ratio: 15.7
+Ratio uncertainty: 0.8
+
+## Distribution of `unw_step` issues
+
+### `eh_elf` case
+
+* success:                              331134 (99.9%)
+* fallback to DWARF:                         2  (0.0%)
+* fallback to libunwind heuristics:          8  (0.0%)
+* fail to unwind:                          379  (0.1%)
+* total:                                331523
+
+### `vanilla` case
+
+* success:                              331136 (99.9%)
+* fallback to libunwind heuristics:          8  (0.0%)
+* fail to unwind:                          379  (0.1%)
+* total:                                331523
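The percentages in these tables can be recomputed from the raw counts. A minimal sketch, using the `eh_elf` counts from the gzip evaluation above and assuming the shares are rounded to one decimal place (the helper name `shares` is hypothetical):

```python
def shares(counts):
    """Return each outcome's share of the total as a percentage string."""
    total = sum(counts.values())
    return {
        name: "{:.1f}%".format(100 * n / total) for name, n in counts.items()
    }


# Counts taken from the `eh_elf` case of the gzip evaluation.
eh_elf_counts = {
    "success": 331134,
    "fallback to DWARF": 2,
    "fallback to libunwind heuristics": 8,
    "fail to unwind": 379,
}

for name, share in shares(eh_elf_counts).items():
    print("{}: {}".format(name, share))
```

The counts sum to the reported total of 331523, and the resulting shares agree with the figures listed for this case.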
--- a/benching/hackbench/EVALUATION.md
+++ b/benching/hackbench/EVALUATION.md
@@ -0,0 +1,48 @@
+# Hackbench - evaluation
+
+Artifacts saved in `evaluation_artifacts`.
+
+## Performance
+
+Using the command line
+
+```bash
+for i in $(seq 1 100); do
+  perf report 2>&1 >/dev/null | tail -n 1 \
+    | python to_report_fmt.py | sed 's/^.* & .* & \([0-9]*\) & .*$/\1/g'
+done
+```
+
+we save a sequence of 100 performance readings to some file.
+
+Samples:
+* `eh_elf`:  135251 unw/exec
+* `vanilla`: 138233 unw/exec
+
+Average time/unw:
+* `eh_elf`:   102 ns
+* `vanilla`: 2443 ns
+
+Standard deviation:
+* `eh_elf`:  2 ns
+* `vanilla`: 47 ns
+
+Average ratio: 24
+Ratio uncertainty: 1.0
+
+## Distribution of `unw_step` issues
+
+### `eh_elf` case
+
+* success:                              135251 (97.7%)
+* fallback to DWARF:                      1467  (1.1%)
+* fallback to libunwind heuristics:        329  (0.2%)
+* fail to unwind:                         1410  (1.0%)
+* total:                                138457
+
+### `vanilla` case
+
+* success:                              138201 (99.0%)
+* fallback to libunwind heuristics:         32  (0.0%)
+* fail to unwind:                         1411  (1.0%)
+* total:                                139644
--- a/benching/hackbench/README.md
+++ b/benching/hackbench/README.md
@@ -0,0 +1,44 @@
+# Running the benchmarks
+
+Pick some name for your `eh_elfs` directory. We will call it `$EH_ELF_DIR`.
+
+## Generate the `eh_elfs`
+
+```bash
+../../generate_eh_elf.py --deps -o "$EH_ELF_DIR" \
+  --keep-holes -O2 --global-switch --enable-deref-arg hackbench
+```
+
+## Record a `perf` session
+
+```bash
+perf record --call-graph dwarf,4096 ./hackbench 10 process 100
+```
+
+You can increase the first number (up to ~100) and the second number to get a
+longer session. The benchmark will most probably take up all of your computer's
+resources while it is running.
+
+## Set up the environment
+
+```bash
+source ../../env/apply [vanilla | vanilla-nocache | *eh_elf] [dbg | *release]
+```
+
+The first value selects the version of libunwind you will be running, the
+second selects whether you want to run in debug or release mode (use release to
+get readings, debug to check for errors).
+
+You can reset your environment to its previous state by running `deactivate`.
+
+If you pick the `eh_elf` flavour, you will also have to run
+
+```bash
+export LD_LIBRARY_PATH="$EH_ELF_DIR:$LD_LIBRARY_PATH"
+```
+
+### Actually get readings
+
+```bash
+perf report 2>&1 >/dev/null
+```
--- a/benching/hackbench/to_report_fmt.py
+++ b/benching/hackbench/to_report_fmt.py
@@ -0,0 +1,21 @@
+#!/usr/bin/env python3
+
+import re
+import sys
+
+line = input()
+regex = \
+    re.compile(r'Total unwind time: ([0-9]*) s ([0-9]*) ns, ([0-9]*) calls')
+
+match = regex.match(line.strip())
+if not match:
+    print('Badly formatted line', file=sys.stderr)
+    sys.exit(1)
+
+sec = int(match.group(1))
+ns = int(match.group(2))
+calls = int(match.group(3))
+
+time = sec * 10**9 + ns
+
+print("{} & {} & {} & ??".format(calls, time, time // calls))
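For reference, the parsing step of `to_report_fmt.py` can be written as a function that is easy to test in isolation. This is only a sketch: the regular expression is the script's own, but `parse_summary` is a hypothetical name and the input line is made up.

```python
import re

# Same pattern as in to_report_fmt.py.
SUMMARY_RE = re.compile(
    r"Total unwind time: ([0-9]*) s ([0-9]*) ns, ([0-9]*) calls"
)


def parse_summary(line):
    """Return (calls, total_ns, ns_per_call), or None if the line is malformed."""
    match = SUMMARY_RE.match(line.strip())
    if not match:
        return None
    sec, ns, calls = (int(group) for group in match.groups())
    total_ns = sec * 10**9 + ns
    # Integer division, as in the original script.
    return calls, total_ns, total_ns // calls


# Made-up input line, for illustration only.
print(parse_summary("Total unwind time: 1 s 500 ns, 100 calls"))
# prints (100, 1000000500, 10000005)
```

The third value is the one the `sed` expression in the evaluation loops extracts from the `&`-separated output. Like the original, this sketch does not guard against a call count of zero.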
--- a/benching/tools/gen_perf_stats.py
+++ b/benching/tools/gen_perf_stats.py
@@ -0,0 +1,72 @@
+#!/usr/bin/env python3
+
+""" Generates performance statistics for the eh_elf vs vanilla libunwind unwinding,
+based on time series generated beforehand
+
+First run
+```bash
+for i in $(seq 1 100); do
+  perf report 2>&1 >/dev/null | tail -n 1 \
+      | python ../hackbench/to_report_fmt.py \
+          | sed 's/^.* & .* & \([0-9]*\) & .*$/\1/g'
+done > "$SOME_PLACE/${FLAVOUR}_times"
+```
+
+for each flavour (eh_elf, vanilla)
+
+Then run this script, with `$SOME_PLACE` as argument.
+"""
+
+import numpy as np
+import sys
+import os
+
+
+def read_series(path):
+    with open(path, "r") as handle:
+        for line in handle:
+            yield int(line.strip())
+
+
+FLAVOURS = ["eh_elf", "vanilla"]
+
+path_format = os.path.join(sys.argv[1], "{}_times")
+times = {}
+avgs = {}
+std_deviations = {}
+
+for flv in FLAVOURS:
+    times[flv] = list(read_series(path_format.format(flv)))
+    avgs[flv] = sum(times[flv]) / len(times[flv])
+    std_deviations[flv] = np.sqrt(np.var(times[flv]))
+
+avg_ratio = avgs["vanilla"] / avgs["eh_elf"]
+ratio_uncertainty = (
+    1
+    / avgs["eh_elf"]
+    * (
+        std_deviations["vanilla"]
+        + avgs["vanilla"] / avgs["eh_elf"] * std_deviations["eh_elf"]
+    )
+)
+
+
+def format_flv(flv_dict, formatter):
+    out = ""
+    for flv in FLAVOURS:
+        val = flv_dict[flv]
+        out += "* {}: {}\n".format(flv, formatter.format(val))
+    return out
+
+
+print(
+    "Average time:\n{}\n"
+    "Standard deviation:\n{}\n"
+    "Average ratio: {}\n"
+    "Ratio uncertainty: {}".format(
+        format_flv(avgs, "{} ns"),
+        format_flv(std_deviations, "{} ns"),
+        avg_ratio,
+        ratio_uncertainty,
+    )
+)
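The `ratio_uncertainty` expression in `gen_perf_stats.py` appears to be a first-order propagation of the two standard deviations through the ratio, with the two contributions added in absolute value rather than in quadrature. The notation below (means μ and standard deviations σ per flavour) is introduced here for illustration and does not come from the script:

```latex
R = \frac{\mu_{\text{vanilla}}}{\mu_{\text{eh\_elf}}},
\qquad
\Delta R \approx
\left|\frac{\partial R}{\partial \mu_{\text{vanilla}}}\right| \sigma_{\text{vanilla}}
+ \left|\frac{\partial R}{\partial \mu_{\text{eh\_elf}}}\right| \sigma_{\text{eh\_elf}}
= \frac{\sigma_{\text{vanilla}}}{\mu_{\text{eh\_elf}}}
+ \frac{\mu_{\text{vanilla}}}{\mu_{\text{eh\_elf}}^{2}} \, \sigma_{\text{eh\_elf}}
= \frac{1}{\mu_{\text{eh\_elf}}}
\left( \sigma_{\text{vanilla}} + R \, \sigma_{\text{eh\_elf}} \right)
```

The last form matches the code term by term: `1 / avgs["eh_elf"]` multiplied by the sum of `std_deviations["vanilla"]` and `avgs["vanilla"] / avgs["eh_elf"] * std_deviations["eh_elf"]`.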
--- a/src/DwarfReader.cpp
+++ b/src/DwarfReader.cpp
@@ -9,9 +9,6 @@
 using namespace std;
 using namespace dwarf;

-typedef std::set<std::pair<int, core::FrameSection::register_def> >
-    dwarfpp_row_t;
-
 DwarfReader::DwarfReader(const string& path):
    root(fileno(ifstream(path)))
 {}
@@ -30,7 +27,7 @@ static void dump_expr(const core::FrameSection::register_def& reg) {
    fprintf(stderr, "\n");
 }

-SimpleDwarf DwarfReader::read() const {
+SimpleDwarf DwarfReader::read() {
    const core::FrameSection& fs = root.get_frame_section();
    SimpleDwarf output;

@@ -42,57 +39,119 @@ SimpleDwarf DwarfReader::read() const {
    return output;
 }

-SimpleDwarf::Fde DwarfReader::read_fde(const core::Fde& fde) const {
+void DwarfReader::add_cell_to_row(
+        const dwarf::core::FrameSection::register_def& reg,
+        int reg_id,
+        int ra_reg,
+        SimpleDwarf::DwRow& cur_row)
+{
+    if(reg_id == DW_FRAME_CFA_COL3) {
+        cur_row.cfa = read_register(reg);
+    }
+    else {
+        try {
+            SimpleDwarf::MachineRegister reg_type =
+                from_dwarfpp_reg(reg_id, ra_reg);
+            switch(reg_type) {
+                case SimpleDwarf::REG_RBP:
+                    cur_row.rbp = read_register(reg);
+                    break;
+                case SimpleDwarf::REG_RBX:
+                    cur_row.rbx = read_register(reg);
+                    break;
+                case SimpleDwarf::REG_RA:
+                    cur_row.ra = read_register(reg);
+                    break;
+                default:
+                    break;
+            }
+        }
+        catch(const UnsupportedRegister&) {} // Just ignore it.
+    }
+}
+
+void DwarfReader::append_row_to_fde(
+        const dwarfpp_row_t& row,
+        uintptr_t row_addr,
+        int ra_reg,
+        SimpleDwarf::Fde& output)
+{
+    SimpleDwarf::DwRow cur_row;
+
+    cur_row.ip = row_addr;
+
+    for(const auto& cell: row) {
+        add_cell_to_row(cell.second, cell.first, ra_reg, cur_row);
+    }
+
+    if(cur_row.cfa.type == SimpleDwarf::DwRegister::REG_UNDEFINED)
+    {
+        // Not set
+        throw InvalidDwarf();
+    }
+
+    output.rows.push_back(cur_row);
+}
+
+template<typename Key, typename Value>
+static std::set<std::pair<Key, Value> > map_to_setpair(
+        const std::map<Key, Value>& src_map)
+{
+    std::set<std::pair<Key, Value> > out;
+    for(const auto& map_it: src_map) {
+        out.insert(map_it);
+    }
+    return out;
+}
+
+void DwarfReader::append_results_to_fde(
+        const dwarf::core::FrameSection::instrs_results& results,
+        int ra_reg,
+        SimpleDwarf::Fde& output)
+{
+    for(const auto& row_pair: results.rows) {
+        append_row_to_fde(
+                row_pair.second,
+                row_pair.first.lower(),
+                ra_reg,
+                output);
+    }
+    if(results.unfinished_row.size() > 0) {
+        try {
+            append_row_to_fde(
+                    map_to_setpair(results.unfinished_row),
+                    results.unfinished_row_addr,
+                    ra_reg,
+                    output);
+        } catch(const InvalidDwarf&) {
+            // Ignore: the unfinished_row can be undefined
+        }
+    }
+}
+
+SimpleDwarf::Fde DwarfReader::read_fde(const core::Fde& fde) {
    SimpleDwarf::Fde output;
    output.fde_offset = fde.get_fde_offset();
    output.beg_ip = fde.get_low_pc();
    output.end_ip = fde.get_low_pc() + fde.get_func_length();

-    auto rows = fde.decode().rows;
    const core::Cie& cie = *fde.find_cie();
    int ra_reg = cie.get_return_address_register_rule();

-    for(const auto row_pair: rows) {
-        SimpleDwarf::DwRow cur_row;
+    // CIE rows
+    core::FrameSection cie_fs(root.get_dbg(), true);
+    auto cie_rows = cie_fs.interpret_instructions(
+            cie,
+            fde.get_low_pc(),
+            cie.get_initial_instructions(),
+            cie.get_initial_instructions_length());

-        cur_row.ip = row_pair.first.lower();
+    // FDE rows
+    auto fde_rows = fde.decode();

-        const dwarfpp_row_t& row = row_pair.second;
-
-        for(const auto& cell: row) {
-            if(cell.first == DW_FRAME_CFA_COL3) {
-                cur_row.cfa = read_register(cell.second);
-            }
-            else {
-                try {
-                    SimpleDwarf::MachineRegister reg_type =
-                        from_dwarfpp_reg(cell.first, ra_reg);
-                    switch(reg_type) {
-                        case SimpleDwarf::REG_RBP:
-                            cur_row.rbp = read_register(cell.second);
-                            break;
-                        case SimpleDwarf::REG_RBX:
-                            cur_row.rbx = read_register(cell.second);
-                            break;
-                        case SimpleDwarf::REG_RA:
-                            cur_row.ra = read_register(cell.second);
-                            break;
-                        default:
-                            break;
-                    }
-                }
-                catch(const UnsupportedRegister&) {} // Just ignore it.
-            }
-        }
-
-        if(cur_row.cfa.type == SimpleDwarf::DwRegister::REG_UNDEFINED)
-        {
-            // Not set
-            throw InvalidDwarf();
-        }
-
-        output.rows.push_back(cur_row);
-    }
+    // instrs
+    append_results_to_fde(cie_rows, ra_reg, output);
+    append_results_to_fde(fde_rows, ra_reg, output);

    return output;
 }
--- a/src/DwarfReader.hpp
+++ b/src/DwarfReader.hpp
@@ -13,6 +13,9 @@

 #include "SimpleDwarf.hpp"

+typedef std::set<std::pair<int, dwarf::core::FrameSection::register_def> >
+    dwarfpp_row_t;
+
 class DwarfReader {
    public:
        class InvalidDwarf: public std::exception {};
@@ -21,14 +24,31 @@ class DwarfReader {
        DwarfReader(const std::string& path);

        /** Actually read the ELF file, generating a `SimpleDwarf` output. */
-        SimpleDwarf read() const;
+        SimpleDwarf read();

    private: //meth
-        SimpleDwarf::Fde read_fde(const dwarf::core::Fde& fde) const;
+        SimpleDwarf::Fde read_fde(const dwarf::core::Fde& fde);
+
+        void append_results_to_fde(
+                const dwarf::core::FrameSection::instrs_results& results,
+                int ra_reg,
+                SimpleDwarf::Fde& output);

        SimpleDwarf::DwRegister read_register(
                const dwarf::core::FrameSection::register_def& reg) const;

+        void add_cell_to_row(
+                const dwarf::core::FrameSection::register_def& reg,
+                int reg_id,
+                int ra_reg,
+                SimpleDwarf::DwRow& cur_row);
+
+        void append_row_to_fde(
+                const dwarfpp_row_t& row,
+                uintptr_t row_addr,
+                int ra_reg,
+                SimpleDwarf::Fde& output);
+
        SimpleDwarf::MachineRegister from_dwarfpp_reg(
                int reg_id,
                int ra_reg=-1
--- a/src/Makefile
+++ b/src/Makefile
@@ -14,6 +14,7 @@ OBJS=\
 	PcHoleFiller.o \
 	EmptyFdeDeleter.o \
 	ConseqEquivFilter.o \
+	OverriddenRowFilter.o \
 	SwitchStatement.o \
 	NativeSwitchCompiler.o \
 	FactoredSwitchCompiler.o \
--- a/src/OverriddenRowFilter.cpp
+++ b/src/OverriddenRowFilter.cpp
@@ -0,0 +1,31 @@
+#include "OverriddenRowFilter.hpp"
+
+OverriddenRowFilter::OverriddenRowFilter(bool enable)
+    : SimpleDwarfFilter(enable)
+{}
+
+SimpleDwarf OverriddenRowFilter::do_apply(const SimpleDwarf& dw) const {
+    SimpleDwarf out;
+
+    for(const auto& fde: dw.fde_list) {
+        out.fde_list.push_back(SimpleDwarf::Fde());
+        SimpleDwarf::Fde& cur_fde = out.fde_list.back();
+        cur_fde.fde_offset = fde.fde_offset;
+        cur_fde.beg_ip = fde.beg_ip;
+        cur_fde.end_ip = fde.end_ip;
+
+        if(fde.rows.empty())
+            continue;
+
+        for(size_t pos=0; pos < fde.rows.size(); ++pos) {
+            const auto& row = fde.rows[pos];
+            if(pos == fde.rows.size() - 1
+                    || row.ip != fde.rows[pos+1].ip)
+            {
+                cur_fde.rows.push_back(row);
+            }
+        }
+    }
+
+    return out;
+}
--- a/src/OverriddenRowFilter.hpp
+++ b/src/OverriddenRowFilter.hpp
@@ -0,0 +1,15 @@
+/** SimpleDwarfFilter to remove the first `n-1` rows of a block of `n`
+ * contiguous rows that have the exact same address. */
+
+#pragma once
+
+#include "SimpleDwarf.hpp"
+#include "SimpleDwarfFilter.hpp"
+
+class OverriddenRowFilter: public SimpleDwarfFilter {
+    public:
+        OverriddenRowFilter(bool enable=true);
+
+    private:
+        SimpleDwarf do_apply(const SimpleDwarf& dw) const;
+};
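The filtering rule of `OverriddenRowFilter` can be illustrated outside of C++. The sketch below models rows as `(ip, payload)` tuples, a simplified stand-in for `SimpleDwarf::DwRow`. The function name and the sample rows are hypothetical. Presumably, same-address rows arise because the rows generated from the CIE are placed at the FDE's start address, where the FDE's own first row may also begin.

```python
def drop_overridden_rows(rows):
    """Keep only the last row of each run of consecutive rows sharing an ip.

    `rows` is a list of (ip, payload) tuples, a simplified stand-in for
    SimpleDwarf::DwRow.
    """
    kept = []
    for pos, row in enumerate(rows):
        is_last = pos == len(rows) - 1
        # A row is overridden when the next row starts at the same address.
        if is_last or row[0] != rows[pos + 1][0]:
            kept.append(row)
    return kept


# Hypothetical rows: the first two share address 0x1000.
rows = [(0x1000, "from CIE"), (0x1000, "from FDE"), (0x1004, "from FDE")]
print(drop_overridden_rows(rows))
# prints [(4096, 'from FDE'), (4100, 'from FDE')]
```

Only adjacent rows are compared, as in the C++ loop, so two rows with the same address that are separated by a row at another address would both be kept.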
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -13,6 +13,7 @@
 #include "PcHoleFiller.hpp"
 #include "EmptyFdeDeleter.hpp"
 #include "ConseqEquivFilter.hpp"
+#include "OverriddenRowFilter.hpp"

 #include "settings.hpp"

@@ -106,8 +107,9 @@ int main(int argc, char** argv) {
    SimpleDwarf filtered_dwarf =
        PcHoleFiller(!settings::keep_holes)(
        EmptyFdeDeleter()(
+        OverriddenRowFilter()(
        ConseqEquivFilter()(
-            parsed_dwarf)));
+            parsed_dwarf))));

    FactoredSwitchCompiler* sw_compiler = new FactoredSwitchCompiler(1);
    CodeGenerator code_gen(
Author	SHA1	Message	Date
Théophile Bastian	22bfb62bf3	Bench: evaluate gzip	2019-06-10 12:06:13 +02:00
Théophile Bastian	ceeec6ca5d	Benching: evaluate hackbench clearly, improve tools	2019-06-10 12:04:52 +02:00
Théophile Bastian	a0f58b592d	compiler: generate rows for CIE	2019-06-09 03:32:54 +02:00