Add tentative 50_systematic_evaluation (mainly benchsuite-bb)

2023-09-06 17:41:57 +02:00 · 2023-09-06 17:41:57 +02:00 · f3b6936736
commit f3b6936736
parent 72c289239d
1 changed file with 71 additions and 0 deletions
--- a/plan/50_systematic_evaluation.md
+++ b/plan/50_systematic_evaluation.md
@@ -0,0 +1,71 @@
+# A more systematic approach to throughput prediction performance analysis
+
+* So far, tools have only been evaluated on isolated basic blocks.
+* These blocks were extracted with partly automated methods; reproducing the
+  extraction takes manual effort.
+* This is problematic when changing ISA: the same benchmark suite must be
+  recompiled… and its basic blocks re-extracted.
+
+## Benchsuite-bb
+
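+* Fully automated, cross-platform extraction of weighted basic blocks (BBs)
+  from benchmark suites
+* Extracts relevant BBs from real workloads: a BB's weight reflects how often
+  it shows up during the run, so the most heavily weighted BBs are the most
+  frequently executed ones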
+
+[big picture:
+Benchsuite
+-> knobs [size, cflags, …]
+-> compile
+-> run with perf
+-> extract BBs
+]
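+
+The knobs stage can be pictured as a cartesian product over build parameters,
+each point yielding one binary to compile, profile and chunk into BBs. A
+minimal Python sketch, where `BuildConfig`, `enumerate_configs` and the knob
+values are hypothetical illustrations rather than benchsuite-bb's actual
+interface:
+
+```python
+import itertools
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class BuildConfig:
+    """One point of the knob space (hypothetical knobs)."""
+    suite: str
+    benchmark: str
+    size: str                # e.g. a Polybench dataset size
+    cflags: tuple[str, ...]  # compiler flags for this build
+
+
+def enumerate_configs(suite: str, benchmarks: list[str], sizes: list[str],
+                      cflag_sets: list[list[str]]) -> list[BuildConfig]:
+    """Cartesian product of the knobs: one configuration per binary to build."""
+    return [
+        BuildConfig(suite, bench, size, tuple(cflags))
+        for bench, size, cflags in itertools.product(benchmarks, sizes, cflag_sets)
+    ]
+```
+
+For instance, two benchmarks, two dataset sizes and two flag sets give
+2 × 2 × 2 = 8 binaries to build and profile.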
+
+### Benchsuites
+
+* Polybench: described earlier
+* SPEC: described earlier
+* Rodinia: benchmark suite for heterogeneous computing
+    * Targets GPUs, as well as CPUs through OpenMP
+        * Used here in its OpenMP version
+    * Covers various common kernels (K-means, backprop, BFS, …)
+* Substantial code and tooling had to be written to "standardize" the suites'
+  interfaces and bring them together into a single tool
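+
+One way to picture this standardization is a common interface that every
+suite wrapper implements, so that the driver can compile and run any
+benchmark the same way. The sketch below assumes such an interface;
+`BenchSuite`, `ToySuite`, `plan_builds` and their methods are hypothetical,
+not the tool's actual classes:
+
+```python
+from abc import ABC, abstractmethod
+
+
+class BenchSuite(ABC):
+    """Hypothetical common interface implemented by every suite wrapper."""
+
+    name: str
+
+    @abstractmethod
+    def benchmarks(self) -> list[str]:
+        """Names of the benchmarks provided by this suite."""
+
+    @abstractmethod
+    def build_command(self, bench: str, cflags: list[str]) -> list[str]:
+        """Command line compiling `bench` with the given compiler flags."""
+
+    @abstractmethod
+    def run_command(self, bench: str) -> list[str]:
+        """Command line running the compiled `bench`."""
+
+
+class ToySuite(BenchSuite):
+    """Illustrative single-file suite: `<bench>.c` compiles to `./<bench>`."""
+
+    name = "toy"
+
+    def benchmarks(self) -> list[str]:
+        return ["hello"]
+
+    def build_command(self, bench: str, cflags: list[str]) -> list[str]:
+        return ["cc", *cflags, f"{bench}.c", "-o", bench]
+
+    def run_command(self, bench: str) -> list[str]:
+        return [f"./{bench}"]
+
+
+def plan_builds(suites: list[BenchSuite], cflags: list[str]) -> list[list[str]]:
+    """Suite-agnostic driver code: one build command per benchmark of each suite."""
+    return [s.build_command(b, cflags) for s in suites for b in s.benchmarks()]
+```
+
+With this design, supporting a new suite only means writing a new wrapper
+class; the driver code stays unchanged.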
+
+### Perf analysis
+
+* Perf profiler: samples the program counter (PC), optionally with the call
+  stack, either every N occurrences of a given event, or a given number of
+  times per second. We use the latter mode.
+* Extract the PC of each sample
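+
+For illustration, the fixed-frequency mode corresponds to
+`perf record -F <freq>`, and the sampled PCs can be dumped with
+`perf script -F ip`. The sketch below builds such a recording command and
+counts samples per PC. It assumes one hexadecimal address per output line (no
+call stacks recorded) and does not translate runtime addresses of
+position-independent executables back to addresses in the binary; the
+function names are hypothetical.
+
+```python
+from collections import Counter
+
+
+def perf_record_cmd(run_cmd: list[str], freq: int = 1000,
+                    output: str = "perf.data") -> list[str]:
+    """Wrap a benchmark command so that perf samples it `freq` times per second."""
+    return ["perf", "record", "-F", str(freq), "-o", output, "--", *run_cmd]
+
+
+def count_pcs(perf_script_output: str) -> Counter:
+    """Count samples per PC, assuming one hex address per non-empty line."""
+    counts: Counter = Counter()
+    for line in perf_script_output.splitlines():
+        line = line.strip()
+        if line:
+            counts[int(line, 16)] += 1
+    return counts
+```
+
+Since samples are taken at a fixed frequency, a PC's sample count is roughly
+proportional to the time spent at that address.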
+
+### Extract BBs
+
+* For each sampled PC,
+    * Find the binary symbol containing it
+    * Break this symbol into basic blocks using Capstone
+        * Break after control flow instructions
+        * Break at jump targets
+    * Cache the BBs of this symbol, so that each symbol is split only once
+    * Map this PC to its corresponding BB
+* Extract weighted BBs: each BB is weighted by the samples mapped to it
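+
+The splitting step can be sketched in the spirit of the classic "leaders"
+algorithm: a new BB starts at the symbol's entry, right after each control
+flow instruction, and at each jump target landing inside the symbol. Capstone
+can supply this information (e.g. through instruction groups such as
+`CS_GRP_JUMP` in detail mode); to keep the sketch self-contained, it works
+instead on a hypothetical `Insn` record standing in for a disassembled
+instruction.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass(frozen=True)
+class Insn:
+    """Hypothetical stand-in for a disassembled instruction."""
+    address: int
+    size: int
+    is_control_flow: bool         # jump, call, return, ...
+    target: Optional[int] = None  # statically known branch target, if any
+
+
+def split_into_bbs(insns: list[Insn]) -> list[tuple[int, int]]:
+    """Split a symbol's instructions into BBs, as (start, end) with end exclusive."""
+    if not insns:
+        return []
+    start = insns[0].address
+    end = insns[-1].address + insns[-1].size
+    leaders = {start}
+    for insn in insns:
+        if insn.is_control_flow:
+            # Break right after the control flow instruction...
+            leaders.add(insn.address + insn.size)
+            # ...and at its target, when it lands inside this symbol.
+            if insn.target is not None and start <= insn.target < end:
+                leaders.add(insn.target)
+    bounds = sorted(a for a in leaders if a < end) + [end]
+    return list(zip(bounds, bounds[1:]))
+```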
+
+This way, only the relevant portions of the binary are disassembled and
+chunked into BBs.
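+
+A minimal sketch of the lookup, caching and weighting steps, assuming a
+symbol table sorted by address and a `split` callback that disassembles and
+chunks one symbol (both hypothetical, as is `weigh_bbs`). Only symbols
+containing at least one sample are ever split, each of them once. Samples
+falling outside any known symbol are simply dropped, which is an assumption
+of this sketch; weights are each BB's share of the remaining samples.
+
+```python
+import bisect
+from collections import Counter
+from typing import Callable
+
+Symbol = tuple[str, int, int]  # (name, start, end), end exclusive
+BB = tuple[int, int]           # (start, end), end exclusive
+
+
+def weigh_bbs(pc_counts: dict[int, int], symbols: list[Symbol],
+              split: Callable[[str], list[BB]]) -> dict[BB, float]:
+    """Map sampled PCs to BBs and return each BB's share of the samples."""
+    sym_starts = [start for _, start, _ in symbols]
+    cache: dict[str, tuple[list[int], list[BB]]] = {}
+    bb_counts: Counter = Counter()
+    for pc, count in pc_counts.items():
+        # Find the symbol containing this PC, if any.
+        i = bisect.bisect_right(sym_starts, pc) - 1
+        if i < 0 or pc >= symbols[i][2]:
+            continue  # e.g. a PC inside a shared library
+        name = symbols[i][0]
+        # Split each symbol into BBs only once, on first use.
+        if name not in cache:
+            bbs = sorted(split(name))
+            cache[name] = ([start for start, _ in bbs], bbs)
+        bb_starts, bbs = cache[name]
+        # Map the PC to the BB containing it.
+        j = bisect.bisect_right(bb_starts, pc) - 1
+        if j >= 0 and pc < bbs[j][1]:
+            bb_counts[bbs[j]] += count
+    total = sum(bb_counts.values())
+    return {bb: n / total for bb, n in bb_counts.items()} if total else {}
+```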
+
+### Conclusion
+
+* Tooling to extract BBs from several benchmark suites
+* On any architecture supported by the suite
+* Weighted by measured occurrences on actual runs
+
+* Works well to evaluate tools such as Palmed, which treat kernels as
+  multisets of instructions, with no dependencies and everything L1-resident.
+* Pipedream can then provide the baseline measurements.
+
+* What if we want a measured execution of the real kernel as the baseline,
+  rather than Pipedream?
+* An extracted BB cannot be measured as-is: it lacks its execution context
+  (register values, memory state).
+* BHive
+…transition to
+
+## CesASMe
+
+[paper with edits]