Add tentative 50_systematic_evaluation (mainly benchsuite-bb)
parent 72c289239d
commit f3b6936736
1 changed file with 71 additions and 0 deletions

plan/50_systematic_evaluation.md (normal file)

@@ -0,0 +1,71 @@

# A more systematic approach to throughput prediction performance analysis

* So far, evaluation has been done only on isolated basic blocks.
* These were extracted with partly automated methods, reproducible only with
  some manual effort.
* Problematic when changing ISA: the same bench suite must be re-compiled… and
  re-extracted.

## Benchsuite-bb

* Fully automated, cross-platform extraction of weighted BBs, based on bench
  suites
* Extracts relevant basic blocks from real workloads: the BBs with the highest
  weights are the most frequently executed ones

[big picture:
    Benchsuite
    -> knobs [size, cflags, …]
    -> compile
    -> run with perf
    -> extract BBs
]

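A minimal sketch of the "knobs" stage of the pipeline above, assuming each benchmark run is described by a suite name, an input size and a set of compiler flags. The `BenchConfig` class, its field names and the example values are hypothetical, chosen only for illustration; the actual tool may organise its knobs differently.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BenchConfig:
    """One point in the knob space (all field names are hypothetical)."""
    suite: str
    size: str
    cflags: str


def enumerate_configs(suites, sizes, cflags_list):
    """Yield every combination of knobs.

    Each configuration would then be compiled, run under perf and
    harvested for basic blocks independently of the others.
    """
    for suite, size, cflags in product(suites, sizes, cflags_list):
        yield BenchConfig(suite, size, cflags)


# Example values are placeholders, not the knobs actually used.
configs = list(enumerate_configs(
    suites=["polybench", "rodinia"],
    sizes=["small", "large"],
    cflags_list=["-O2", "-O3"],
))
print(len(configs))  # 2 * 2 * 2 combinations
```

Making the configuration an immutable value means it can serve as a dictionary key, for instance to cache the basic blocks extracted for each compiled variant.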
### Benchsuites

* Polybench: described earlier
* SPEC: described earlier
* Rodinia: bench suite for heterogeneous computing
  * Targeting GPU, OpenMP
  * Used in OpenMP mode
  * Exhibits various usual kernels (K-means, backprop, BFS, …)
* A lot of code and tooling to write to "standardize" the interfaces and bring
  them into a single tool

### Perf analysis

* Perf profiler: works by sampling the PC (+ stack), either on event
  occurrences or a given number of times per second. The second mode
  (fixed sampling frequency) is used.
* Extract the PC of each sample

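The per-sample PC extraction can be sketched as a simple counting pass. This assumes the samples have already been dumped as text with one hexadecimal PC per line (something a command along the lines of `perf script -F ip` could plausibly produce; the exact invocation and output format are assumptions, not taken from the tool). The function name `count_sampled_pcs` is hypothetical.

```python
from collections import Counter


def count_sampled_pcs(lines):
    """Count how many samples hit each program counter.

    `lines` is assumed to hold one hexadecimal PC per line, with or
    without a leading `0x`. Blank or malformed lines are skipped.
    """
    counts = Counter()
    for line in lines:
        token = line.strip()
        if not token:
            continue
        try:
            counts[int(token, 16)] += 1
        except ValueError:
            continue  # not a PC (e.g. a header line)
    return counts


# Made-up sample dump, for illustration only.
sample_dump = """
  401136
  401136
  40113a
  0x401150
"""
pc_counts = count_sampled_pcs(sample_dump.splitlines())
for pc, count in sorted(pc_counts.items()):
    print(f"{pc:#x}: {count}")
```

With fixed-frequency sampling, the count attached to a PC is roughly proportional to the time spent there, which is what makes it usable as a weight later on.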
### Extract BBs

* For each sampled PC,
  * Find the corresponding binary symbol
  * Break this symbol into basic blocks using Capstone
    * Break after control flow instructions
    * Break at jump sites (jump targets)
  * Cache the BBs of this symbol
  * Map this PC to its corresponding BB
* Extract weighted BBs

This way, only the relevant portions of the binary are split into basic blocks.

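The steps above can be sketched as follows. To stay self-contained, this sketch does not call Capstone: it assumes the symbol has already been disassembled into a list of `Insn` records, a hypothetical stand-in for whatever the disassembler returns. All names (`Insn`, `split_basic_blocks`, `weigh_blocks`) are illustrative assumptions, and only direct jumps whose target lies inside the symbol are considered.

```python
from bisect import bisect_right
from collections import Counter
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    """Minimal decoded instruction (hypothetical disassembler output)."""
    addr: int
    size: int
    is_control_flow: bool
    target: Optional[int] = None  # static jump target, if known


def split_basic_blocks(insns):
    """Return the sorted (start, end) ranges of the blocks, end exclusive."""
    if not insns:
        return []
    sym_start = insns[0].addr
    sym_end = insns[-1].addr + insns[-1].size
    leaders = {sym_start}
    for insn in insns:
        if insn.is_control_flow:
            nxt = insn.addr + insn.size
            if nxt < sym_end:
                leaders.add(nxt)  # break after a control flow instruction
            if insn.target is not None and sym_start <= insn.target < sym_end:
                leaders.add(insn.target)  # break at a jump site
    starts = sorted(leaders)
    return list(zip(starts, starts[1:] + [sym_end]))


def weigh_blocks(blocks, pc_counts):
    """Attribute each sampled PC to the block containing it; sum the counts."""
    starts = [start for start, _ in blocks]
    weights = Counter()
    for pc, count in pc_counts.items():
        idx = bisect_right(starts, pc) - 1
        if idx >= 0 and pc < blocks[idx][1]:
            weights[blocks[idx]] += count
    return weights


# Toy symbol: a prologue, a two-instruction loop body, then a return.
insns = [
    Insn(0x00, 4, False),
    Insn(0x04, 4, False),       # loop head, target of the jump at 0x0c
    Insn(0x08, 4, False),
    Insn(0x0c, 2, True, 0x04),  # conditional jump back to 0x04
    Insn(0x0e, 1, True),        # return
]
blocks = split_basic_blocks(insns)
# The sample at 0x100 falls outside the symbol and is ignored.
weights = weigh_blocks(blocks, {0x04: 10, 0x08: 7, 0x0E: 1, 0x100: 3})
```

Caching the block list per symbol, for example in a dictionary keyed by symbol name, would avoid splitting the same symbol again for every sample that falls into it.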
### Conclusion

* Tooling to extract BBs from several benchmark suites
  * On any architecture supported by the suite
  * Weighted by measured occurrences on actual runs

* Works well to evaluate tools such as Palmed: kernels are multisets, no
  dependencies, everything is L1-resident.
  * We can use Pipedream as a baseline measurement.

* What if we want an execution of the real kernel as baseline (not Pipedream)?
  * The extracted BB cannot be measured as-is: lacks context.
  * BHive

…transition to

## CesASMe

[paper with edits]