Profiling on Modern CPUs
Profiling isn't just "count instructions, look for hotspots" anymore. Between the superscalar-ness, the multi-core-ness, the shared-cache-ness, the shared-bus-ness, the hierarchical memory-ness and a lot of other -nesses, things get a little... special.
This is (for now) a reading list and a set of notes on what I've come across whilst exploring what's going on with modern Intel hardware.
notes
There are a few things to try to make easier to do in PMC:
- A system overview (e.g. Linux perf's 'perf stat') that gives simple summaries of things like general instruction counts and efficiencies, cache thrashing, resource stalls, etc.
- Make it much easier to establish where certain classes of bottleneck occur when profiling (again: bus / cache bottlenecks, cache thrashing, resource stalls)
- A much nicer way of summarising / analysing counters. Right now the output is per-CPU, and on machines with many CPUs this gets very unwieldy very quickly
- A machine-readable version of the counter output (with timestamping) so it can be fed into external processing scripts
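As a rough illustration of the last two items, here is a minimal Python sketch of collapsing per-CPU counters into one system-wide, timestamped record. Everything in it is an assumption made for illustration: the input layout (a dict of CPU id to event counts), the event names "instructions" and "cycles", and the helper names summarise and to_record are hypothetical and do not reflect what PMC actually emits.

```python
import json
import time
from collections import defaultdict


def summarise(per_cpu_samples):
    """Sum per-CPU counter values into one system-wide total per event."""
    totals = defaultdict(int)
    for counters in per_cpu_samples.values():
        for event, value in counters.items():
            totals[event] += value
    return dict(totals)


def to_record(per_cpu_samples, timestamp=None):
    """Build one timestamped JSON line that an external script can parse."""
    totals = summarise(per_cpu_samples)
    record = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "totals": totals,
    }
    # Instructions per cycle as a simple derived efficiency figure;
    # skipped when no cycles were counted to avoid dividing by zero.
    cycles = totals.get("cycles", 0)
    if cycles:
        record["ipc"] = totals.get("instructions", 0) / cycles
    return json.dumps(record, sort_keys=True)


# Hypothetical sample: CPU id -> {event name: count}
sample = {
    0: {"instructions": 1200, "cycles": 1000},
    1: {"instructions": 800, "cycles": 1000},
}
print(to_record(sample, timestamp=0.0))
```

Emitting one JSON object per sampling interval keeps the output easy to append to a log and to process line by line, whatever the number of CPUs.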
reading list
http://software.intel.com/sites/products/collateral/hpc/vtune/cycle_accounting_analysis.pdf
http://oprofile.sourceforge.net/docs/intel-sandybridge-events.php
https://perf.wiki.kernel.org/index.php/Tutorial#Counting_with_perf_stat
http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf
http://software.intel.com/sites/default/files/m/a/d/2/2/e/15529-Intel_VTune_Using.pdf