PmcTools: Research
This page lists ideas for PMC-related research.
A GUI Visualizer
This tool would help a user visually 'see' hot-spots in the code and correlate them with source code. The tool will need to:
- Attach process-mode PMCs to selected interrupt threads, or to selected processes and their descendants.
- Sample the whole system or specific threads/processes.
- Show the progression of 'hot-spots' in a process or in the kernel over time (i.e., a 3-d graph).
- Show cache miss / TLB miss / instruction retirement / CPI (cycles per instruction) data for a process over time (i.e., a 2-d graph). Aggregate this data across all processes running a given executable.
- Show source code corresponding to selected hot-spots in a pane (or show a disassembly listing if we can't locate sources).
- Profile a remote target system (needs some kind of controlling daemon on the target).
The 'research' component here lies in finding neat ways of displaying collected performance data to the user.
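As an illustration of the per-executable aggregation mentioned in the list above, here is a minimal sketch that turns raw samples into a CPI time series suitable for a 2-d plot. The sample tuple layout `(time, executable, cycles, instructions)` and the function name `cpi_series` are assumptions made for this example; they are not part of hwpmc(4) or of any existing tool.

```python
from collections import defaultdict


def cpi_series(samples, bucket_seconds=1.0):
    """Aggregate samples into per-executable CPI values, one per time bucket.

    samples: iterable of (time, executable, cycles, instructions) tuples.
    This layout is hypothetical; a real tool would derive it from PMC logs.
    Returns {executable: [(bucket_start_time, cpi), ...]}.
    """
    # (executable, bucket index) -> [total cycles, total instructions]
    totals = defaultdict(lambda: [0, 0])
    for t, exe, cycles, instructions in samples:
        bucket = int(t // bucket_seconds)
        totals[(exe, bucket)][0] += cycles
        totals[(exe, bucket)][1] += instructions

    series = defaultdict(list)
    for (exe, bucket), (cycles, instructions) in sorted(totals.items()):
        if instructions:  # skip buckets in which nothing was retired
            series[exe].append((bucket * bucket_seconds, cycles / instructions))
    return dict(series)
```

Samples from every process running the same executable fall into the same buckets, so the resulting series describes the executable as a whole rather than any single process.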
Related Reading
Magic Ink: Information Software and the Graphical Interface, by Bret Victor.
Performance speedups
Use the data in hwpmc(4) logs to intelligently reorder executables. For example, we could:
- Group 'hot' functions at contiguous virtual addresses so that they share pages, minimizing the use of physical memory.
- Order functions to improve startup time for executables.
We could have 'soft' PMCs (see PmcTools/PmcKinds) that 'sample' on page faults, in addition to samples from true PMCs.
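The grouping of 'hot' functions described above could start from something like the following sketch, which counts sampled program counters per function and emits the function names hottest first. The inputs (a symbol table given as `(start_address, size, name)` tuples and a list of sampled PCs) and the name `hot_function_order` are assumptions for illustration. How the resulting order is then fed to a linker is left open.

```python
import bisect
from collections import Counter


def hot_function_order(symbols, sample_pcs):
    """Return function names ordered from most to least sampled.

    symbols: list of (start_address, size, name) tuples (hypothetical layout).
    sample_pcs: iterable of sampled program counter values.
    Functions with equal sample counts keep their original address order.
    """
    symbols = sorted(symbols)
    starts = [start for start, _, _ in symbols]
    hits = Counter()
    for pc in sample_pcs:
        # Find the last symbol that starts at or before this PC.
        i = bisect.bisect_right(starts, pc) - 1
        if i >= 0:
            start, size, name = symbols[i]
            if pc < start + size:  # ignore PCs that fall outside any symbol
                hits[name] += 1
    # sorted() is stable, so ties remain in address order.
    return sorted((name for _, _, name in symbols), key=lambda n: -hits[n])
```

Placing the head of this list on adjacent pages is one simple way to approximate the 'contiguous virtual addresses' goal; ordering for startup time would need samples restricted to the startup phase instead.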
Drive 'cc -pg' profiles from PMC hardware
This extends the existing profiling infrastructure: conventional timer-based profiling would be integrated with a 'clock' derived from the PMC sampling interrupt. We would change monitor() to allocate a PMC if a specific environment variable is present.
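The environment-variable check could look roughly like the sketch below. It only models the decision between the timer and a PMC as the profiling clock. The variable name `PROFIL_PMC_EVENT` and the event name used in the usage example are hypothetical, and the real change would live in the C implementation of monitor().

```python
import os

# Hypothetical variable name; the actual name is not specified on this page.
PMC_ENV_VAR = "PROFIL_PMC_EVENT"


def select_profiling_clock(environ=None):
    """Choose the profiling clock source.

    Returns ("pmc", event_name) if the environment variable is set to a
    non-empty value, otherwise ("timer", None) for conventional profiling.
    """
    environ = os.environ if environ is None else environ
    event = environ.get(PMC_ENV_VAR, "").strip()
    if event:
        return ("pmc", event)
    return ("timer", None)
```

For example, `select_profiling_clock({"PROFIL_PMC_EVENT": "instructions"})` selects the PMC path, while an empty environment falls back to the timer, so existing 'cc -pg' users would see no change in behaviour.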
PMCs and Scheduling
Use PMC-based measurements to drive scheduling decisions. For example, measuring cache activity at the end of a CPU slice could help us estimate how 'cold' the cache has become with respect to other processes.
PMC-driven static analysis of code
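One very simple way to turn such a measurement into a scheduling hint is sketched below. It assumes, purely for illustration, that every cache miss incurred by other threads since a thread last ran evicts one of that thread's cache lines. This is a deliberately crude model, and the names `cache_warmth` and `pick_warmest` are hypothetical.

```python
def cache_warmth(misses_by_others, cache_lines):
    """Estimate, in [0, 1], how much of a thread's cache contents survive.

    misses_by_others: cache misses by other threads on this CPU since the
    thread last ran (as measured by a PMC at the end of each slice).
    cache_lines: total number of lines in the cache being modelled.
    Assumes each miss evicts one of the thread's lines (a crude model).
    """
    if cache_lines <= 0:
        raise ValueError("cache_lines must be positive")
    evicted = min(misses_by_others, cache_lines)
    return 1.0 - evicted / cache_lines


def pick_warmest(runnable, cache_lines):
    """Return the runnable thread whose cache is estimated to be warmest.

    runnable: dict mapping thread name -> misses by others since it last ran.
    """
    return max(runnable, key=lambda name: cache_warmth(runnable[name], cache_lines))
```

A real scheduler would have to weigh this hint against priority and fairness; the sketch only shows how a PMC-derived miss count could be reduced to a comparable score.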
Use static analysis and data from CPU cache behaviour to identify areas of code where we could reduce cache thrashing and improve cache locality.