Documenting Crappy Performance
I added a simple, but powerful instrumentation framework to GEGL this weekend.
The original plan was to use SVG, xhtml+css or cairo for creating nice graphs
and diagrams, but I settled on making UTF8art bar-diagrams, using the glyphs with incremental horizontal coverage in 1/8 increments instead.
./unsharp-mask.xml usecs: 6256331
gegl 7.693684 ██████████████████████████████████████████
process 6.345501 ██████████████████████████████████▊
gaussian-blur 3.025135 ████████████████▋
babl 2.601660 ██████████████▏
defined-region 0.000009
prepare 0.000005
other 0.423461 ██▍
png-save 1.130282 ██████▏
babl 0.444080 ██▌
defined-region 0.000005
prepare 0.000004
other 0.686193 ███▊
subtract 0.994492 █████▌
babl 0.927667 █████
defined-region 0.000006
prepare 0.000004
other 0.066815 ▍
add 0.731857 ████
babl 0.626452 ███▌
Other profiling tools like valgrind's Callgrind and KCacheGrind provides even more detailed information about where time is spent, but a condensed and domain specific visualization makes it easier to see what the hot-spots are.
The core of the instrumentation architecture is the following functions:
glong gegl_ticks (void);
void gegl_instrument (const gchar *parent_task,
const gchar *sub_task,
glong usecs);
gchar * gegl_instrument_utf8 (void); /*header/source*/
The first function is a clock providing usecs (1/1000000 second), which can
be used to measure the execution time of code
The second function registers a named sub-task of an already existing task. (Tasks can be registered by calling gegl_instrument with 0 as the usecs parameter, the name of the root is specified with an initial gegl_instrument call where both parent_task, and sub_task have the same value.
The last function return a UTF8 breakdown of the time usage recorded.
Each task should register it's own time, a special child is appended in the reports signifying how much time has not been accounted for by it's siblings (for the operations in GEGL this is the time spent doing raster manipulation, globally for GEGL this value is slightly wrong (even negative), since recursive invokations of GEGL during processing of the load operations leads to double registration of some processing time.)
This instrumentation is now automatically built for the gallery of the
web-pages (which are generated in the docs subdir when GEGL is compiled). I hope to get this hooked up with the Automatic regression tests of GEGL at OSDL. To provide insights
about the performance on various architectures, as well plotting historic data
to be able to track performance changes over time.
Syndicated 2006-09-18 12:36:06 from codecave.org