To blizzard:
If you're going to improve gmon/mcount, please teach it that
if there's an existing gmon.out in the working directory,
then it should augment that file instead of clobbering it.
That way, if you want to profile a program that runs for a
short time, you could just run it a few thousand times in a
shell loop. Right now you have to do that, plus rename the
reports so they all get saved, and then crunch them together
at the end. This takes much longer than it has to, and
throws your results off because disk cache is wasted on the
huge gmon.out files which all have to stay around until the
end.
To make this change safely, you should probably save the
identity of the executable in gmon.out, and start over if it
changes. (This should be done anyway.)
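
Here is a minimal sketch of the augment-or-start-over logic, in C. Everything in it is made up for illustration: the struct prof_file layout, the names prof_save and prof_load, and the 64-bit exe_id (which could be a hash of the binary, or its inode plus mtime). It is not the real gmon.out format, and it only covers the PC histogram; a real version would have to merge the call-graph arcs as well.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBINS 8   /* hypothetical: tiny histogram, for illustration only */

/* Hypothetical on-disk record. NOT the real gmon.out layout. */
struct prof_file {
    uint64_t exe_id;        /* identity of the executable that wrote it */
    uint32_t bins[NBINS];   /* accumulated PC-sample histogram */
};

/* Read the profile at `path` into *out. Returns 0 on success, -1 otherwise. */
int prof_load(const char *path, struct prof_file *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(out, sizeof *out, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Fold this run's counts into the file at `path`.
   Returns 1 if an existing profile was augmented, 0 if the file was
   started over (missing, unreadable, or written by a different
   executable), and -1 on a write error. */
int prof_save(const char *path, uint64_t exe_id, const uint32_t run[NBINS])
{
    struct prof_file pf;
    int merged;

    assert(path != NULL && run != NULL);

    /* Keep the old counts only if they came from the same executable. */
    merged = (prof_load(path, &pf) == 0 && pf.exe_id == exe_id);
    if (!merged) {
        memset(&pf, 0, sizeof pf);
        pf.exe_id = exe_id;
    }
    for (int i = 0; i < NBINS; i++)
        pf.bins[i] += run[i];

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(&pf, sizeof pf, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? merged : -1;
}
```

The sketch does no locking, so it assumes the runs in the shell loop happen one after another, not in parallel.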
I'd also like to see better kernel-side support for
profiling. setitimer(2) has a lot of overhead, and
the ticks don't come nearly often enough. SVR4 has a
profil(2) system call that pushes the histogram
updates into the kernel, which gets rid of the overhead but
doesn't help with the granularity. Also, I don't think it
can handle gaps in the region to be profiled, so your
program has to be statically linked.
I'd rather not add system calls. Instead, I envision a
pseudo-device which you map several different times,
specifying the window of the address space to profile. It
can use the high-resolution timer in the RTC to get ticks
more often than the normal timer interrupt. Updates happen
in the driver, so no more 30% of execution time spent in
__mcount_internal.
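
To make the multiple-window idea concrete, here is a rough userspace simulation, in C, of what the driver might do on each tick. None of this is an existing interface: struct prof_window, prof_tick, and the per-window shift (log2 of bytes of text per bin) are assumptions for the sketch. Each mapping of the device would correspond to one entry in the window array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one profiled window of the address space, i.e. one
   mapping of the imagined pseudo-device. */
struct prof_window {
    uintptr_t start, end;   /* covers [start, end) */
    unsigned  shift;        /* log2 of bytes of text per histogram bin */
    uint32_t *bins;         /* at least ((end - start) >> shift) entries,
                               rounded up */
};

/* Simulated tick handler: find the window containing `pc` and bump
   the matching bin. Returns the index of the window that was hit, or
   -1 if the PC fell into a gap between windows. */
int prof_tick(struct prof_window *w, size_t nwin, uintptr_t pc)
{
    assert(w != NULL || nwin == 0);

    for (size_t i = 0; i < nwin; i++) {
        if (pc >= w[i].start && pc < w[i].end) {
            w[i].bins[(pc - w[i].start) >> w[i].shift]++;
            return (int)i;
        }
    }
    return -1;   /* not in any profiled region; ignore the sample */
}
```

With one window for the executable's text and one per shared library, the gaps between them cost nothing, which is the point: nothing has to be statically linked. A linear scan is fine for a handful of windows; a real driver would presumably want something smarter if there were many.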
GCC/i386 has a stupid bug where it clobbers %edx
on every function entry, when compiling with profiling.
This breaks -mregparm. Okay, that doesn't affect
very many people, but it still needs to get fixed.
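
A test case along these lines should show the problem, assuming the bug is as described above. With regparm(3) on i386, GCC passes the first three integer arguments in %eax, %edx and %ecx, so the second argument is the one that would get trashed. The function name and the weights are arbitrary, and I'm not claiming which compiler versions are affected.

```c
#include <assert.h>

/* regparm only means something on 32-bit x86; elsewhere it expands
   to nothing so the file still compiles. */
#if defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* noinline so the call really goes through the function's entry
   sequence, where the profiling call is inserted. */
__attribute__((noinline)) REGPARM3
int weighted(int a, int b, int c)
{
    /* On i386 with regparm(3): a is in %eax, b in %edx, c in %ecx.
       Distinct weights make it obvious which argument got damaged. */
    return a + 10 * b + 100 * c;
}

/* Aborts if any argument was corrupted on the way in. */
void regparm_selfcheck(void)
{
    assert(weighted(1, 2, 3) == 321);
}
```

Build it for i386 with profiling on (something like gcc -m32 -pg), call regparm_selfcheck() from main, and see whether the assertion survives.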