Back in 2005, Andy Wingo wrote a neat little statistical profiler named statprof that promptly disappeared into obscurity. It has since languished almost unknown, with a handful of people writing semi-private forks that themselves seem to be dead.
Statistical profiling (also known as sampling profiling) is simple and sweet: the profiler periodically wakes up and samples the stack, then when all is done, it prints a simple report of which lines showed up most often in the profile.
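To make the idea concrete, here is a minimal sketch of a sampling profiler. It is not how statprof is implemented: this toy version assumes CPython, uses a background thread plus sys._current_frames() to peek at the profiled thread, and the names Sampler and busy are made up for illustration.

```python
import collections
import sys
import threading
import time


class Sampler:
    """Toy sampling profiler (hypothetical, for illustration only)."""

    def __init__(self, interval=0.001):
        self.interval = interval                 # seconds between samples
        self.counts = collections.Counter()      # (file, line, function) -> hits
        self._target = threading.get_ident()     # profile the creating thread
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Wake up periodically and record the line the target thread is on.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target)
            if frame is not None:
                code = frame.f_code
                self.counts[(code.co_filename, frame.f_lineno, code.co_name)] += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def report(self):
        # One line per sampled location, most frequently seen first.
        total = sum(self.counts.values())
        return [
            "%6.2f%%  %s:%d:%s" % (100.0 * hits / total, filename, lineno, name)
            for (filename, lineno, name), hits in self.counts.most_common()
        ]


def busy(duration):
    """Spin for roughly `duration` seconds so there is something to sample."""
    end = time.time() + duration
    spins = 0
    while time.time() < end:
        spins += 1
    return spins


if __name__ == "__main__":
    sampler = Sampler()
    sampler.start()
    busy(0.2)
    sampler.stop()
    print("\n".join(sampler.report()))
```

Because each sample records the current line number rather than just the function, the report can distinguish between lines inside a single function. The sample counts vary from run to run, which is the price of sampling.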
Why would this matter, though? Python already has two built-in profilers: lsprof (exposed as the cProfile module) and the long-deprecated hotshot. The trouble with lsprof is that it only tracks function calls. If you have a few hot loops within a function, lsprof is nearly worthless for figuring out which ones are actually important.
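A small sketch shows the limitation. The function two_loops and the helper profiled_functions are invented for this example; only cProfile and pstats are real standard-library modules. However the time is split between the two loops, the profile contains a single entry for the whole function, keyed by the line of its def statement.

```python
import cProfile
import pstats


def two_loops():
    total = 0
    for i in range(100000):       # first loop: cheap arithmetic
        total += i
    for i in range(100000):       # second loop: more work per iteration
        total += (i * i) % 7
    return total


def profiled_functions(func):
    """Run func under cProfile and return the (file, line, name) keys it recorded."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    # pstats.Stats keeps one entry per function, not per line.
    return sorted(pstats.Stats(profiler).stats.keys())


if __name__ == "__main__":
    for filename, lineno, name in profiled_functions(two_loops):
        print("%s:%d:%s" % (filename, lineno, name))
```

Neither loop shows up on its own, so the profile cannot say which of the two is the expensive one.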
A few days ago, I found myself in exactly the situation in which lsprof fails: it was telling me that I had a hot function, but the function was unfamiliar to me, and long enough that it wasn’t immediately obvious where the problem was.
After a bit of begging on Twitter and Google+, someone pointed me at statprof. But there was a problem: although it was doing statistical sampling (yay!), it was only tracking the first line of a function when sampling (wtf!?). So I fixed that, spiffed up the documentation, and now it’s both usable and not misleading. Here’s an example of its output, locating the offending line in that hot function more accurately:
  %   cumulative      self
 time    seconds   seconds  name
 68.75      0.14      0.14  scmutil.py:546:revrange
  6.25      0.01      0.01  cmdutil.py:1006:walkchangerevs
  6.25      0.01      0.01  revlog.py:241:__init__
[...blah blah blah...]
  0.00      0.01      0.00  util.py:237:__get__
---
Sample count: 16
Total time: 0.200000 seconds
I have uploaded statprof to the Python package index, so it’s almost trivial to install: “easy_install statprof” and you’re up and running.
Since the code is up on GitHub, please feel welcome to contribute bug reports and improvements. Enjoy!
Yay! This sounds like a promising addition to the toolkit.
By the way, there is also http://packages.python.org/line_profiler, which is a deterministic (100% samples) per-line profiler.
I’d tried line_profiler but quickly gave up, IIRC because I couldn’t get it to even compile, never mind work.
That simple?
$ sudo easy_install statprof
install_dir /usr/local/lib/python2.6/dist-packages/
Searching for statprof
Reading http://pypi.python.org/simple/statprof/
Reading http://packages.python.org/statprof
No local packages or download links found for statprof
error: Could not find suitable distribution for Requirement.parse('statprof')
Could you try the download again, please?
Bryan, could you detail what did not work when building line_profiler? It is used quite often in the scipy community, and should definitely work.
David:
I bet Bryan did the same thing I did:
$ hg clone https://bitbucket.org/robertkern/line_profiler
$ cd line_profiler
$ python setup.py build
Could not import Cython. Using pre-generated C file if available.
running build
running build_py
warning: build_py: byte-compiling is disabled, skipping.
running build_ext
building '_line_profiler' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c _line_profiler.c -o build/temp.linux-x86_64-2.7/_line_profiler.o
gcc: error: _line_profiler.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 4
Oh, reading the error message more closely, I see that I need to have Cython installed for this to work…