A while ago, I had a need to monitor filesystem modifications, and I looked around for Python bindings for the Linux kernel’s inotify subsystem. At the time, the only existing library was pyinotify, so being a lazy sort, I naturally tried to use it.
At first glance, the documentation seems impressive, and the API looks reasonable. Effective use of inotify is a subtle affair, however, and pyinotify is not, shall we say, the best tool for the job. It’s difficult to tell from external inspection what its problems might be, though, so here are a few notes from my experience.
Correctness
A program using pyinotify can easily lose track of parts of its directory hierarchy. The library doesn’t raise an OSError exception if the inotify_add_watch system call fails: instead, it propagates the -1 error result up to the caller as a value in a dict, but without the value of errno to tell the caller why the error occurred.
It’s thus trivial to miss errors entirely, because the usual mechanism of raising exceptions isn’t used. Almost as bad, it’s impossible to distinguish between recoverable (tried to add a watch on a directory that no longer exists) and fatal (hit the system max_user_watches limit) errors.
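For contrast, here is a minimal sketch of error handling that keeps errno: a failed inotify_add_watch becomes an OSError. The helper names raise_for_watch_error and add_watch are hypothetical and are taken from neither pyinotify nor the bindings discussed later; the ctypes call assumes Linux.

```python
import ctypes
import errno
import os


def raise_for_watch_error(result, err, path):
    """Turn a -1 result from inotify_add_watch into an OSError carrying errno."""
    if result == -1:
        raise OSError(err, os.strerror(err), path)
    return result


def add_watch(fd, path, mask):
    """Add a watch and raise on failure (assumes Linux; illustrative only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    wd = libc.inotify_add_watch(fd, os.fsencode(path), mask)
    # Read errno immediately after the call, before anything can overwrite it.
    return raise_for_watch_error(wd, ctypes.get_errno(), path)


def is_recoverable(exc):
    """ENOENT: the directory vanished. ENOSPC: the watch limit was reached."""
    return exc.errno == errno.ENOENT
```

With errno preserved, a caller can skip a directory that has disappeared (ENOENT) and stop or back off when the kernel reports ENOSPC, which is what inotify_add_watch returns once the max_user_watches limit is hit.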
Performance
To a regular Python hacker, the interface that pyinotify provides will probably look reasonable. If you want to handle some kind of event, just write a method that will get invoked with an Event object when that event occurs. How reassuringly normal.
Under the hood, though, the implementation is terrible. On every event, the library scans every event that the inotify interface could possibly report, and checks to see if your class implements one of several possible appropriately named methods. This means it’s traversing a 20-element dict, and performing up to 60 attribute lookups (of which up to 40 are based on %-formatted names), for every reported event.
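To make the cost concrete, here is a simplified sketch of the general pattern, not pyinotify’s actual code. The names EVENT_NAMES, Handler, dispatch_slow, build_table and dispatch_fast are all hypothetical. The slow version formats a method name and looks it up on every event; the fast version resolves the methods once and reuses the result.

```python
# A few real inotify flag values; a full table would have many more entries.
EVENT_NAMES = {0x002: "IN_MODIFY", 0x100: "IN_CREATE", 0x200: "IN_DELETE"}


class Handler:
    """Hypothetical user class that only cares about file creation."""

    def process_IN_CREATE(self, path):
        return ("created", path)


def dispatch_slow(handler, mask, path):
    """Walk the whole table and do a formatted-name lookup on every event."""
    results = []
    for flag, name in EVENT_NAMES.items():
        method = getattr(handler, "process_%s" % name, None)
        if method is not None and mask & flag:
            results.append(method(path))
    return results


def build_table(handler):
    """Resolve the handler's methods once, up front."""
    table = []
    for flag, name in EVENT_NAMES.items():
        method = getattr(handler, "process_" + name, None)
        if method is not None:
            table.append((flag, method))
    return table


def dispatch_fast(table, mask, path):
    """Per event, only test bits against the methods that actually exist."""
    return [method(path) for flag, method in table if mask & flag]
```

Both functions return the same results; the difference is that the string formatting and attribute lookups in the fast version happen once per handler rather than once per event.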
This has disastrous performance implications. If you write a simple monitoring tool that uses pyinotify, use it to monitor activity in a Linux kernel source tree, and then start a build in that tree, try running top while your build runs. When I did this, I found that pyinotify was consuming an entire CPU trying to keep up with the flood of notification events.
Locking
All that needless attribute lookup churn isn’t the only problem: pyinotify uses a threading.RLock to protect every access to every attribute of its Watch class, by providing its own __getattribute__ and __setattr__ methods.
I can’t guess what the author thinks he’s protecting himself from, but he’s got a solid defence mounted against both correctness and performance there. (Blindly locking individual attributes isn’t going to protect the consistency of an entire data structure. Delegating responsibility for locking out to callers, who are probably all single-threaded anyway, might help to recover a bit of the execrable performance. Watch isn’t often on the fast path, thank goodness.)
Is it possible to do better?
A potential rejoinder to my performance criticisms is that Python isn’t a fast language. However, this doesn’t bear up in general: I’ve written plenty of nippy Python code. In this particular case, in response to my mounting horror at reading and fixing the pyinotify source, I wrote bindings of my own. In contrast to pyinotify consuming an entire CPU during moderately heavy filesystem activity, an app using my bindings consumes about 5% of a CPU, even in the face of intensive activities like untarring a big file archive.
In part, this is because my bindings are less abstracted than those of pyinotify. I don’t dispatch out to user methods at all; the caller is responsible for checking a bitmask instead. The readability of application code isn’t really affected by this, but stripping out all the cruft massively improves performance.
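As a rough illustration of what checking a bitmask looks like in application code, here is a small sketch. The flag values are the standard ones from the kernel’s inotify header; the describe function is a hypothetical helper, not part of any binding mentioned here.

```python
# Standard inotify flag values from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_ISDIR = 0x40000000


def describe(mask):
    """Return the kinds of change encoded in one event mask."""
    kinds = []
    if mask & IN_CREATE:
        kinds.append("create")
    if mask & IN_DELETE:
        kinds.append("delete")
    if mask & IN_MODIFY:
        kinds.append("modify")
    if mask & IN_ISDIR:
        kinds.append("dir")
    return kinds
```

A caller would typically test only the bits it cares about, for example `if mask & IN_CREATE and mask & IN_ISDIR:` to notice new subdirectories that need their own watch.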
In addition, the application itself is also responsible for using the library in an informed way. To get decent performance with inotify, you must delay calls to read so that the kernel has a chance to aggregate multiple notifications into a single buffer write. In other words, if a call to poll says “you’ve got events”, you have to wait a good fraction of a second before seeing what they are. I provide a Threshold class to help with this.
While it is certainly possible to call into pyinotify in a similarly informed way, I suspect that all its flab and abstraction will gull the unwary coder into thinking that maybe they’re not writing performance-critical code after all, when in fact they are.
There are other Python inotify interfaces available. One is, like mine, named python-inotify, but a quick glance at its source code revealed some of the same silliness with unnecessary locking that plagues pyinotify, so I quickly averted my eyes. There’s also a Python API to gamin. I have no opinion about it, beyond not wanting to run another daemon if I can avoid it.
My general advice would be to avoid writing code that involves monitoring filesystem activity. It’s all too easy to write code that looks sensible, but is actually racy, usually under circumstances that are difficult to reproduce. Tuning performance without introducing more races or bugs is tough. You’re getting the idea now: hard! scary! find something fun instead!
The corollary to this is, of course, that as a user, you ought to be suspicious of any programs you use that monitor filesystem activity. I bet the Beagle and Google Desktop teams have armloads of horror stories.
Hi. Thanks for your writing, especially the bit of warning about programs that monitor file system activity. Do you have any advice about pynotify (Debian package python-notify)? Best, Wok
Wok: at the least, don’t write a program that uses pyinotify, and try not to use one that does. I don’t know what Debian’s policies are regarding distributing poor-quality software, so I won’t venture into any advice there.
Well, I clearly read too fast. Inotify has nothing to do with python-notify, forget my message and delete it if you want. Sorry. Regards, Wok
I’ve noticed the link to your own bindings now appears to return a 404. Are they still available?
By some creative Googling, I found what seem to be the inotify bindings referred to in this blog post, here:
http://hg.kublai.com/python/inotify
I have a branch of Bryan’s code that I’m still using and maintaining. See https://bitbucket.org/JanKanis/python-inotify/. Bryan’s original code is also online at https://bitbucket.org/bos/python-inotify.
Hi Erik, I come from the future to thank you for the link.
Others of us from the future come wondering whether the criticisms here are still relevant to the crazy world we’ve built in the past six years. Has pyinotify fixed any of this? Is there a better alternative?
I second Rico’s question.
Hi Bryan, same question as Jon earlier. http://hg.serpentine.com/python-inotify/ times out and doesn’t respond. Any chance of getting it going again or pointing to the source elsewhere? Thanks, Ralph.
FWIW, just spent half a day attempting to work with pyinotify 0.9.4, and the implementation is utterly horrible. Clearly written by someone new to python but possibly from a C background. It’s totally un-pythonic and eats a LOT of memory by including everything from daemonization to threading, even if you have no use for such things. Also the performance is bad.
Indeed pyinotify has a very un-pythonic, ugly interface.
Try inotifyx.