Spy on Python down to the Linux kernel level

Article URL: https://p403n1x87.github.io/spy-on-python-down-to-the-linux-kernel-level.html
Comments URL: https://news.ycombinator.com/item?id=28669256
When I first conceived the design of Austin, I swore to always adhere to two guiding principles:

no dependencies other than the standard C library (and whatever system calls the OS provides);

minimal impact on the tracee, even under high sampling frequency.

Let me elaborate on why I decided to stick to these two rules. The first one is mostly a choice in favour of simplicity. The workhorse of Austin is the ability to read the private memory of any process, be it a child process or not. Many platforms provide an API or system calls to do that, some with more security gotchas than others. Once Austin has access to that information, the rest is plain C code that makes sense of the data and provides a meaningful representation to the user by merely calling libc's fprintf in a loop.
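To make the idea of reading another process's memory concrete, here is a minimal sketch for Linux built on the process_vm_readv system call. It is an illustration under stated assumptions, not Austin's actual code: the helper name read_remote is hypothetical, and the choice of this particular system call is an assumption about one of the options the platform offers. Reading a process other than your own typically requires ptrace-level permissions, which is one of the security gotchas mentioned above.

```c
#define _GNU_SOURCE        /* glibc only declares process_vm_readv with this */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Copy `len` bytes found at `remote_addr` in the address space of process
 * `pid` into the local buffer `local`.
 *
 * Returns the number of bytes copied, or -1 with errno set (for example
 * EPERM when the caller is not allowed to inspect the target process).
 *
 * NOTE: `read_remote` is a hypothetical helper used for illustration only.
 */
ssize_t read_remote(pid_t pid, const void *remote_addr, void *local, size_t len)
{
    assert(local != NULL);

    /* Where the bytes should land in our own address space. */
    struct iovec local_iov = { .iov_base = local, .iov_len = len };

    /* Where the bytes live in the target's address space. */
    struct iovec remote_iov = { .iov_base = (void *)remote_addr, .iov_len = len };

    /* One local segment, one remote segment; the flags argument must be 0.
     * The target process keeps running while the copy takes place. */
    return process_vm_readv(pid, &local_iov, 1, &remote_iov, 1, 0);
}
```

Calling read_remote(getpid(), &value, &copy, sizeof value) on the current process is an easy way to try the helper without extra privileges. A sampler would instead pass the PID of the tracee together with an address discovered in its memory map.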

The second guiding principle is what everybody desires from observability tools. We want to extract as much information as possible from a running program while perturbing it as little as possible, so as to avoid skewing the data. Austin can make this guarantee because reading the virtual memory of a process does not require the tracee to be halted. Furthermore, the fact that Python has a GIL implies that a simple Python application will run on at most one physical core. To be more precise, a normal, pure-Python application would not spend more CPU time than wall-clock time. Therefore, on machines with multiple cores, even if Austin ends up acting like a busy loop at high sampling frequencies and hogging a physical core, there would still be plenty of other cores to run the Python application unperturbed and unaware that it is being spied on. Even for multiprocess applications the expected impact is minimal: if you are running, say, a uWSGI server on a 64-core machine, you wouldn't lose…