(trying the one-pass / no editing blog post style)
A few months ago I learned that Python's (and other interpreted languages') RPATH lookups involve a non-trivial amount of compute. Every time the Python interpreter starts, it makes ~hundreds of serial calls to stat and related functions before running any user code (cf. ['What I've Learned About Optimizing Python'][0], [ENOENT caching in distri][1]). Each stat system call costs some amount of time (~200ns-5us) and also causes a [context switch][2] into and out of the kernel. Plenty of workloads invoke the Python interpreter hundreds of times, so this overhead adds up.
I was curious whether it'd be worth using io_uring for reducing startup and module importing overhead.
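
For a rough feel of the per-call cost mentioned above, here is a minimal sketch that times `os.stat` on a path that exists and on one that doesn't (misses are the common case during module lookup). The helper name `time_stat` and the missing filename are made up for illustration, and the numbers include Python's own call overhead, so treat them as an upper bound on the raw syscall cost rather than a precise measurement.

```python
import os
import time


def time_stat(path, n=10_000):
    """Return the mean wall-clock time of one os.stat(path) call, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(n):
        try:
            os.stat(path)
        except FileNotFoundError:
            # A miss (ENOENT) still pays for the round trip into the kernel.
            pass
    return (time.perf_counter_ns() - start) / n


if __name__ == "__main__":
    hit = time_stat(os.getcwd())
    # Hypothetical filename, assumed not to exist in the current directory.
    miss = time_stat(os.path.join(os.getcwd(), "no_such_module_for_stat_timing.py"))
    print(f"stat hit:  {hit:.0f} ns/call")
    print(f"stat miss: {miss:.0f} ns/call")
```

To count the calls an actual interpreter startup makes, something like `strace -c python -c pass` on Linux should give a per-syscall summary.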
Wesley and I looked at this this morning - our plan was roughly: