While investigating neondatabase/neon#11446 I learned a lot about the exact conditions in which io_uring punts to async workers (io-wq).
Specifically, I was surprised that, on Debian Bookworm (kernel 6.1.0-32-amd64), all Direct IO writes on ext4 would get punted to async workers.
Even preallocating the space upfront with fallocate() didn't help.
I wrote a reproducer app (see appendix) and used bpftrace + light kernel patching to triangulate in which cases we punt, and why.
The gist is: on mainline kernel 6.12.25, fallocate() the space before you write to it, and the writes should no longer get punted.
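
To make the "fallocate() before you write" step concrete, here is a minimal sketch in Python. It is not the reproducer app and it does not submit anything through io_uring; it only illustrates preallocating a file and then issuing an aligned Direct IO write into the preallocated range. The helper names (`preallocate`, `aligned_buffer`, `direct_write`) and the 4096-byte alignment are assumptions made for illustration, and `os.posix_fallocate` wraps `posix_fallocate(3)`, which I assume is close enough to a plain `fallocate()` call for this purpose.

```python
import mmap
import os

# Assumed alignment for O_DIRECT buffers, offsets and lengths.
# The real requirement depends on the filesystem and block device.
BLOCK = 4096


def preallocate(path: str, size: int) -> int:
    """Create `path`, reserve `size` bytes for it, and return the open fd."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    # Wraps posix_fallocate(3): the file ends up at least `size` bytes long,
    # with the space reserved before any write is issued.
    os.posix_fallocate(fd, 0, size)
    return fd


def aligned_buffer(size: int, fill: bytes = b"x") -> mmap.mmap:
    """Return a page-aligned buffer of `size` bytes filled with `fill`."""
    buf = mmap.mmap(-1, size)  # anonymous mappings are page-aligned
    buf.write(fill * size)
    return buf


def direct_write(path: str, offset: int, buf: mmap.mmap) -> int:
    """Write `buf` at `offset` using O_DIRECT; returns bytes written.

    Raises OSError (typically EINVAL) if the filesystem does not support
    O_DIRECT or the alignment assumption above does not hold.
    """
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
```

A caller would run `preallocate(path, n * BLOCK)` once, then call `direct_write(path, i * BLOCK, aligned_buffer(BLOCK))` for each block, so that every write lands in space that was already reserved.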