By default, Linux distros are not tuned for low I/O latency. Here are some tips to improve that.
Most apps still don't do multi-threaded I/O, so each app effectively reads on a single thread and its I/O speed is always bottlenecked by single-core CPU performance (that's not even accounting for stuttering when multiple processes contend). So even with an NVMe drive capable of 3-6 GB/s of linear read, you may get only 1-2 GB/s with ideal settings. For random reads (what apps actually use in real life), the best you can hope for is 50-150 MB/s unbuffered and 100-400 MB/s buffered.
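To get a feel for what one thread doing random reads looks like from an app's point of view, here is a minimal Python sketch. It is not a replacement for a proper benchmark tool: the function names `random_read_mb_per_s` and `make_test_file` are made up for illustration, the reads are buffered, and because the test file was just written it will most likely be served from the page cache, so the number is an upper bound on per-thread overhead rather than a measurement of the drive.

```python
import os
import random
import tempfile
import time


def random_read_mb_per_s(path, block_size=4096, reads=2000, seed=0):
    """Time single-threaded random reads of block_size bytes; return MB/s."""
    size = os.path.getsize(path)
    blocks = size // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for _ in range(reads):
            # Pick a block-aligned offset, like a 4k random-read workload.
            offset = rng.randrange(blocks) * block_size
            total += len(os.pread(fd, block_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total / elapsed / 1e6


def make_test_file(size=8 * 1024 * 1024):
    """Create a throwaway file of random bytes and return its path (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(size))
        return f.name


if __name__ == "__main__":
    path = make_test_file()
    try:
        rate = random_read_mb_per_s(path)
        print(f"{rate:.1f} MB/s (buffered, page cache likely warm)")
    finally:
        os.remove(path)
```

Pointing `random_read_mb_per_s` at a large existing file that has not been read recently should give a figure closer to what the drive itself delivers to a single thread.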
All writes are heavily buffered on 3 layers (the OS's RAM cache, the device's RAM cache, and the device's SLC-like on-NAND cache), so it's difficult to get real or stable numbers. Writes are largely irrelevant to the system's responsiveness, though, so they may be sacrificed for better random reads.
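The first of those layers, the OS's RAM cache, can be made visible with a small Python sketch that times the same write with and without `os.fsync`. The helper name `timed_write` and the 16 MiB payload are assumptions chosen for illustration. Note that `fsync` only forces data out of the OS cache and asks the device to flush. It says nothing about the SLC-like cache, and if the temp directory is on tmpfs the two timings will be nearly identical.

```python
import os
import tempfile
import time


def timed_write(data, sync, directory=None):
    """Write data to a temp file; return elapsed seconds, including fsync if sync is True."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        view = memoryview(data)
        start = time.perf_counter()
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        if sync:
            # Block until the kernel has pushed the data to the device.
            os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    payload = os.urandom(16 * 1024 * 1024)
    buffered = timed_write(payload, sync=False)
    synced = timed_write(payload, sync=True)
    print(f"buffered: {buffered:.4f} s, with fsync: {synced:.4f} s")
```

On a disk-backed directory, the buffered write usually returns much sooner than the synced one, which is why naive write benchmarks tend to report numbers that look too good.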
The performance can be checked by:
- `fio --name=read --readonly --rw={read/randread} --ioengine=libaio --iodepth={jobs_per_each_worker's_command} --bs={4k/2M} --direct={0/1} --num