
@osalbahr
Last active March 4, 2024 09:11
Experimenting with `fio` - Flexible I/O tester on NFS vs fsdax

See https://github.com/osalbahr/pmem-redirection


I experimented with fio, a well-known I/O benchmarking tool. Here are some simple benchmarks against the following target files:

  1. $HOME/fio.txt - Network File System (NFS)
  2. /mnt/fsdax/$USER/fio.txt - Persistent Memory (PMEM, fsdax mode)
  3. /tmp/$USER/fio.txt - SSD (note: the job in that jobfile is misnamed "ram-tmp"; it should have been "ssd-tmp")
[osalbahr@c63 fio]$ head *fio
==> nfs-home.fio <==
[nfs-home]
rw=randrw
size=${SIZE}
filename=${HOME}/fio.txt

==> pmem-fsdax.fio <==
[pmem-fsdax]
rw=randrw
size=${SIZE}
filename=/mnt/fsdax/${USER}/fio.txt

==> ram-tmp.fio <==
[ram-tmp]
rw=randrw
size=${SIZE}
filename=/tmp/${USER}/fio.txt
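Note that fio expands `${SIZE}`, `${HOME}`, and `${USER}` from the environment when it parses a jobfile, which is why the runs below set SIZE on the command line. As a sketch, one of the jobfiles above could be recreated like this (the quoted heredoc delimiter keeps the placeholders literal so that fio, not the shell, substitutes them at run time):

```shell
cd "$(mktemp -d)"   # scratch directory for this sketch
# A quoted 'EOF' stops the shell from expanding ${SIZE}/${HOME},
# leaving them in the file for fio to read from the environment.
cat > nfs-home.fio <<'EOF'
[nfs-home]
rw=randrw
size=${SIZE}
filename=${HOME}/fio.txt
EOF
grep -F 'size=${SIZE}' nfs-home.fio   # the placeholder survives verbatim
```

Running `SIZE=1GiB fio nfs-home.fio` would then perform the same default 4 KiB psync random read/write job shown below.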

Here is the interesting part after running the default random read/write benchmark with a 1 GiB file per target (full log below):

[osalbahr@c63 fio]$ SIZE=1GiB fio *.fio
nfs-home: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
pmem-fsdax: (g=1): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
ram-tmp: (g=2): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
Run status group 0 (all jobs):
   READ: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=477MiB (500MB), run=39364-39364msec
  WRITE: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=477MiB (500MB), run=39364-39364msec

Run status group 1 (all jobs):
   READ: bw=491MiB/s (515MB/s), 491MiB/s-491MiB/s (515MB/s-515MB/s), io=476MiB (499MB), run=969-969msec
  WRITE: bw=493MiB/s (517MB/s), 493MiB/s-493MiB/s (517MB/s-517MB/s), io=478MiB (501MB), run=969-969msec

Run status group 2 (all jobs):
   READ: bw=25.1MiB/s (26.3MB/s), 25.1MiB/s-25.1MiB/s (26.3MB/s-26.3MB/s), io=477MiB (500MB), run=19036-19036msec
  WRITE: bw=25.0MiB/s (26.2MB/s), 25.0MiB/s-25.0MiB/s (26.2MB/s-26.2MB/s), io=477MiB (500MB), run=19036-19036msec
...

Random reads/writes to /tmp are about twice as fast as to $HOME. I expected the difference to be more drastic, but /tmp is still faster, as expected. What is surprising, though, is that /mnt/fsdax was about 20x faster than /tmp! Of course, it would be wise to try other fio options, and perhaps other benchmarking suites, before drawing firm conclusions. Still, the difference is interesting.
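One caveat: these runs used fio's defaults (psync engine, 4 KiB blocks, iodepth=1) and went through the page cache, which can flatter the cached filesystems. A stricter follow-up jobfile might look like the sketch below; the option names are standard fio options, but the values are illustrative and not from the original run.

```shell
cd "$(mktemp -d)"   # scratch directory for this sketch
# direct=1 bypasses the page cache, libaio + iodepth=16 adds queueing,
# and time_based/runtime fixes the duration instead of the data size.
cat > ssd-tmp-direct.fio <<'EOF'
[ssd-tmp-direct]
rw=randrw
bs=4k
size=1GiB
direct=1
ioengine=libaio
iodepth=16
runtime=30
time_based=1
filename=/tmp/${USER}/fio.txt
EOF
grep -c '=' ssd-tmp-direct.fio   # nine option lines
```

Note that direct=1 only works on filesystems that support O_DIRECT (a tmpfs /tmp would reject it, which is itself a useful signal).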

So I used fastfetch to query the available memory and disk space. It detects the storage mounted at both / and /mnt/fsdax.

Monitoring memory usage while filling /tmp:

[osalbahr@c63 osalbahr]$ pwd
/tmp/osalbahr
[osalbahr@c63 osalbahr]$ du -h
17G	.
[osalbahr@c63 osalbahr]$ fastfetch | grep Mem
Memory: 8.72 GiB / 746 GiB (1%)
[osalbahr@c63 osalbahr]$ wc -c *
 8589934592 bigger.txt
 4294967296 big.txt
 1073741824 fio.txt
 1073741824 fio.txt-2
 1073741824 fio.txt-2-4
 1073741824 fio.txt-4
17179869184 total
[osalbahr@c63 osalbahr]$ du -h *
8.1G	bigger.txt
4.1G	big.txt
1.1G	fio.txt
1.1G	fio.txt-2
1.1G	fio.txt-2-4
1.1G	fio.txt-4
[osalbahr@c63 osalbahr]$ cat * > huge.txt && fastfetch | grep Mem
Memory: 8.72 GiB / 746 GiB (1%)
[osalbahr@c63 osalbahr]$ du -h
33G	.
[osalbahr@c63 osalbahr]$ fastfetch | grep Mem
Memory: 8.72 GiB / 746 GiB (1%)

As expected, writing to /tmp does not consume the system's memory; the usage shows up on disk instead:

[osalbahr@c63 ~]$ cd /tmp/osalbahr/
[osalbahr@c63 osalbahr]$ du -h *
8.1G	bigger.txt
4.1G	big.txt
1.1G	fio.txt
1.1G	fio.txt-2
1.1G	fio.txt-2-4
1.1G	fio.txt-4
17G	huge.txt
[osalbahr@c63 osalbahr]$ fastfetch | grep Disk
Disk (/): 119 GiB / 883 GiB (13%)
Disk (/mnt/fsdax): 13.69 GiB / 248 GiB (5%)
[osalbahr@c63 osalbahr]$ rm big.txt 
[osalbahr@c63 osalbahr]$ fastfetch | grep Disk
Disk (/): 115 GiB / 883 GiB (12%)
Disk (/mnt/fsdax): 13.69 GiB / 248 GiB (5%)
[osalbahr@c63 osalbahr]$ rm bigger.txt 
[osalbahr@c63 osalbahr]$ fastfetch | grep Disk
Disk (/): 107 GiB / 883 GiB (12%)
Disk (/mnt/fsdax): 13.69 GiB / 248 GiB (5%)
[osalbahr@c63 osalbahr]$ rm *
[osalbahr@c63 osalbahr]$ fastfetch | grep Disk
Disk (/): 86.54 GiB / 883 GiB (9%)
Disk (/mnt/fsdax): 13.69 GiB / 248 GiB (5%)
[osalbahr@c63 osalbahr]$ rm -v /mnt/fsdax/*
rm: cannot remove ‘/mnt/fsdax/fmuelle’: Is a directory
removed ‘/mnt/fsdax/hyperfine’
rm: cannot remove ‘/mnt/fsdax/lost+found’: Is a directory
rm: cannot remove ‘/mnt/fsdax/osalbahr’: Is a directory
[osalbahr@c63 osalbahr]$ rm -v /mnt/fsdax/osalbahr/*
removed ‘/mnt/fsdax/osalbahr/fio.txt’
removed ‘/mnt/fsdax/osalbahr/hyperfine.txt’
[osalbahr@c63 osalbahr]$ fastfetch | grep Disk
Disk (/): 86.54 GiB / 883 GiB (9%)
Disk (/mnt/fsdax): 12.67 GiB / 248 GiB (5%)
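Whether a given /tmp is RAM-backed (tmpfs) or disk-backed can also be checked directly, without relying on fastfetch. A small sketch using GNU coreutils stat:

```shell
# "tmpfs" would mean /tmp is RAM-backed; ext4/xfs/etc. mean it lives
# on disk, which matches the SSD-backed /tmp observed above.
fstype=$(stat -f -c %T /tmp)
echo "/tmp filesystem type: $fstype"
df -h /tmp   # also shows the backing device and capacity
```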

More system information:

[osalbahr@c63 osalbahr]$ fastfetch
                 ..                     [email protected]
               .PLTJ.                   ------------------------
              <><><><>                  OS: CentOS Linux 7 x86_64
     KKSSV' 4KKK LJ KKKL.'VSSKK         Host: SYS-1029U-TRT (0123456789)
     KKV' 4KKKKK LJ KKKKAL 'VKK         Kernel: 5.4.65-200.el7.x86_64
     V' ' 'VKKKK LJ KKKKV' ' 'V         Uptime: 36 days, 4 hours, 32 mins
     .4MA.' 'VKK LJ KKV' '.4Mb.         Shell: bash 4.2.46
   . KKKKKA.' 'V LJ V' '.4KKKKK .       Cursor: Adwaita
 .4D KKKKKKKA.'' LJ ''.4KKKKKKK FA.     Terminal: slurmstepd: [310409.0]
<QDD ++++++++++++  ++++++++++++ GFD>    CPU: Intel(R) Xeon(R) Silver 4215 (16) @ 2.501 GHz
 'VD KKKKKKKK'.. LJ ..'KKKKKKKK FV      Memory: 8.72 GiB / 746 GiB (1%)
   ' VKKKKK'. .4 LJ K. .'KKKKKV '       Disk (/): 86.54 GiB / 883 GiB (9%)
      'VK'. .4KK LJ KKA. .'KV'          Disk (/mnt/fsdax): 12.67 GiB / 248 GiB (5%)
     A. . .4KKKK LJ KKKKA. . .4         Locale: en_US.UTF-8
     KKA. 'KKKKK LJ KKKKK' .4KK    
     KKSSA. VKKK LJ KKKV .4SSKK         ████████████████████████
              <><><><>                  ████████████████████████
               'MKKM'    
                 ''    

Notes:

If you would like to replicate the experiment, you will probably want to brew install fio; Homebrew seemed to be the easiest way to get the latest version of fio (3.35) on CentOS 7. I had to install some of the packages in an Ubuntu 22.04 LTS KVM (special thanks to VCL for the extended reservations) and then rsync -au the files over, because CentOS 7 does not have everything Homebrew needs to build from source (this was only necessary for the non-relocatable packages).

[osalbahr@c63 ~]$ brew install fio
Warning: fio 3.35 is already installed and up-to-date.
To reinstall 3.35, run:
  brew reinstall fio
[osalbahr@c63 ~]$ brew -v
Homebrew 4.1.3-41-ga2b4944
Homebrew/homebrew-core (git revision 51ef6bbadde; last commit 2023-08-03)
[osalbahr@c63 ~]$ brew --prefix
/home/osalbahr/homebrew

Of course, I could have taken advantage of Homebrew's bottles (precompiled binaries), but I do not have access to a writable /home/linuxbrew/.linuxbrew.

This was done on ARC: A Root Cluster for Research into Scalable Computer Systems.


Full log:

[osalbahr@c63 fio]$ SIZE=1GiB fio *.fio
nfs-home: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
pmem-fsdax: (g=1): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
ram-tmp: (g=2): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.35
Starting 3 processes
Jobs: 1 (f=1): [_(2),m(1)][100.0%][r=26.4MiB/s,w=26.6MiB/s][r=6755,w=6814 IOPS][eta 00m:00s]         
nfs-home: (groupid=0, jobs=1): err= 0: pid=7530: Thu Aug  3 18:57:21 2023
  read: IOPS=3102, BW=12.1MiB/s (12.7MB/s)(477MiB/39364msec)
    clat (usec): min=181, max=91686, avg=312.34, stdev=1434.06
     lat (usec): min=181, max=91687, avg=312.43, stdev=1434.07
    clat percentiles (usec):
     |  1.00th=[  210],  5.00th=[  221], 10.00th=[  229], 20.00th=[  237],
     | 30.00th=[  247], 40.00th=[  255], 50.00th=[  265], 60.00th=[  273],
     | 70.00th=[  277], 80.00th=[  289], 90.00th=[  314], 95.00th=[  330],
     | 99.00th=[  367], 99.50th=[  383], 99.90th=[19006], 99.95th=[42730],
     | 99.99th=[59507]
   bw (  KiB/s): min=   72, max=15448, per=100.00%, avg=12475.10, stdev=4725.13, samples=78
   iops        : min=   18, max= 3862, avg=3118.76, stdev=1181.28, samples=78
  write: IOPS=3099, BW=12.1MiB/s (12.7MB/s)(477MiB/39364msec); 0 zone resets
    clat (nsec): min=1750, max=430904, avg=5518.39, stdev=2261.81
     lat (nsec): min=1788, max=443383, avg=5627.33, stdev=2302.99
    clat percentiles (nsec):
     |  1.00th=[ 2480],  5.00th=[ 2928], 10.00th=[ 3664], 20.00th=[ 4080],
     | 30.00th=[ 4384], 40.00th=[ 4896], 50.00th=[ 5408], 60.00th=[ 5792],
     | 70.00th=[ 6112], 80.00th=[ 6688], 90.00th=[ 7648], 95.00th=[ 8384],
     | 99.00th=[11456], 99.50th=[12992], 99.90th=[18816], 99.95th=[20608],
     | 99.99th=[26240]
   bw (  KiB/s): min=   56, max=15992, per=100.00%, avg=12461.91, stdev=4722.83, samples=78
   iops        : min=   14, max= 3998, avg=3115.45, stdev=1180.70, samples=78
  lat (usec)   : 2=0.04%, 4=8.64%, 10=40.28%, 20=0.99%, 50=0.03%
  lat (usec)   : 100=0.01%, 250=16.90%, 500=33.00%, 750=0.06%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.03%
  lat (msec)   : 100=0.02%
  cpu          : usr=0.86%, sys=8.06%, ctx=122300, majf=0, minf=25
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=122122,122018,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
pmem-fsdax: (groupid=1, jobs=1): err= 0: pid=7611: Thu Aug  3 18:57:21 2023
  read: IOPS=126k, BW=491MiB/s (515MB/s)(476MiB/969msec)
    clat (nsec): min=2280, max=120716, avg=3646.58, stdev=790.55
     lat (nsec): min=2304, max=120740, avg=3670.22, stdev=791.16
    clat percentiles (nsec):
     |  1.00th=[ 2768],  5.00th=[ 2960], 10.00th=[ 3056], 20.00th=[ 3216],
     | 30.00th=[ 3312], 40.00th=[ 3408], 50.00th=[ 3504], 60.00th=[ 3632],
     | 70.00th=[ 3760], 80.00th=[ 3952], 90.00th=[ 4256], 95.00th=[ 4576],
     | 99.00th=[ 6560], 99.50th=[ 7392], 99.90th=[ 9024], 99.95th=[12224],
     | 99.99th=[18560]
   bw (  KiB/s): min=394984, max=394984, per=78.57%, avg=394984.00, stdev= 0.00, samples=1
   iops        : min=98746, max=98746, avg=98746.00, stdev= 0.00, samples=1
  write: IOPS=126k, BW=493MiB/s (517MB/s)(478MiB/969msec); 0 zone resets
    clat (nsec): min=1448, max=21249, avg=2163.45, stdev=702.84
     lat (nsec): min=1481, max=21286, avg=2197.18, stdev=704.09
    clat percentiles (nsec):
     |  1.00th=[ 1576],  5.00th=[ 1656], 10.00th=[ 1704], 20.00th=[ 1784],
     | 30.00th=[ 1848], 40.00th=[ 1896], 50.00th=[ 1960], 60.00th=[ 2024],
     | 70.00th=[ 2128], 80.00th=[ 2288], 90.00th=[ 2768], 95.00th=[ 3600],
     | 99.00th=[ 5088], 99.50th=[ 5600], 99.90th=[ 6880], 99.95th=[ 7648],
     | 99.99th=[12992]
   bw (  KiB/s): min=397384, max=397384, per=78.67%, avg=397384.00, stdev= 0.00, samples=1
   iops        : min=99346, max=99346, avg=99346.00, stdev= 0.00, samples=1
  lat (usec)   : 2=28.17%, 4=61.00%, 10=10.78%, 20=0.05%, 50=0.01%
  lat (usec)   : 250=0.01%
  cpu          : usr=9.61%, sys=89.98%, ctx=107, majf=0, minf=8
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=121777,122363,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
ram-tmp: (groupid=2, jobs=1): err= 0: pid=7613: Thu Aug  3 18:57:21 2023
  read: IOPS=6416, BW=25.1MiB/s (26.3MB/s)(477MiB/19036msec)
    clat (usec): min=53, max=8215, avg=147.56, stdev=106.28
     lat (usec): min=53, max=8215, avg=147.66, stdev=106.28
    clat percentiles (usec):
     |  1.00th=[   85],  5.00th=[   92], 10.00th=[  100], 20.00th=[  108],
     | 30.00th=[  122], 40.00th=[  133], 50.00th=[  135], 60.00th=[  143],
     | 70.00th=[  147], 80.00th=[  161], 90.00th=[  198], 95.00th=[  225],
     | 99.00th=[  310], 99.50th=[  553], 99.90th=[ 2040], 99.95th=[ 2311],
     | 99.99th=[ 2671]
   bw (  KiB/s): min= 5608, max=28464, per=100.00%, avg=25691.79, stdev=5116.67, samples=38
   iops        : min= 1402, max= 7116, avg=6422.95, stdev=1279.17, samples=38
  write: IOPS=6408, BW=25.0MiB/s (26.2MB/s)(477MiB/19036msec); 0 zone resets
    clat (nsec): min=1498, max=73851, avg=5109.28, stdev=2688.53
     lat (nsec): min=1531, max=73967, avg=5229.55, stdev=2772.86
    clat percentiles (nsec):
     |  1.00th=[ 1640],  5.00th=[ 1784], 10.00th=[ 1944], 20.00th=[ 2320],
     | 30.00th=[ 4320], 40.00th=[ 4576], 50.00th=[ 4832], 60.00th=[ 5088],
     | 70.00th=[ 5472], 80.00th=[ 6944], 90.00th=[ 8032], 95.00th=[ 8640],
     | 99.00th=[17024], 99.50th=[18048], 99.90th=[19840], 99.95th=[21120],
     | 99.99th=[26240]
   bw (  KiB/s): min= 5672, max=28392, per=100.00%, avg=25665.89, stdev=5061.62, samples=38
   iops        : min= 1418, max= 7098, avg=6416.47, stdev=1265.40, samples=38
  lat (usec)   : 2=5.92%, 4=6.19%, 10=36.10%, 20=1.71%, 50=0.04%
  lat (usec)   : 100=5.30%, 250=43.90%, 500=0.56%, 750=0.11%, 1000=0.04%
  lat (msec)   : 2=0.08%, 4=0.05%, 10=0.01%
  cpu          : usr=1.41%, sys=8.63%, ctx=122179, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=122144,121996,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=477MiB (500MB), run=39364-39364msec
  WRITE: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=477MiB (500MB), run=39364-39364msec

Run status group 1 (all jobs):
   READ: bw=491MiB/s (515MB/s), 491MiB/s-491MiB/s (515MB/s-515MB/s), io=476MiB (499MB), run=969-969msec
  WRITE: bw=493MiB/s (517MB/s), 493MiB/s-493MiB/s (517MB/s-517MB/s), io=478MiB (501MB), run=969-969msec

Run status group 2 (all jobs):
   READ: bw=25.1MiB/s (26.3MB/s), 25.1MiB/s-25.1MiB/s (26.3MB/s-26.3MB/s), io=477MiB (500MB), run=19036-19036msec
  WRITE: bw=25.0MiB/s (26.2MB/s), 25.0MiB/s-25.0MiB/s (26.2MB/s-26.2MB/s), io=477MiB (500MB), run=19036-19036msec

Disk stats (read/write):
  pmem1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  sda: ios=120597/37203, merge=0/3, ticks=17019/5043, in_queue=471, util=98.49%
#!/usr/bin/env bash
# sync.sh: two-way sync of the local Homebrew prefix with the copy on ARC.
set -ex
d1="$HOME/homebrew/"
d2='arc.csc.ncsu.edu:homebrew/'
# -a preserves attributes; -u skips files that are newer on the receiver
time rsync -au --info=progress2 "$d1" "$d2"
time rsync -au --info=progress2 "$d2" "$d1"
@tddschn

tddschn commented Mar 4, 2024

Hi Osama, it's Teddy from the arcuser Google Group. You told me that ~ is slow, so why put the homebrew dir there? Would beegfs be better?

Do you use sync.sh to sync local linuxbrew installation to arc, to avoid the slow build times?

@osalbahr
Author

osalbahr commented Mar 4, 2024

Hi @tddschn!

so why put homebrew dir there? Would beegfs be better?

I didn't know about beegfs at that time. Now I have homebrew/ in both locations, though I rarely use it. Also, beegfs is "Not protected by RAID, not backed up!".

Do you use sync.sh to sync local linuxbrew installation to arc, to avoid the slow build times?

It was actually a hack to work around using an unsupported prefix. For example, on CentOS 7, gcc was too old and would fail to build xz from source. I don't know whether "build elsewhere, then sync" saves any time.

I haven't tested Homebrew on Rocky Linux 8. If it does fail (due to the unsupported prefix), Spack is a better option, as it is designed with rootless accounts on HPC systems in mind (I used Spack to install gef because gdb was too old on CentOS 7). Beware that Spack installs everything from source.

Since then, by request, fio and hyperfine have been installed system-wide, which is much better. I'd go that route unless you're exploring different package management systems for fun; after the migration there's little practical advantage, imho.

As an aside, nowadays I'd use nix rather than Homebrew to install small convenience tools that don't need to be installed system-wide.
