This page is a collection of fio performance tests, tuned for a Kubernetes etcd workload per this blog post, run against a variety of storage devices and platforms.
The goal is to execute the fio command below in as many different places as possible to gauge relative performance; the number to watch in each result is the fdatasync latency distribution, which the referenced post suggests should stay below roughly 10ms at the 99th percentile for etcd.
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest
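The same workload can also be expressed as an fio jobfile. The following is only a sketch equivalent to the command above; the etcd.fio file referenced in the Kubernetes test further down may differ in its details.
# A sketch of the same test in jobfile form (assumes the test-data directory).
mkdir -p test-data
cat << EOF > etcd.fio
[global]
rw=write
ioengine=sync
fdatasync=1
directory=test-data
size=22m
bs=2300

[mytest]
EOF
fio etcd.fio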
These tests are completely unscientific and only serve to provide a sampling for anecdotal comparisons.
This is a physical node running RHEL 7.8. The underlying storage is a consumer NVMe device with a VDO pool created on top. VDO is doing deduplication but not compression.
mytest: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.7
Starting 1 process
mytest: Laying out IO file (1 file / 22MiB)
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=1498KiB/s][r=0,w=667 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=9922: Thu Oct 29 14:44:39 2020
write: IOPS=664, BW=1493KiB/s (1529kB/s)(21.0MiB/15089msec)
clat (nsec): min=4295, max=43404, avg=8670.40, stdev=2712.21
lat (nsec): min=4500, max=44118, avg=8874.89, stdev=2735.94
clat percentiles (nsec):
| 1.00th=[ 5280], 5.00th=[ 5664], 10.00th=[ 5856], 20.00th=[ 6240],
| 30.00th=[ 6816], 40.00th=[ 7776], 50.00th=[ 8512], 60.00th=[ 8896],
| 70.00th=[ 9536], 80.00th=[10176], 90.00th=[11968], 95.00th=[13504],
| 99.00th=[18304], 99.50th=[20352], 99.90th=[24704], 99.95th=[25728],
| 99.99th=[32128]
bw ( KiB/s): min= 1455, max= 1522, per=100.00%, avg=1492.43, stdev=19.61, samples=30
iops : min= 648, max= 678, avg=664.73, stdev= 8.76, samples=30
lat (usec) : 10=77.59%, 20=21.88%, 50=0.54%
fsync/fdatasync/sync_file_range:
sync (usec): min=449, max=11120, avg=1493.54, stdev=826.17
sync percentiles (usec):
| 1.00th=[ 469], 5.00th=[ 537], 10.00th=[ 603], 20.00th=[ 701],
| 30.00th=[ 906], 40.00th=[ 1287], 50.00th=[ 1418], 60.00th=[ 1582],
| 70.00th=[ 1811], 80.00th=[ 2057], 90.00th=[ 2376], 95.00th=[ 2835],
| 99.00th=[ 4228], 99.50th=[ 4752], 99.90th=[ 6521], 99.95th=[ 7504],
| 99.99th=[ 9241]
cpu : usr=0.28%, sys=2.87%, ctx=30862, majf=0, minf=33
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=1493KiB/s (1529kB/s), 1493KiB/s-1493KiB/s (1529kB/s-1529kB/s), io=21.0MiB (23.1MB), run=15089-15089msec
Disk stats (read/write):
dm-13: ios=0/26986, merge=0/0, ticks=0/15475, in_queue=15483, util=96.51%, aggrios=0/33352, aggrmerge=0/0, aggrticks=0/14546, aggrin_queue=14552, aggrutil=89.69%
dm-11: ios=0/33352, merge=0/0, ticks=0/14546, in_queue=14552, util=89.69%, aggrios=2/19669, aggrmerge=0/0, aggrticks=0/7664, aggrin_queue=7666, aggrutil=89.48%
dm-9: ios=4/5987, merge=0/0, ticks=0/821, in_queue=821, util=4.98%, aggrios=4/39339, aggrmerge=0/0, aggrticks=0/15288, aggrin_queue=15295, aggrutil=94.22%
dm-8: ios=4/39339, merge=0/0, ticks=0/15288, in_queue=15295, util=94.22%, aggrios=4149/82184, aggrmerge=188/1942, aggrticks=536/8280, aggrin_queue=4864, aggrutil=28.77%
nvme0n1: ios=4149/82184, merge=188/1942, ticks=536/8280, in_queue=4864, util=28.77%
dm-10: ios=0/33352, merge=0/0, ticks=0/14507, in_queue=14511, util=89.48%
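For reference, a comparable VDO layout (deduplication on, compression off) could be built with something like the commands below. This is only a sketch: the device name, volume name, logical size, and mount point are assumptions, not the exact configuration used for this test.
# Hypothetical VDO volume on the NVMe device: dedupe enabled, compression disabled.
vdo create --name=vdo_test --device=/dev/nvme0n1 --vdoLogicalSize=100G \
  --deduplication=enabled --compression=disabled
# -K skips discards at mkfs time, which is the usual recommendation on VDO.
mkfs.xfs -K /dev/mapper/vdo_test
mount /dev/mapper/vdo_test /mnt/vdo_test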
This is a physical CentOS 7.8 host with a standard consumer SATA SSD device.
mytest: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.7
Starting 1 process
mytest: Laying out IO file (1 file / 22MiB)
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=957KiB/s][r=0,w=426 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=12441: Thu Oct 29 14:49:04 2020
write: IOPS=428, BW=963KiB/s (986kB/s)(21.0MiB/23389msec)
clat (usec): min=3, max=107, avg=10.05, stdev= 4.94
lat (usec): min=3, max=107, avg=10.27, stdev= 4.99
clat percentiles (nsec):
| 1.00th=[ 4960], 5.00th=[ 5664], 10.00th=[ 6048], 20.00th=[ 6688],
| 30.00th=[ 7392], 40.00th=[ 8160], 50.00th=[ 8896], 60.00th=[ 9536],
| 70.00th=[10432], 80.00th=[11968], 90.00th=[15808], 95.00th=[19584],
| 99.00th=[32640], 99.50th=[34560], 99.90th=[40704], 99.95th=[44288],
| 99.99th=[77312]
bw ( KiB/s): min= 884, max= 1015, per=99.90%, avg=962.02, stdev=41.92, samples=46
iops : min= 394, max= 452, avg=428.57, stdev=18.63, samples=46
lat (usec) : 4=0.04%, 10=65.40%, 20=30.09%, 50=4.43%, 100=0.03%
lat (usec) : 250=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=764, max=17947, avg=2319.46, stdev=1295.89
sync percentiles (usec):
| 1.00th=[ 807], 5.00th=[ 840], 10.00th=[ 848], 20.00th=[ 881],
| 30.00th=[ 1713], 40.00th=[ 1762], 50.00th=[ 1958], 60.00th=[ 2057],
| 70.00th=[ 2212], 80.00th=[ 4080], 90.00th=[ 4178], 95.00th=[ 4293],
| 99.00th=[ 4752], 99.50th=[ 6128], 99.90th=[ 6456], 99.95th=[ 6915],
| 99.99th=[17957]
cpu : usr=0.27%, sys=1.94%, ctx=31249, majf=0, minf=32
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=963KiB/s (986kB/s), 963KiB/s-963KiB/s (986kB/s-986kB/s), io=21.0MiB (23.1MB), run=23389-23389msec
Disk stats (read/write):
dm-0: ios=30/25659, merge=0/0, ticks=34/23133, in_queue=23167, util=97.13%, aggrios=30/31702, aggrmerge=0/146, aggrticks=34/23548, aggrin_queue=23577, aggrutil=97.11%
sda: ios=30/31702, merge=0/146, ticks=34/23548, in_queue=23577, util=97.11%
This is a physical RHEL 8.2 host with a 5400 RPM 2.5" HDD.
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest
mytest: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.7
Starting 1 process
mytest: Laying out IO file (1 file / 22MiB)
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=96KiB/s][r=0,w=43 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=26682: Thu Oct 29 14:54:02 2020
write: IOPS=42, BW=96.4KiB/s (98.8kB/s)(21.0MiB/233578msec)
clat (usec): min=6, max=101, avg=21.77, stdev= 5.45
lat (usec): min=7, max=102, avg=22.35, stdev= 5.50
clat percentiles (nsec):
| 1.00th=[11584], 5.00th=[14016], 10.00th=[15936], 20.00th=[17536],
| 30.00th=[18304], 40.00th=[19584], 50.00th=[21120], 60.00th=[22912],
| 70.00th=[24448], 80.00th=[25984], 90.00th=[28544], 95.00th=[30592],
| 99.00th=[36608], 99.50th=[39168], 99.90th=[45312], 99.95th=[61696],
| 99.99th=[99840]
bw ( KiB/s): min= 58, max= 130, per=99.97%, avg=95.97, stdev= 6.75, samples=467
iops : min= 26, max= 58, avg=42.91, stdev= 3.03, samples=467
lat (usec) : 10=0.13%, 20=43.28%, 50=56.51%, 100=0.07%, 250=0.01%
fsync/fdatasync/sync_file_range:
sync (msec): min=5, max=234, avg=23.26, stdev= 7.68
sync percentiles (msec):
| 1.00th=[ 11], 5.00th=[ 12], 10.00th=[ 14], 20.00th=[ 16],
| 30.00th=[ 19], 40.00th=[ 21], 50.00th=[ 24], 60.00th=[ 26],
| 70.00th=[ 29], 80.00th=[ 31], 90.00th=[ 34], 95.00th=[ 34],
| 99.00th=[ 34], 99.50th=[ 35], 99.90th=[ 58], 99.95th=[ 64],
| 99.99th=[ 120]
cpu : usr=0.04%, sys=0.25%, ctx=29779, majf=0, minf=12
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=96.4KiB/s (98.8kB/s), 96.4KiB/s-96.4KiB/s (98.8kB/s-98.8kB/s), io=21.0MiB (23.1MB), run=233578-233578msec
Disk stats (read/write):
dm-0: ios=0/31395, merge=0/0, ticks=0/232595, in_queue=232595, util=8.70%, aggrios=0/25792, aggrmerge=0/5628, aggrticks=0/231779, aggrin_queue=219969, aggrutil=8.70%
sda: ios=0/25792, merge=0/5628, ticks=0/231779, in_queue=219969, util=8.70%
This test comes from a RHEL 8.1 virtual machine hosted in a vSphere 7 environment using vSAN storage. I do not know the details of the vSAN configuration.
[root@rhel8 ~]# fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest
mytest: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.7
Starting 1 process
mytest: Laying out IO file (1 file / 22MiB)
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=375KiB/s][r=0,w=167 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=26788: Thu Oct 29 15:51:00 2020
write: IOPS=182, BW=409KiB/s (419kB/s)(21.0MiB/55073msec)
clat (usec): min=6, max=12958, avg=28.87, stdev=234.17
lat (usec): min=6, max=12959, avg=29.25, stdev=234.18
clat percentiles (usec):
| 1.00th=[ 9], 5.00th=[ 10], 10.00th=[ 10], 20.00th=[ 12],
| 30.00th=[ 13], 40.00th=[ 14], 50.00th=[ 15], 60.00th=[ 16],
| 70.00th=[ 18], 80.00th=[ 21], 90.00th=[ 60], 95.00th=[ 75],
| 99.00th=[ 108], 99.50th=[ 123], 99.90th=[ 783], 99.95th=[ 5866],
| 99.99th=[10421]
bw ( KiB/s): min= 53, max= 759, per=99.80%, avg=408.16, stdev=110.25, samples=110
iops : min= 24, max= 338, avg=181.96, stdev=49.09, samples=110
lat (usec) : 10=11.96%, 20=65.98%, 50=9.79%, 100=10.89%, 250=1.26%
lat (usec) : 500=0.02%, 1000=0.02%
lat (msec) : 4=0.01%, 10=0.06%, 20=0.02%
fsync/fdatasync/sync_file_range:
sync (usec): min=1198, max=233077, avg=5458.51, stdev=4966.79
sync percentiles (usec):
| 1.00th=[ 1434], 5.00th=[ 1614], 10.00th=[ 2040], 20.00th=[ 2835],
| 30.00th=[ 3228], 40.00th=[ 3621], 50.00th=[ 4424], 60.00th=[ 6128],
| 70.00th=[ 6652], 80.00th=[ 7635], 90.00th=[ 8717], 95.00th=[ 10683],
| 99.00th=[ 17957], 99.50th=[ 22676], 99.90th=[ 47449], 99.95th=[ 70779],
| 99.99th=[183501]
cpu : usr=0.19%, sys=1.53%, ctx=27135, majf=0, minf=13
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=409KiB/s (419kB/s), 409KiB/s-409KiB/s (419kB/s-419kB/s), io=21.0MiB (23.1MB), run=55073-55073msec
Disk stats (read/write):
dm-0: ios=3/21826, merge=0/0, ticks=3/105013, in_queue=105016, util=39.94%, aggrios=3/21560, aggrmerge=0/364, aggrticks=3/84669, aggrin_queue=76510, aggrutil=40.04%
sda: ios=3/21560, merge=0/364, ticks=3/84669, in_queue=76510, util=40.04%
Portworx has published an fio container image and a sample Kubernetes deployment for testing performance. Using that as the basis, I modified the Deployment and jobfile for this test. Anyone can replicate the test using the following:
cat << EOF | oc create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fio-test
spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  replicas: 1
  selector:
    matchLabels:
      app: fio-test
  template:
    metadata:
      labels:
        app: fio-test
    spec:
      containers:
      - name: fio-container
        image: wallnerryan/fiotools-aio
        ports:
        - containerPort: 8000
        env:
        - name: REMOTEFILES
          value: "https://gist.githubusercontent.com/acsulli/2c7c71594c16273a2cf087963c339568/raw/fd7d07f3dac3e4923a3c08c6d60b03b0e0b63c65/etcd.fio"
        - name: JOBFILES
          value: etcd.fio
        - name: PLOTNAME
          value: etcdtest
---
apiVersion: v1
kind: Service
metadata:
  name: fiotools-etcd
  labels:
    name: fiotools-etcd
spec:
  type: NodePort
  ports:
  - port: 8000
    targetPort: 8000
    name: http
  selector:
    app: fio-test
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: fiotools-etcd
spec:
  port:
    targetPort: http
  to:
    kind: Service
    name: fiotools-etcd
    weight: 100
EOF
Once the pod was deployed and the fio test finished, browsing to the route provides a directory listing with several files, including a file with the fio output. The contents of that file are shown below.
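The file can also be pulled from the command line; a minimal sketch, assuming the Route created above is reachable (the output file name here is a guess, check the directory listing for the actual name):
# Look up the route hostname, list the generated files, then fetch the fio output.
ROUTE=$(oc get route fiotools-etcd -o jsonpath='{.spec.host}')
curl -s "http://${ROUTE}/"
curl -s "http://${ROUTE}/etcdtest.out"   # hypothetical file name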
etcdtest: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.8
Starting 1 process
etcdtest: Laying out IO file (1 file / 22MiB)
etcdtest: (groupid=0, jobs=1): err= 0: pid=12: Thu Oct 29 21:27:27 2020
write: IOPS=359, BW=808KiB/s (828kB/s)(21.0MiB/27873msec)
clat (usec): min=10, max=99670, avg=31.94, stdev=997.29
lat (usec): min=11, max=99672, avg=33.22, stdev=997.31
clat percentiles (usec):
| 1.00th=[ 14], 5.00th=[ 15], 10.00th=[ 16], 20.00th=[ 16],
| 30.00th=[ 17], 40.00th=[ 19], 50.00th=[ 19], 60.00th=[ 20],
| 70.00th=[ 21], 80.00th=[ 22], 90.00th=[ 31], 95.00th=[ 38],
| 99.00th=[ 62], 99.50th=[ 81], 99.90th=[ 265], 99.95th=[ 478],
| 99.99th=[ 6325]
bw ( KiB/s): min= 23, max=227722, per=100.00%, avg=120977.34, stdev=28613.46, samples=10029
iops : min= 1, max= 1, avg= 1.00, stdev= 0.00, samples=10029
lat (usec) : 20=66.86%, 50=31.33%, 100=1.49%, 250=0.21%, 500=0.07%
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 10=0.01%, 100=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=1023, max=1952.7k, avg=2731.70, stdev=28999.19
sync percentiles (usec):
| 1.00th=[ 1139], 5.00th=[ 1188], 10.00th=[ 1221],
| 20.00th=[ 1270], 30.00th=[ 1336], 40.00th=[ 1483],
| 50.00th=[ 2474], 60.00th=[ 2573], 70.00th=[ 2638],
| 80.00th=[ 2704], 90.00th=[ 2868], 95.00th=[ 3163],
| 99.00th=[ 5997], 99.50th=[ 8094], 99.90th=[ 17695],
| 99.95th=[ 89654], 99.99th=[1887437]
cpu : usr=0.58%, sys=3.07%, ctx=29919, majf=0, minf=265
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=808KiB/s (828kB/s), 808KiB/s-808KiB/s (828kB/s-828kB/s), io=21.0MiB (23.1MB), run=27873-27873msec
etcdtest: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.8
Starting 1 process
etcdtest: Laying out IO file (1 file / 22MiB)
etcdtest: (groupid=0, jobs=1): err= 0: pid=14: Thu Oct 29 21:33:14 2020
write: IOPS=1036, BW=2328KiB/s (2384kB/s)(21.0MiB/9676msec)
clat (nsec): min=3008, max=93601, avg=8060.72, stdev=3770.63
lat (nsec): min=3097, max=93910, avg=8257.60, stdev=3851.54
clat percentiles (nsec):
| 1.00th=[ 4384], 5.00th=[ 4640], 10.00th=[ 4832], 20.00th=[ 5536],
| 30.00th=[ 6112], 40.00th=[ 6432], 50.00th=[ 6880], 60.00th=[ 7456],
| 70.00th=[ 8512], 80.00th=[ 9920], 90.00th=[12480], 95.00th=[15040],
| 99.00th=[22144], 99.50th=[25216], 99.90th=[34048], 99.95th=[36608],
| 99.99th=[44800]
bw ( KiB/s): min=24572, max=764627, per=100.00%, avg=326886.33, stdev=103829.66, samples=10029
iops : min= 1, max= 1, avg= 1.00, stdev= 0.00, samples=10029
lat (usec) : 4=0.23%, 10=80.10%, 20=18.11%, 50=1.56%, 100=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=356, max=44621, avg=954.29, stdev=1533.39
sync percentiles (usec):
| 1.00th=[ 388], 5.00th=[ 404], 10.00th=[ 416], 20.00th=[ 433],
| 30.00th=[ 461], 40.00th=[ 553], 50.00th=[ 922], 60.00th=[ 971],
| 70.00th=[ 996], 80.00th=[ 1045], 90.00th=[ 1139], 95.00th=[ 1336],
| 99.00th=[ 6325], 99.50th=[ 9765], 99.90th=[21890], 99.95th=[29230],
| 99.99th=[40109]
cpu : usr=0.38%, sys=2.18%, ctx=26819, majf=0, minf=266
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=2328KiB/s (2384kB/s), 2328KiB/s-2328KiB/s (2384kB/s-2384kB/s), io=21.0MiB (23.1MB), run=9676-9676msec
Thank you, I fixed that error and removed the empty labels.
We aren't writing to /tmp on the host. The test writes to the overlay filesystem, which lives under /var/lib/containers. Yes, it could affect other running Pods, or be affected by them, since they all share the same device. Without the control plane nodes being schedulable, the Pod will not land on a control plane node, so the result could differ on a worker. That's OK though: at least with the default IPI deployment, the control plane and worker nodes use the same underlying storage type, so the numbers should be very similar. FWIW, etcd data is mounted from /var/lib/etcd on the control plane nodes, so if they were schedulable the test would hit the same disk, even if a separate device were mounted for /var.
This isn't a perfect test, but that's OK; it's meant to be an estimate for comparison purposes, not a precise performance comparison.
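If someone did want to exercise a control plane node's disk directly, the Deployment above could be adjusted along these lines. This is a sketch only and was not used for any of the results on this page; it assumes the default node-role.kubernetes.io/master label and taint.
# Hypothetical patch to schedule the fio Pod onto a control plane node.
oc patch deployment fio-test --type merge -p '
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
'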