We have an autoscaling group of m5.large instances with 250GB EBS root volumes that we use for running our Buildkite test suites. The group scales up from 0 to 40 during the work day and back down again, so each day we see new instances.
Every few days we find that a few of the instances run very, very slowly: our test suite either takes 100x its usual time or times out entirely. On the host machine, even a basic command like aws s3api head-object --bucket blah --key blah takes 45 seconds.
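For reference, the slowness shows up just by timing that call a few times in a row; the bucket and key below are placeholders, as above:
$ for i in 1 2 3; do time aws s3api head-object --bucket blah --key blah > /dev/null; done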
All instances are m5.large in us-east-1, running the latest Amazon Linux 2 AMI (ami-06631de3819cb42f3).
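In case it helps anyone compare, the AMI and instance type of a given box can be double-checked from the instance metadata service:
$ curl -s http://169.254.169.254/latest/meta-data/ami-id
$ curl -s http://169.254.169.254/latest/meta-data/instance-type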
My theory is that there is something wrong with the disk subsystem or with the instances themselves. For troubleshooting we stopped dockerd, buildkite-agent and awslogsd, after which load on the system drops to close to zero.
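Roughly what that quiescing looks like, assuming the standard systemd unit names on Amazon Linux 2 (docker is the unit behind dockerd):
$ sudo systemctl stop docker buildkite-agent awslogsd
$ uptime   # load average then sits close to zero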
With those services stopped, we installed fio and benchmarked the root volume directly; two sets of results follow.
# install fio (Amazon Linux 2)
yum install -y fio
# sequential read of the entire root device
time sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M
# 4k random read/write, 75% reads, 4GB test file, queue depth 64
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
The first set of results:
$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Device 1d0f:1111
00:04.0 Non-Volatile memory controller: Device 1d0f:8061
00:05.0 Ethernet controller: Device 1d0f:ec20
$ time sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M
256000+0 records in
256000+0 records out
268435456000 bytes (268 GB) copied, 1595.62 s, 168 MB/s
real 26m35.636s
user 0m0.252s
sys 1m9.410s
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [8604KB/2820KB/0KB /s] [2151/705/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=27615: Mon Aug 13 06:17:32 2018
read : io=3070.4MB, bw=8926.7KB/s, iops=2231, runt=352203msec
write: io=1025.8MB, bw=2982.2KB/s, iops=745, runt=352203msec
cpu : usr=0.29%, sys=0.81%, ctx=108775, majf=0, minf=10
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=785996/w=262580/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=3070.4MB, aggrb=8926KB/s, minb=8926KB/s, maxb=8926KB/s, mint=352203msec, maxt=352203msec
WRITE: io=1025.8MB, aggrb=2982KB/s, minb=2982KB/s, maxb=2982KB/s, mint=352203msec, maxt=352203msec
Disk stats (read/write):
nvme0n1: ios=793056/269858, merge=14/12007, ticks=10692416/3627860, in_queue=13969756, util=99.95%
The second set of results, where dd throughput drops to ~39 MB/s and nearly all of the elapsed time is spent in sys:
$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Device 1d0f:1111
00:04.0 Non-Volatile memory controller: Device 1d0f:8061
00:05.0 Ethernet controller: Device 1d0f:ec20
$ time sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M
256000+0 records in
256000+0 records out
268435456000 bytes (268 GB) copied, 6913.63 s, 38.8 MB/s
real 115m14.237s
user 0m13.064s
sys 110m24.088s
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.14
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [5078KB/1768KB/0KB /s] [1269/442/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=6156: Mon Aug 13 07:57:55 2018
read : io=3070.4MB, bw=5255.1KB/s, iops=1313, runt=598274msec
write: io=1025.8MB, bw=1755.6KB/s, iops=438, runt=598274msec
cpu : usr=16.70%, sys=68.43%, ctx=234869, majf=0, minf=9
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=785996/w=262580/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=3070.4MB, aggrb=5255KB/s, minb=5255KB/s, maxb=5255KB/s, mint=598274msec, maxt=598274msec
WRITE: io=1025.8MB, aggrb=1755KB/s, minb=1755KB/s, maxb=1755KB/s, mint=598274msec, maxt=598274msec
Disk stats (read/write):
nvme0n1: ios=786036/262893, merge=0/10, ticks=442812/170912, in_queue=202388, util=27.99%
$ time sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M
256000+0 records in
256000+0 records out
268435456000 bytes (268 GB) copied, 6821.4 s, 39.4 MB/s
real 113m41.990s
user 0m13.491s
sys 109m1.143s
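Given that nearly all of the elapsed time in the slow dd runs is sys time, a rough sketch of what we'd capture next on an affected instance to see where that kernel time goes (mpstat and iostat come from the sysstat package):
$ sudo yum install -y sysstat
$ mpstat -P ALL 5 3   # per-CPU breakdown, watch %sys
$ iostat -x 5 3       # per-device utilisation and await for nvme0n1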