[Wed Mar 15 13:43:09 2023] AuxIoPool1: page allocation failure: order:0, mode:0x1000000(GFP_NOWAIT), nodemask=(null)
[Wed Mar 15 13:43:09 2023] AuxIoPool1 cpuset=/ mems_allowed=0
[Wed Mar 15 13:43:09 2023] CPU: 6 PID: 2760 Comm: AuxIoPool1 Not tainted 4.14.305-227.531.amzn2.aarch64 #1
[Wed Mar 15 13:43:09 2023] Hardware name: Amazon EC2 c6gd.2xlarge/, BIOS 1.0 11/1/2018
[Wed Mar 15 13:43:09 2023] Call trace:
[Wed Mar 15 13:43:09 2023] dump_backtrace+0x0/0x190
[Wed Mar 15 13:43:09 2023] show_stack+0x24/0x30
[Wed Mar 15 13:43:09 2023] dump_stack+0x98/0xc8
[Wed Mar 15 13:43:09 2023] warn_alloc+0x11c/0x1b4
[Wed Mar 15 13:43:09 2023] __alloc_pages_nodemask+0xcc8/0xd48
[Wed Mar 15 13:43:09 2023] alloc_page_interleave+0x38/0xa4
[Wed Mar 15 13:43:09 2023] alloc_pages_current+0xd8/0xe8
[Wed Mar 15 13:43:09 2023] allocate_slab+0x3bc/0x4f4
[Wed Mar 15 13:43:09 2023] new_slab+0x64/0xc0
[Wed Mar 15 13:43:09 2023] ___slab_alloc+0x410/0x5a0
[Wed Mar 15 13:43:09 2023] __slab_alloc+0x50/0x68
[Wed Mar 15 13:43:09 2023] kmem_cache_alloc+0x1c0/0x1f8
[Wed Mar 15 13:43:09 2023] btracker_queue+0x38/0x174 [dm_cache]
[Wed Mar 15 13:43:09 2023] __lookup+0x32c/0x5ac [dm_cache_smq]
[Wed Mar 15 13:43:09 2023] smq_lookup+0x80/0x118 [dm_cache_smq]
[Wed Mar 15 13:43:09 2023] map_bio.part.38+0x84/0x5c0 [dm_cache]
[Wed Mar 15 13:43:09 2023] map_bio+0x7c/0x94 [dm_cache]
[Wed Mar 15 13:43:09 2023] cache_map+0xe8/0x160 [dm_cache]
[Wed Mar 15 13:43:09 2023] __map_bio+0xc4/0x22c [dm_mod]
[Wed Mar 15 13:43:09 2023] __split_and_process_non_flush+0x140/0x340 [dm_mod]
[Wed Mar 15 13:43:09 2023] __split_and_process_bio+0x128/0x26c [dm_mod]
[Wed Mar 15 13:43:09 2023] dm_make_request+0x7c/0xdc [dm_mod]
[Wed Mar 15 13:43:09 2023] generic_make_request+0xfc/0x290
[Wed Mar 15 13:43:09 2023] submit_bio+0x5c/0x190
[Wed Mar 15 13:43:09 2023] mpage_readpages+0x168/0x200
[Wed Mar 15 13:43:09 2023] xfs_vm_readpages+0x40/0xc4
[Wed Mar 15 13:43:09 2023] __do_page_cache_readahead+0x1c4/0x2a4
[Wed Mar 15 13:43:09 2023] ondemand_readahead+0x148/0x2ac
[Wed Mar 15 13:43:09 2023] page_cache_sync_readahead+0x60/0xa0
[Wed Mar 15 13:43:09 2023] generic_file_buffered_read+0x34c/0x6d0
[Wed Mar 15 13:43:09 2023] generic_file_read_iter+0x114/0x180
[Wed Mar 15 13:43:09 2023] xfs_file_buffered_aio_read+0x54/0x110
[Wed Mar 15 13:43:09 2023] xfs_file_read_iter+0xb0/0x140
[Wed Mar 15 13:43:09 2023] __vfs_read+0x100/0x168
[Wed Mar 15 13:43:09 2023] vfs_read+0x94/0x158
[Wed Mar 15 13:43:09 2023] SyS_pread64+0xb4/0xd0
[Wed Mar 15 13:43:09 2023] __sys_trace_return+0x0/0x4
[Wed Mar 15 13:43:09 2023] Mem-Info:
[Wed Mar 15 13:43:09 2023] active_anon:2112521 inactive_anon:8287 isolated_anon:0
active_file:635559 inactive_file:974447 isolated_file:0
unevictable:0 dirty:958 writeback:0 unstable:0
slab_reclaimable:81091 slab_unreclaimable:33511
mapped:32915 shmem:8324 pagetables:7644 bounce:0
free:26491 free_pcp:163 free_cma:7028
[Wed Mar 15 13:43:09 2023] Node 0 active_anon:8450084kB inactive_anon:33148kB active_file:2542236kB inactive_file:3897788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:131660kB dirty:3832kB writeback:0kB shmem:33296kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed Mar 15 13:43:09 2023] Node 0 DMA free:62316kB min:2804kB low:3812kB high:4820kB active_anon:425620kB inactive_anon:0kB active_file:156416kB inactive_file:263624kB unevictable:0kB writepending:500kB present:1048576kB managed:1018680kB mlocked:0kB kernel_stack:0kB pagetables:696kB bounce:0kB free_pcp:116kB local_pcp:116kB free_cma:28112kB
[Wed Mar 15 13:43:09 2023] lowmem_reserve[]: 0 14815 14815
[Wed Mar 15 13:43:09 2023] Node 0 Normal free:43524kB min:42248kB low:57416kB high:72584kB active_anon:8024464kB inactive_anon:33148kB active_file:2385820kB inactive_file:3634164kB unevictable:0kB writepending:3332kB present:15433728kB managed:15170636kB mlocked:0kB kernel_stack:12032kB pagetables:29880kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
[Wed Mar 15 13:43:09 2023] lowmem_reserve[]: 0 0 0
[Wed Mar 15 13:43:09 2023] Node 0 DMA: 3672*4kB (UMEHC) 1638*8kB (UMEHC) 756*16kB (UMEHC) 239*32kB (UMHC) 65*64kB (UMH) 30*128kB (UMH) 12*256kB (UH) 7*512kB (UH) 0*1024kB 0*2048kB 0*4096kB = 62192kB
[Wed Mar 15 13:43:09 2023] Node 0 Normal: 10461*4kB (UMH) 21*8kB (UMH) 3*16kB (UH) 8*32kB (UH) 2*64kB (H) 3*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43340kB
[Wed Mar 15 13:43:09 2023] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Mar 15 13:43:09 2023] 1615038 total pagecache pages
[Wed Mar 15 13:43:09 2023] 0 pages in swap cache
[Wed Mar 15 13:43:09 2023] Swap cache stats: add 0, delete 0, find 0/0
[Wed Mar 15 13:43:09 2023] Free swap = 0kB
[Wed Mar 15 13:43:09 2023] Total swap = 0kB
[Wed Mar 15 13:43:09 2023] 4120576 pages RAM
[Wed Mar 15 13:43:09 2023] 0 pages HighMem/MovableOnly
[Wed Mar 15 13:43:09 2023] 73247 pages reserved
[Wed Mar 15 13:43:09 2023] 16384 pages cma reserved
[Wed Mar 15 13:43:09 2023] SLUB: Unable to allocate memory on node -1, gfp=0x1000000(GFP_NOWAIT)
[Wed Mar 15 13:43:09 2023] cache: iommu_iova, object size: 64, buffer size: 64, default order: 0, min order: 0
[Wed Mar 15 13:43:09 2023] node 0: slabs: 59, objs: 3776, free: 0
I'm not a kernel expert, but it appears that the kernel failed a memory allocation (kmem_cache_alloc) while servicing a read from an XFS filesystem, on a path that passes through the dm_cache module, i.e. the kernel module which implements lvmcache. Note that the allocation was made with GFP_NOWAIT, which tells the allocator it may not sleep to reclaim memory, so such allocations are expected to fail occasionally under memory pressure and callers are expected to handle that.
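As a minimal sketch of why such a failure is "allowed" (a hypothetical standalone module for illustration, not part of dm-cache): a GFP_NOWAIT allocation simply returns NULL under pressure instead of waiting for reclaim, and the caller must cope:

#include <linux/module.h>
#include <linux/slab.h>

static int __init nowait_demo_init(void)
{
	/* GFP_NOWAIT: no sleeping, no direct reclaim; may return NULL
	 * under memory pressure where GFP_KERNEL would block and retry. */
	void *p = kmalloc(64, GFP_NOWAIT);

	if (!p) {
		pr_info("nowait_demo: allocation failed; caller must handle it\n");
		return 0;
	}
	kfree(p);
	return 0;
}

static void __exit nowait_demo_exit(void)
{
}

module_init(nowait_demo_init);
module_exit(nowait_demo_exit);
MODULE_LICENSE("GPL");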
The function which made the allocation was btracker_queue+0x38/0x174 [dm_cache]. From uname we know the kernel is version 4.14.305-227.531.amzn2.aarch64. I don't have an immediate index of this specific Amazon build, but the vanilla source for that function and version can be seen at https://elixir.bootlin.com/linux/v4.14.305/source/drivers/md/dm-cache-background-tracker.c#L168, where it calls kmem_cache_alloc():
int btracker_queue(struct background_tracker *b,
		   struct policy_work *work,
		   struct policy_work **pwork)
{
	struct bt_work *w;

	if (pwork)
		*pwork = NULL;

	if (max_work_reached(b))
		return -ENOMEM;

	w = kmem_cache_alloc(b->work_cache, GFP_NOWAIT);
	if (!w)
		return -ENOMEM;
	...
i.e. if kmem_cache_alloc fails then btracker_queue returns -ENOMEM. In the call stack above, btracker_queue() is called by __lookup() via queue_promotion():
static void queue_promotion(struct smq_policy *mq, dm_oblock_t oblock,
			    struct policy_work **workp)
{
	...
	e = alloc_entry(&mq->cache_alloc);
	BUG_ON(!e);
	e->pending_work = true;
	work.op = POLICY_PROMOTE;
	work.oblock = oblock;
	work.cblock = infer_cblock(mq, e);
	btracker_queue(mq->bg_work, &work, workp);
}
Note that the final line, which calls btracker_queue(), doesn't check the return value (-ENOMEM in this case). It's not clear whether this is a bug, or whether a failure to allocate memory for a cache entry is safe to ignore. Looking at the latest version of this code on GitHub, I did spot a patch that improves the error checking here, merged for kernel 4.15 (https://github.com/torvalds/linux/commit/1e72a8e809f030bd4e318a49c497ee38e47e82c1).
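For illustration only, one way the caller could handle the error is sketched below. This is not the upstream patch; the use of free_entry() to hand back the reserved entry is my assumption, based on the allocator helpers that appear in dm-cache-policy-smq.c:

static void queue_promotion(struct smq_policy *mq, dm_oblock_t oblock,
			    struct policy_work **workp)
{
	int r;
	...
	e = alloc_entry(&mq->cache_alloc);
	BUG_ON(!e);
	e->pending_work = true;
	work.op = POLICY_PROMOTE;
	work.oblock = oblock;
	work.cblock = infer_cblock(mq, e);
	r = btracker_queue(mq->bg_work, &work, workp);
	if (r) {
		/* hypothetical: on -ENOMEM, give the entry back to the
		 * allocator rather than leaking it with pending_work set */
		e->pending_work = false;
		free_entry(&mq->cache_alloc, e);
	}
}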
Between kernel 4.14 (older Amazon Linux 2) and 5.10 (newer Amazon Linux 2) the following changes exist in the dm-cache-* files:
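(For reference, a change list like this can be generated from a mainline kernel checkout with git log --oneline v4.14..v5.10 -- drivers/md/dm-cache-*, assuming both release tags are present.)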