### System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Amazon Linux |
| Distribution Version | 2 |
| Kernel Version | 5.10.236-228.935.amzn2.aarch64 |
| Architecture | aarch64 |
| OpenZFS Version | 2.1.7 |
### Describe the problem you're observing

We are using ZFS with Lustre v2.15.5. Occasionally, when memory pressure is high, we see deadlocks in the ZFS layer. Threads inside `arc_read` try to allocate more memory, which triggers memory reclamation. At the same time, `kswapd` is stuck in `arc_buf_destroy` waiting on a hash lock (`buf_hash_table`). Our guess is that one of the `arc_read` threads holds the hash lock while its allocation waits on reclaim, and `kswapd` blocking on that same hash lock stalls the whole system.
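To make the suspected circular wait concrete, here is a minimal userspace sketch (plain pthreads, not ZFS code; all names are analogues invented for illustration): one thread holds the "hash lock" while its allocation waits for reclaim to free memory, and the reclaim thread needs that same lock before it can free anything.

```c
/*
 * Userspace analogue of the suspected circular wait. Assumption being
 * illustrated: the arc_read path holds a buf_hash_table lock across an
 * allocation that waits on reclaim, while kswapd's reclaim path needs
 * the same lock in arc_buf_destroy() to free memory.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mem_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mem_freed = PTHREAD_COND_INITIALIZER;
static int free_pages = 0;

/* arc_read analogue: allocate while holding the hash lock. */
static void *arc_read_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&hash_lock);      /* hold the hash lock ...      */
    pthread_mutex_lock(&mem_lock);
    while (free_pages == 0)              /* ... while the allocation    */
        pthread_cond_wait(&mem_freed, &mem_lock); /* waits on reclaim   */
    pthread_mutex_unlock(&mem_lock);
    pthread_mutex_unlock(&hash_lock);
    return NULL;
}

/* kswapd analogue: reclaim must take the hash lock to evict a buffer. */
static void *kswapd_thread(void *arg)
{
    (void)arg;
    sleep(1);                            /* let arc_read win the race   */
    pthread_mutex_lock(&hash_lock);      /* blocks forever (state D)    */
    pthread_mutex_lock(&mem_lock);
    free_pages = 1;                      /* never reached               */
    pthread_cond_signal(&mem_freed);
    pthread_mutex_unlock(&mem_lock);
    pthread_mutex_unlock(&hash_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, arc_read_thread, NULL);
    pthread_create(&b, NULL, kswapd_thread, NULL);
    sleep(3);                            /* both threads are now stuck  */
    printf("circular wait on hash_lock: neither thread can proceed\n");
    return 0;
}
```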
### Describe how to reproduce the problem

Keep memory usage high on the server while driving IO against it.
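We don't have a deterministic reproducer; the following is a hedged sketch of the kind of load that triggers it for us. The mapping size and the file path are placeholders, not values from our environment: one thread pins a large anonymous mapping to create memory pressure while the main thread streams reads from the ZFS-backed target.

```c
/* Hedged reproducer sketch: memory pressure plus streaming reads. */
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define PRESSURE_BYTES (8UL << 30)  /* placeholder: size near physical RAM */

static void *memory_hog(void *arg)
{
    (void)arg;
    char *p = mmap(NULL, PRESSURE_BYTES, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    for (;;)                        /* keep the pages hot so reclaim runs */
        for (size_t i = 0; i < PRESSURE_BYTES; i += 4096)
            p[i] = (char)i;
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, memory_hog, NULL);

    /* Drive IO: loop reads over a file on the mount (placeholder path). */
    static char buf[1 << 20];
    int fd = open("/mnt/lustre/testfile", O_RDONLY);
    if (fd < 0)
        return 1;
    for (;;) {
        if (read(fd, buf, sizeof(buf)) <= 0)
            lseek(fd, 0, SEEK_SET); /* wrap around and keep reading */
    }
    return 0;
}
```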
### Include any warning/errors/backtraces from the system logs

Lustre IO thread:

```
[372109.472745] Lustre: ll_ost_io00_006: service thread pid 4975 was inactive for 202.714 seconds. The thread might be hung
[372109.472747] task:ll_ost00_009 state:R
[372109.472748] task:ll_ost00_011 state:R
[372109.472751] running task stack: 0 pid: 4934 ppid: 2 flags:0x00000228
[372109.472754] Call trace:
[372109.472759] __switch_to+0xbc/0xfc
[372109.472763] __schedule+0x28c/0x718
[372109.472765] _cond_resched+0x48/0x60
[372109.472768] shrink_page_list+0x6c/0xc70
[372109.472769] shrink_inactive_list+0x160/0x510
[372109.472771] shrink_lruvec+0x26c/0x300
[372109.472772] shrink_node_memcgs+0x1c0/0x230
[372109.472773] shrink_node+0x150/0x5e0
[372109.472775] shrink_zones+0x98/0x220
[372109.472776] do_try_to_free_pages+0xac/0x2e0
[372109.472778] try_to_free_pages+0x120/0x25c
[372109.472780] __alloc_pages_slowpath.constprop.0+0x400/0x82c
[372109.472781] __alloc_pages_nodemask+0x2b4/0x310
[372109.472831] abd_alloc_chunks+0x184/0x470 [zfs]
[372109.472881] abd_alloc+0x90/0x120 [zfs]
[372109.472924] arc_hdr_alloc_abd+0x134/0x22c [zfs]
[372109.472966] arc_read+0x4a8/0x10a0 [zfs]
[372109.473008] dbuf_read_impl.constprop.0+0x23c/0x3dc [zfs]
[372109.473050] dbuf_read+0xd4/0x65c [zfs]
[372109.473092] dmu_buf_hold_by_dnode+0xa0/0x124 [zfs]
[372109.473136] zap_get_leaf_byblk+0x68/0x154 [zfs]
[372109.473178] zap_deref_leaf+0xb4/0x148 [zfs]
[372109.473221] fzap_lookup+0x80/0x1b8 [zfs]
[372109.473263] zap_lookup_impl+0x6c/0x1c8 [zfs]
[372109.473305] zap_lookup+0xc8/0x10c [zfs]
[372109.473314] osd_fid_lookup+0x27c/0x4d4 [osd_zfs]
[372109.473322] osd_object_init+0x314/0xaec [osd_zfs]
[372109.473355] lu_object_start+0x84/0x154 [obdclass]
[372109.473385] lu_object_find_at+0x37c/0x72c [obdclass]
[372109.473415] lu_object_find+0x1c/0x24 [obdclass]
[372109.473423] ofd_object_find+0x6c/0x18c [ofd]
[372109.473430] ofd_lvbo_init+0x294/0x938 [ofd]
[372109.473481] ldlm_lvbo_init+0x70/0x2e0 [ptlrpc]
[372109.473529] ldlm_handle_enqueue0+0x540/0x1b24 [ptlrpc]
[372109.473577] tgt_enqueue+0x84/0x2c0 [ptlrpc]
[372109.473624] tgt_handle_request0+0x2b4/0x658 [ptlrpc]
[372109.473671] tgt_request_handle+0x268/0xaac [ptlrpc]
[372109.473718] ptlrpc_server_handle_request.isra.0+0x460/0xf20 [ptlrpc]
[372109.473765] ptlrpc_main+0xd24/0x15bc [ptlrpc]
[372109.473768] kthread+0x118/0x120
```

kswapd:

```
[372073.633195] INFO: task kswapd0:188 blocked for more than 122 seconds.
[372073.634293] Tainted: P OE 5.10.236-228.935.amzn2.aarch64 #1
[372073.635482] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[372073.636758] task:kswapd0 state:D stack: 0 pid: 188 ppid: 2 flags:0x00000028
[372073.638118] Call trace:
[372073.638546] __switch_to+0xbc/0xfc
[372073.639123] __schedule+0x28c/0x718
[372073.639712] schedule+0x4c/0xcc
[372073.640248] schedule_preempt_disabled+0x14/0x1c
[372073.641015] __mutex_lock.constprop.0+0x190/0x640
[372073.641796] __mutex_lock_slowpath+0x18/0x20
[372073.642506] mutex_lock+0x74/0x80
[372073.643112] arc_buf_destroy+0x84/0x178 [zfs]
[372073.643884] dbuf_destroy+0x38/0x3dc [zfs]
[372073.644607] dbuf_evict_one+0x168/0x188 [zfs]
[372073.645376] dbuf_evict_notify+0xe0/0xf0 [zfs]
[372073.646153] dbuf_rele_and_unlock+0x61c/0x708 [zfs]
[372073.646998] dmu_buf_rele+0x44/0x58 [zfs]
[372073.647710] sa_handle_destroy+0x7c/0x120 [zfs]
[372073.648470] osd_object_delete+0x64/0x17c [osd_zfs]
[372073.649312] lu_object_free.isra.0+0x88/0x1fc [obdclass]
[372073.650215] lu_site_purge_objects+0x338/0x47c [obdclass]
[372073.651127] lu_cache_shrink_scan+0xa0/0x18c [obdclass]
[372073.651989] do_shrink_slab+0x194/0x394
[372073.652635] shrink_slab+0xbc/0x13c
[372073.653234] shrink_node_memcgs+0x1d4/0x230
[372073.653943] shrink_node+0x150/0x5e0
[372073.654554] balance_pgdat+0x260/0x524
[372073.655192] kswapd+0x124/0x208
[372073.655732] kthread+0x118/0x120
```