System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | bullseye (11) |
| Kernel Version | 5.10.0-14 |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.0.3-9 zfs-kmod-2.0.3-9 |
Tried on another system

| Type | Version/Name |
| --- | --- |
| Distribution Name | hrmpf rescue system / Void Linux |
| Distribution Version | 20211227 |
| Kernel Version | 5.15.11_1 |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.1.2-1 zfs-kmod-2.1.2-1 |
Describe the problem you're observing

I have a pool (no redundancy) that has been corrupted by a hardware failure. Trying to import it causes a PANIC, and the `zpool import` process hangs in the "D" (uninterruptible sleep, usually IO) state:

`PANIC: zfs: adding existent segment to range tree`

Issue #13445 may be related; it contains a similar backtrace.

The pool can be imported with `zpool import -o readonly=true -f rpool` or with `zpool import -f -T 2676127 rpool`.
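For others hitting the same panic, one possibly relevant knob is the `zfs_recover` module parameter, which downgrades `zfs_panic_recover()` panics to warnings. This is a sketch of how one might use it, not something I have verified against this pool; the exact behavior should be checked against the ZFS version in use, and it should only ever be tried on a disposable copy of the image:

```shell
# CAUTION: run this only against a disposable copy/image of the pool.
# zfs_recover=1 turns zfs_panic_recover() panics into warnings, which
# may let the import get far enough to copy data off.
echo 1 > /sys/module/zfs/parameters/zfs_recover

# Retry the import: read-only, without mounting datasets (-N), under an
# alternate root (-R) to keep the damaged datasets out of the way.
zpool import -o readonly=on -N -R /mnt -f rpool
```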
Describe how to reproduce the problem

I have a dump/image of the pool in the corrupted state, on which I can repeatedly reproduce this with both systems / ZFS versions. I can (and am willing to) try out possible solutions, too. (The original pool has been recovered with the `-T <txg>` rewind method.)

Unfortunately I cannot share the whole image, as it contains personal information, but sharing a short hexdump may be possible.
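For reference, the reproduction steps can be sketched as follows, assuming the dump is a raw whole-disk image (the image path is a placeholder):

```shell
# Attach the saved image as a read-only loop device; --show prints the
# allocated device node, e.g. /dev/loop0.
losetup --find --show --read-only pool.img

# Scan only the loop device for importable pools and list what is found:
zpool import -d /dev/loop0

# A read-write import reproduces the PANIC; a read-only import succeeds:
zpool import -d /dev/loop0 -o readonly=on -f rpool
```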
Include any warning/errors/backtraces from the system logs

The results below were reproduced under qemu/kvm (version 5.2.0), using the image as a virtual disk; the original hypervisor was ESXi.
The original system (Debian, zfs-2.0.3-9)
[ 65.022435] PANIC: zfs: adding existent segment to range tree (offset=76bab1000 size=12000)
[ 65.024094] Showing stack for process 208
[ 65.024915] CPU: 0 PID: 208 Comm: z_wr_iss Tainted: P OE 5.10.0-14-amd64 #1 Debian 5.10.113-1
[ 65.026795] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 65.028896] Call Trace:
[ 65.028896] dump_stack+0x6b/0x83
[ 65.028896] vcmn_err.cold+0x58/0x80 [spl]
[ 65.028896] ? metaslab_rangesize64_compare+0x40/0x40 [zfs]
[ 65.028896] ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs]
[ 65.028896] ? zfs_btree_add_idx+0xd1/0x210 [zfs]
[ 65.028896] ? zfs_btree_find+0x175/0x300 [zfs]
[ 65.028896] zfs_panic_recover+0x6d/0x90 [zfs]
[ 65.028896] range_tree_add_impl+0x305/0xe40 [zfs]
[ 65.028896] ? range_tree_remove_impl+0xf10/0xf10 [zfs]
[ 65.028896] range_tree_walk+0xad/0x1e0 [zfs]
[ 65.028896] metaslab_load+0x359/0x8b0 [zfs]
[ 65.028896] metaslab_activate+0x4c/0x220 [zfs]
[ 65.028896] ? metaslab_set_selected_txg+0x7f/0xc0 [zfs]
[ 65.028896] metaslab_alloc_dva+0x134/0x1210 [zfs]
[ 65.028896] metaslab_alloc+0xbe/0x250 [zfs]
[ 65.028896] zio_dva_allocate+0xd4/0x800 [zfs]
[ 65.028896] ? _cond_resched+0x16/0x40
[ 65.028896] ? mutex_lock+0xe/0x30
[ 65.028896] ? metaslab_class_throttle_reserve+0xc3/0xe0 [zfs]
[ 65.028896] ? zio_io_to_allocate+0x60/0x80 [zfs]
[ 65.028896] zio_execute+0x81/0x120 [zfs]
[ 65.028896] taskq_thread+0x2da/0x520 [spl]
[ 65.028896] ? wake_up_q+0xa0/0xa0
[ 65.028896] ? zio_destroy+0xf0/0xf0 [zfs]
[ 65.028896] ? taskq_thread_spawn+0x50/0x50 [spl]
[ 65.028896] kthread+0x11b/0x140
[ 65.028896] ? __kthread_bind_mask+0x60/0x60
[ 65.028896] ret_from_fork+0x22/0x30
[ 242.724547] INFO: task zpool:148 blocked for more than 120 seconds.
[ 242.727882] Tainted: P OE 5.10.0-14-amd64 #1 Debian 5.10.113-1
[ 242.732106] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.736450] task:zpool state:D stack: 0 pid: 148 ppid: 131 flags:0x00004002
[ 242.741604] Call Trace:
[ 242.742809] __schedule+0x282/0x870
[ 242.744808] schedule+0x46/0xb0
[ 242.745984] io_schedule+0x42/0x70
[ 242.747203] cv_wait_common+0xac/0x130 [spl]
[ 242.748743] ? add_wait_queue_exclusive+0x70/0x70
[ 242.750497] txg_wait_synced_impl+0xc9/0x110 [zfs]
[ 242.752330] txg_wait_synced+0xc/0x40 [zfs]
[ 242.754256] spa_config_update+0x3f/0x170 [zfs]
[ 242.755667] spa_import+0x5e0/0x840 [zfs]
[ 242.757574] zfs_ioc_pool_import+0x12f/0x150 [zfs]
[ 242.759000] zfsdev_ioctl_common+0x697/0x870 [zfs]
[ 242.760111] ? _copy_from_user+0x28/0x60
[ 242.761065] zfsdev_ioctl+0x53/0xe0 [zfs]
[ 242.761988] __x64_sys_ioctl+0x83/0xb0
[ 242.762849] do_syscall_64+0x33/0x80
[ 242.763672] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 242.764840] RIP: 0033:0x7f6863364cc7
[ 242.765652] RSP: 002b:00007ffe22a5aa28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 242.767341] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6863364cc7
[ 242.769220] RDX: 00007ffe22a5aaa0 RSI: 0000000000005a02 RDI: 0000000000000003
[ 242.771302] RBP: 00007ffe22a5e990 R08: 0000000000000000 R09: 00007f686342ebe0
[ 242.775328] R10: 0000000010000000 R11: 0000000000000246 R12: 0000555e380bf320
[ 242.778226] R13: 00007ffe22a5aaa0 R14: 00007f685c001970 R15: 0000000000000000
[ 242.782019] INFO: task z_wr_iss:208 blocked for more than 120 seconds.
[ 242.785910] Tainted: P OE 5.10.0-14-amd64 #1 Debian 5.10.113-1
[ 242.790663] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.794223] task:z_wr_iss state:D stack: 0 pid: 208 ppid: 2 flags:0x00004000
[ 242.798323] Call Trace:
[ 242.799648] __schedule+0x282/0x870
[ 242.801901] schedule+0x46/0xb0
[ 242.803602] vcmn_err.cold+0x7e/0x80 [spl]
[ 242.805956] ? metaslab_rangesize64_compare+0x40/0x40 [zfs]
[ 242.809675] ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs]
[ 242.812331] ? zfs_btree_add_idx+0xd1/0x210 [zfs]
[ 242.814956] ? zfs_btree_find+0x175/0x300 [zfs]
[ 242.818007] zfs_panic_recover+0x6d/0x90 [zfs]
[ 242.820627] range_tree_add_impl+0x305/0xe40 [zfs]
[ 242.824200] ? range_tree_remove_impl+0xf10/0xf10 [zfs]
[ 242.826735] range_tree_walk+0xad/0x1e0 [zfs]
[ 242.828848] metaslab_load+0x359/0x8b0 [zfs]
[ 242.830868] metaslab_activate+0x4c/0x220 [zfs]
[ 242.832916] ? metaslab_set_selected_txg+0x7f/0xc0 [zfs]
[ 242.835249] metaslab_alloc_dva+0x134/0x1210 [zfs]
[ 242.837181] metaslab_alloc+0xbe/0x250 [zfs]
[ 242.838851] zio_dva_allocate+0xd4/0x800 [zfs]
[ 242.841201] ? _cond_resched+0x16/0x40
[ 242.842367] ? mutex_lock+0xe/0x30
[ 242.843481] ? metaslab_class_throttle_reserve+0xc3/0xe0 [zfs]
[ 242.845247] ? zio_io_to_allocate+0x60/0x80 [zfs]
[ 242.846659] zio_execute+0x81/0x120 [zfs]
[ 242.847864] taskq_thread+0x2da/0x520 [spl]
[ 242.849115] ? wake_up_q+0xa0/0xa0
[ 242.850367] ? zio_destroy+0xf0/0xf0 [zfs]
[ 242.851759] ? taskq_thread_spawn+0x50/0x50 [spl]
[ 242.852801] kthread+0x11b/0x140
[ 242.853513] ? __kthread_bind_mask+0x60/0x60
[ 242.854447] ret_from_fork+0x22/0x30
[ 242.855235] INFO: task txg_sync:283 blocked for more than 120 seconds.
[ 242.857674] Tainted: P OE 5.10.0-14-amd64 #1 Debian 5.10.113-1
[ 242.859281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.860949] task:txg_sync state:D stack: 0 pid: 283 ppid: 2 flags:0x00004000
[ 242.862637] Call Trace:
[ 242.863105] __schedule+0x282/0x870
[ 242.863826] schedule+0x46/0xb0
[ 242.864409] schedule_timeout+0x8b/0x140
[ 242.865538] ? __next_timer_interrupt+0x110/0x110
[ 242.866431] io_schedule_timeout+0x4c/0x80
[ 242.867213] __cv_timedwait_common+0x12b/0x160 [spl]
[ 242.868192] ? add_wait_queue_exclusive+0x70/0x70
[ 242.869155] __cv_timedwait_io+0x15/0x20 [spl]
[ 242.870039] zio_wait+0x129/0x2b0 [zfs]
[ 242.870799] dsl_pool_sync+0x461/0x4f0 [zfs]
[ 242.871655] spa_sync+0x575/0xfa0 [zfs]
[ 242.872430] ? mutex_lock+0xe/0x30
[ 242.873927] ? spa_txg_history_init_io+0x101/0x110 [zfs]
[ 242.875014] txg_sync_thread+0x2e0/0x4a0 [zfs]
[ 242.875964] ? txg_fini+0x240/0x240 [zfs]
[ 242.876809] thread_generic_wrapper+0x6f/0x80 [spl]
[ 242.877766] ? __thread_exit+0x20/0x20 [spl]
[ 242.878724] kthread+0x11b/0x140
[ 242.879398] ? __kthread_bind_mask+0x60/0x60
[ 242.880266] ret_from_fork+0x22/0x30
[ 260.101414] random: crng init done
The hrmpf rescue BootCD (zfs-2.1.2-1)
[ 103.775614] PANIC: zfs: adding existent segment to range tree (offset=76bab1000 size=12000)
[ 103.778085] Showing stack for process 1201
[ 103.779296] CPU: 0 PID: 1201 Comm: z_wr_iss Tainted: P O 5.15.11_1 #1
[ 103.780288] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 103.780288] Call Trace:
[ 103.780288] <TASK>
[ 103.780288] dump_stack_lvl+0x46/0x5a
[ 103.780288] vcmn_err.cold+0x50/0x68 [spl]
[ 103.780288] ? kmem_cache_alloc+0x280/0x3c0
[ 103.780288] ? metaslab_rangesize64_compare+0x40/0x40 [zfs]
[ 103.780288] ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs]
[ 103.780288] ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs]
[ 103.780288] ? zfs_btree_add_idx+0xb0/0x220 [zfs]
[ 103.780288] ? zfs_btree_find+0x175/0x300 [zfs]
[ 103.780288] zfs_panic_recover+0x6d/0x90 [zfs]
[ 103.780288] range_tree_add_impl+0x305/0xe40 [zfs]
[ 103.780288] ? __schedule+0x1195/0x1480
[ 103.780288] ? range_tree_remove_impl+0xf00/0xf00 [zfs]
[ 103.780288] range_tree_walk+0xad/0x1e0 [zfs]
[ 103.780288] metaslab_load+0x34c/0x8a0 [zfs]
[ 103.780288] ? range_tree_add_impl+0x754/0xe40 [zfs]
[ 103.780288] metaslab_activate+0x4c/0x280 [zfs]
[ 103.780288] ? metaslab_set_selected_txg+0x7f/0xc0 [zfs]
[ 103.780288] metaslab_alloc_dva+0x2b6/0x1490 [zfs]
[ 103.780288] metaslab_alloc+0xcf/0x280 [zfs]
[ 103.780288] zio_dva_allocate+0xd4/0x8d0 [zfs]
[ 103.780288] ? __kmalloc_node+0x397/0x480
[ 103.780288] ? spl_kmem_alloc_impl+0xae/0xf0 [spl]
[ 103.780288] ? zio_io_to_allocate+0x63/0x80 [zfs]
[ 103.780288] zio_execute+0x81/0x120 [zfs]
[ 103.780288] taskq_thread+0x2cb/0x500 [spl]
[ 103.780288] ? wake_up_q+0x90/0x90
[ 103.780288] ? zio_gang_tree_free+0x60/0x60 [zfs]
[ 103.780288] ? taskq_thread_spawn+0x50/0x50 [spl]
[ 103.780288] kthread+0x127/0x150
[ 103.780288] ? set_kthread_struct+0x40/0x40
[ 103.780288] ret_from_fork+0x22/0x30
[ 103.780288] </TASK>
Thank you!