Importing corrupted pool causes PANIC: zfs: adding existent segment to range tree #13483

@raron

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | bullseye (11) |
| Kernel Version | 5.10.0-14 |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.0.3-9 zfs-kmod-2.0.3-9 |

Tried on another system

| Type | Version/Name |
| --- | --- |
| Distribution Name | hrmpf rescue system / Void Linux |
| Distribution Version | 20211227 |
| Kernel Version | 5.15.11_1 |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.1.2-1 zfs-kmod-2.1.2-1 |

Describe the problem you're observing

I have a pool (no redundancy) that has been corrupted by a hardware failure. Trying to import it causes a PANIC, and the zpool import process hangs in the "D" state (uninterruptible sleep, usually IO):

PANIC: zfs: adding existent segment to range tree
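
As a rough illustration of what this message appears to mean: the panic is raised from range_tree_add_impl (see the traces below) when a segment being added overlaps one that is already in the tree. The following is a toy model only, not the OpenZFS range tree; the class name `ToyRangeTree` is hypothetical, it does not merge adjacent segments, and the offset and size are taken from the logged PANIC line on the assumption that they are hexadecimal.

```python
import bisect


class ToyRangeTree:
    """Toy set of non-overlapping [start, end) segments.

    Illustration only; this is NOT the OpenZFS range_tree code.
    """

    def __init__(self):
        self._starts = []  # sorted segment start offsets
        self._ends = []    # end offset matching each start

    def add(self, start, size):
        end = start + size
        i = bisect.bisect_right(self._starts, start)
        # A segment starting at or before `start` overlaps if it ends after `start`.
        overlaps_prev = i > 0 and self._ends[i - 1] > start
        # A segment starting after `start` overlaps if it starts before `end`.
        overlaps_next = i < len(self._starts) and self._starts[i] < end
        if overlaps_prev or overlaps_next:
            raise ValueError(
                f"adding existent segment (offset={start:x} size={size:x})")
        self._starts.insert(i, start)
        self._ends.insert(i, end)


tree = ToyRangeTree()
tree.add(0x76bab1000, 0x12000)      # first insertion succeeds
try:
    tree.add(0x76bab1000, 0x12000)  # same segment again is rejected
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

In this model the second `add` is rejected because the segment is already present, which is the kind of inconsistency the on-disk allocation data of the corrupted pool seems to contain.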

Issue #13445 may be related; it has a similar backtrace.

The pool can be imported with zpool import -o readonly=true -f rpool or with zpool import -f -T 2676127 rpool.

Describe how to reproduce the problem

I have a dump / image of the pool in the corrupted state, on which I can repeatedly reproduce this with both systems / ZFS versions.
I can (and am willing to) try out possible solutions, too. (The original pool has been recovered with the -T txg method.)

Unfortunately, I cannot share the whole image because it contains personal information; a short hexdump may be possible.

Include any warning/errors/backtraces from the system logs

The results were reproduced on qemu/kvm (version 5.2.0) using the image as a virtual disk; the original hypervisor was ESXi.

The original system (Debian, zfs-2.0.3-9)

[   65.022435] PANIC: zfs: adding existent segment to range tree (offset=76bab1000 size=12000) 
[   65.024094] Showing stack for process 208 
[   65.024915] CPU: 0 PID: 208 Comm: z_wr_iss Tainted: P           OE     5.10.0-14-amd64 #1 Debian 5.10.113-1 
[   65.026795] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 
[   65.028896] Call Trace: 
[   65.028896]  dump_stack+0x6b/0x83 
[   65.028896]  vcmn_err.cold+0x58/0x80 [spl] 
[   65.028896]  ? metaslab_rangesize64_compare+0x40/0x40 [zfs] 
[   65.028896]  ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs] 
[   65.028896]  ? zfs_btree_add_idx+0xd1/0x210 [zfs] 
[   65.028896]  ? zfs_btree_find+0x175/0x300 [zfs] 
[   65.028896]  zfs_panic_recover+0x6d/0x90 [zfs] 
[   65.028896]  range_tree_add_impl+0x305/0xe40 [zfs] 
[   65.028896]  ? range_tree_remove_impl+0xf10/0xf10 [zfs] 
[   65.028896]  range_tree_walk+0xad/0x1e0 [zfs] 
[   65.028896]  metaslab_load+0x359/0x8b0 [zfs] 
[   65.028896]  metaslab_activate+0x4c/0x220 [zfs] 
[   65.028896]  ? metaslab_set_selected_txg+0x7f/0xc0 [zfs] 
[   65.028896]  metaslab_alloc_dva+0x134/0x1210 [zfs] 
[   65.028896]  metaslab_alloc+0xbe/0x250 [zfs] 
[   65.028896]  zio_dva_allocate+0xd4/0x800 [zfs] 
[   65.028896]  ? _cond_resched+0x16/0x40 
[   65.028896]  ? mutex_lock+0xe/0x30 
[   65.028896]  ? metaslab_class_throttle_reserve+0xc3/0xe0 [zfs] 
[   65.028896]  ? zio_io_to_allocate+0x60/0x80 [zfs] 
[   65.028896]  zio_execute+0x81/0x120 [zfs] 
[   65.028896]  taskq_thread+0x2da/0x520 [spl] 
[   65.028896]  ? wake_up_q+0xa0/0xa0 
[   65.028896]  ? zio_destroy+0xf0/0xf0 [zfs] 
[   65.028896]  ? taskq_thread_spawn+0x50/0x50 [spl] 
[   65.028896]  kthread+0x11b/0x140 
[   65.028896]  ? __kthread_bind_mask+0x60/0x60 
[   65.028896]  ret_from_fork+0x22/0x30 
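
The offset and size in the PANIC line above can be decoded with a short script. This is a minimal sketch that assumes both values are printed in hexadecimal (they contain hex digits, but the format is an assumption here); the helper name `decode_panic` is hypothetical.

```python
import re

# Matches the "(offset=... size=...)" part of the PANIC line.
PANIC_RE = re.compile(r"offset=([0-9a-f]+) size=([0-9a-f]+)")


def decode_panic(line):
    """Return (offset, size, end) in bytes, or None if the line does not match.

    Assumption: both fields are hexadecimal.
    """
    m = PANIC_RE.search(line)
    if m is None:
        return None
    offset = int(m.group(1), 16)
    size = int(m.group(2), 16)
    return offset, size, offset + size


line = ("[   65.022435] PANIC: zfs: adding existent segment to range tree "
        "(offset=76bab1000 size=12000)")
offset, size, end = decode_panic(line)
print(f"offset = {offset} bytes ({offset / 2**30:.2f} GiB)")
print(f"size   = {size} bytes ({size // 1024} KiB)")
print(f"end    = {end:#x}")
```

Under that assumption the segment is 72 KiB long and starts roughly 29.7 GiB into the address space the range tree covers; both systems report the same offset and size.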

[  242.724547] INFO: task zpool:148 blocked for more than 120 seconds. 
[  242.727882]       Tainted: P           OE     5.10.0-14-amd64 #1 Debian 5.10.113-1 
[  242.732106] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  242.736450] task:zpool           state:D stack:    0 pid:  148 ppid:   131 flags:0x00004002 
[  242.741604] Call Trace: 
[  242.742809]  __schedule+0x282/0x870 
[  242.744808]  schedule+0x46/0xb0 
[  242.745984]  io_schedule+0x42/0x70 
[  242.747203]  cv_wait_common+0xac/0x130 [spl] 
[  242.748743]  ? add_wait_queue_exclusive+0x70/0x70 
[  242.750497]  txg_wait_synced_impl+0xc9/0x110 [zfs] 
[  242.752330]  txg_wait_synced+0xc/0x40 [zfs] 
[  242.754256]  spa_config_update+0x3f/0x170 [zfs] 
[  242.755667]  spa_import+0x5e0/0x840 [zfs] 
[  242.757574]  zfs_ioc_pool_import+0x12f/0x150 [zfs] 
[  242.759000]  zfsdev_ioctl_common+0x697/0x870 [zfs] 
[  242.760111]  ? _copy_from_user+0x28/0x60 
[  242.761065]  zfsdev_ioctl+0x53/0xe0 [zfs] 
[  242.761988]  __x64_sys_ioctl+0x83/0xb0 
[  242.762849]  do_syscall_64+0x33/0x80 
[  242.763672]  entry_SYSCALL_64_after_hwframe+0x44/0xa9 
[  242.764840] RIP: 0033:0x7f6863364cc7 
[  242.765652] RSP: 002b:00007ffe22a5aa28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 
[  242.767341] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6863364cc7 
[  242.769220] RDX: 00007ffe22a5aaa0 RSI: 0000000000005a02 RDI: 0000000000000003 
[  242.771302] RBP: 00007ffe22a5e990 R08: 0000000000000000 R09: 00007f686342ebe0 
[  242.775328] R10: 0000000010000000 R11: 0000000000000246 R12: 0000555e380bf320 
[  242.778226] R13: 00007ffe22a5aaa0 R14: 00007f685c001970 R15: 0000000000000000 
[  242.782019] INFO: task z_wr_iss:208 blocked for more than 120 seconds. 
[  242.785910]       Tainted: P           OE     5.10.0-14-amd64 #1 Debian 5.10.113-1 
[  242.790663] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  242.794223] task:z_wr_iss        state:D stack:    0 pid:  208 ppid:     2 flags:0x00004000 
[  242.798323] Call Trace: 
[  242.799648]  __schedule+0x282/0x870 
[  242.801901]  schedule+0x46/0xb0 
[  242.803602]  vcmn_err.cold+0x7e/0x80 [spl] 
[  242.805956]  ? metaslab_rangesize64_compare+0x40/0x40 [zfs] 
[  242.809675]  ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs] 
[  242.812331]  ? zfs_btree_add_idx+0xd1/0x210 [zfs] 
[  242.814956]  ? zfs_btree_find+0x175/0x300 [zfs] 
[  242.818007]  zfs_panic_recover+0x6d/0x90 [zfs] 
[  242.820627]  range_tree_add_impl+0x305/0xe40 [zfs] 
[  242.824200]  ? range_tree_remove_impl+0xf10/0xf10 [zfs] 
[  242.826735]  range_tree_walk+0xad/0x1e0 [zfs] 
[  242.828848]  metaslab_load+0x359/0x8b0 [zfs] 
[  242.830868]  metaslab_activate+0x4c/0x220 [zfs] 
[  242.832916]  ? metaslab_set_selected_txg+0x7f/0xc0 [zfs] 
[  242.835249]  metaslab_alloc_dva+0x134/0x1210 [zfs] 
[  242.837181]  metaslab_alloc+0xbe/0x250 [zfs] 
[  242.838851]  zio_dva_allocate+0xd4/0x800 [zfs] 
[  242.841201]  ? _cond_resched+0x16/0x40 
[  242.842367]  ? mutex_lock+0xe/0x30 
[  242.843481]  ? metaslab_class_throttle_reserve+0xc3/0xe0 [zfs] 
[  242.845247]  ? zio_io_to_allocate+0x60/0x80 [zfs] 
[  242.846659]  zio_execute+0x81/0x120 [zfs] 
[  242.847864]  taskq_thread+0x2da/0x520 [spl] 
[  242.849115]  ? wake_up_q+0xa0/0xa0 
[  242.850367]  ? zio_destroy+0xf0/0xf0 [zfs] 
[  242.851759]  ? taskq_thread_spawn+0x50/0x50 [spl] 
[  242.852801]  kthread+0x11b/0x140 
[  242.853513]  ? __kthread_bind_mask+0x60/0x60 
[  242.854447]  ret_from_fork+0x22/0x30 
[  242.855235] INFO: task txg_sync:283 blocked for more than 120 seconds. 
[  242.857674]       Tainted: P           OE     5.10.0-14-amd64 #1 Debian 5.10.113-1 
[  242.859281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  242.860949] task:txg_sync        state:D stack:    0 pid:  283 ppid:     2 flags:0x00004000 
[  242.862637] Call Trace: 
[  242.863105]  __schedule+0x282/0x870 
[  242.863826]  schedule+0x46/0xb0 
[  242.864409]  schedule_timeout+0x8b/0x140 
[  242.865538]  ? __next_timer_interrupt+0x110/0x110 
[  242.866431]  io_schedule_timeout+0x4c/0x80 
[  242.867213]  __cv_timedwait_common+0x12b/0x160 [spl] 
[  242.868192]  ? add_wait_queue_exclusive+0x70/0x70 
[  242.869155]  __cv_timedwait_io+0x15/0x20 [spl] 
[  242.870039]  zio_wait+0x129/0x2b0 [zfs] 
[  242.870799]  dsl_pool_sync+0x461/0x4f0 [zfs] 
[  242.871655]  spa_sync+0x575/0xfa0 [zfs] 
[  242.872430]  ? mutex_lock+0xe/0x30 
[  242.873927]  ? spa_txg_history_init_io+0x101/0x110 [zfs] 
[  242.875014]  txg_sync_thread+0x2e0/0x4a0 [zfs] 
[  242.875964]  ? txg_fini+0x240/0x240 [zfs] 
[  242.876809]  thread_generic_wrapper+0x6f/0x80 [spl] 
[  242.877766]  ? __thread_exit+0x20/0x20 [spl] 
[  242.878724]  kthread+0x11b/0x140 
[  242.879398]  ? __kthread_bind_mask+0x60/0x60 
[  242.880266]  ret_from_fork+0x22/0x30 
[  260.101414] random: crng init done 
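
The hung-task reports above name three blocked tasks. A minimal sketch for pulling them out of a saved kernel log follows; the function name `hung_tasks` is hypothetical and the sample lines are copied from the log above.

```python
import re

# Kernel hung-task report: "INFO: task <comm>:<pid> blocked for more than <n> seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) for every hung-task report."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in HUNG_RE.finditer(log_text)]


sample = """\
[  242.724547] INFO: task zpool:148 blocked for more than 120 seconds.
[  242.782019] INFO: task z_wr_iss:208 blocked for more than 120 seconds.
[  242.855235] INFO: task txg_sync:283 blocked for more than 120 seconds.
"""
tasks = hung_tasks(sample)
for comm, pid, seconds in tasks:
    print(f"{comm} (pid {pid}) blocked > {seconds}s")
```

Read together with the traces, z_wr_iss (PID 208) is the thread that hit the PANIC, txg_sync is waiting in zio_wait, and zpool is waiting in txg_wait_synced, so the import hang looks like a consequence of the panicked write thread never completing its I/O.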

The hrmpf rescue BootCD (zfs-2.1.2-1)

[  103.775614] PANIC: zfs: adding existent segment to range tree (offset=76bab1000 size=12000) 
[  103.778085] Showing stack for process 1201 
[  103.779296] CPU: 0 PID: 1201 Comm: z_wr_iss Tainted: P           O      5.15.11_1 #1 
[  103.780288] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 
[  103.780288] Call Trace: 
[  103.780288]  <TASK> 
[  103.780288]  dump_stack_lvl+0x46/0x5a 
[  103.780288]  vcmn_err.cold+0x50/0x68 [spl] 
[  103.780288]  ? kmem_cache_alloc+0x280/0x3c0 
[  103.780288]  ? metaslab_rangesize64_compare+0x40/0x40 [zfs] 
[  103.780288]  ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs] 
[  103.780288]  ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs] 
[  103.780288]  ? zfs_btree_add_idx+0xb0/0x220 [zfs] 
[  103.780288]  ? zfs_btree_find+0x175/0x300 [zfs] 
[  103.780288]  zfs_panic_recover+0x6d/0x90 [zfs] 
[  103.780288]  range_tree_add_impl+0x305/0xe40 [zfs] 
[  103.780288]  ? __schedule+0x1195/0x1480 
[  103.780288]  ? range_tree_remove_impl+0xf00/0xf00 [zfs] 
[  103.780288]  range_tree_walk+0xad/0x1e0 [zfs] 
[  103.780288]  metaslab_load+0x34c/0x8a0 [zfs] 
[  103.780288]  ? range_tree_add_impl+0x754/0xe40 [zfs] 
[  103.780288]  metaslab_activate+0x4c/0x280 [zfs] 
[  103.780288]  ? metaslab_set_selected_txg+0x7f/0xc0 [zfs] 
[  103.780288]  metaslab_alloc_dva+0x2b6/0x1490 [zfs] 
[  103.780288]  metaslab_alloc+0xcf/0x280 [zfs] 
[  103.780288]  zio_dva_allocate+0xd4/0x8d0 [zfs] 
[  103.780288]  ? __kmalloc_node+0x397/0x480 
[  103.780288]  ? spl_kmem_alloc_impl+0xae/0xf0 [spl] 
[  103.780288]  ? zio_io_to_allocate+0x63/0x80 [zfs] 
[  103.780288]  zio_execute+0x81/0x120 [zfs] 
[  103.780288]  taskq_thread+0x2cb/0x500 [spl] 
[  103.780288]  ? wake_up_q+0x90/0x90 
[  103.780288]  ? zio_gang_tree_free+0x60/0x60 [zfs] 
[  103.780288]  ? taskq_thread_spawn+0x50/0x50 [spl] 
[  103.780288]  kthread+0x127/0x150 
[  103.780288]  ? set_kthread_struct+0x40/0x40 
[  103.780288]  ret_from_fork+0x22/0x30 
[  103.780288]  </TASK> 
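
To check that both ZFS versions fail along the same path, the call traces can be reduced to their reliable frames (lines without a leading "?") and compared. This is a minimal sketch; the function name `reliable_frames` is hypothetical and the sample lines are excerpts copied from the two traces above.

```python
import re

# "[ timestamp]  [?] symbol+0xoff/0xlen [module]"; group 1 is the optional "?".
FRAME_RE = re.compile(r"\]\s+(\??)\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace_lines):
    """Return symbol names of frames not marked unreliable with '?'."""
    frames = []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m and m.group(1) != "?":
            frames.append(m.group(2))
    return frames


debian_2_0_3 = [
    "[   65.028896]  ? zfs_btree_find+0x175/0x300 [zfs] ",
    "[   65.028896]  zfs_panic_recover+0x6d/0x90 [zfs] ",
    "[   65.028896]  range_tree_add_impl+0x305/0xe40 [zfs] ",
    "[   65.028896]  ? range_tree_remove_impl+0xf10/0xf10 [zfs] ",
    "[   65.028896]  range_tree_walk+0xad/0x1e0 [zfs] ",
    "[   65.028896]  metaslab_load+0x359/0x8b0 [zfs] ",
    "[   65.028896]  metaslab_activate+0x4c/0x220 [zfs] ",
]
hrmpf_2_1_2 = [
    "[  103.780288]  zfs_panic_recover+0x6d/0x90 [zfs] ",
    "[  103.780288]  range_tree_add_impl+0x305/0xe40 [zfs] ",
    "[  103.780288]  ? __schedule+0x1195/0x1480 ",
    "[  103.780288]  ? range_tree_remove_impl+0xf00/0xf00 [zfs] ",
    "[  103.780288]  range_tree_walk+0xad/0x1e0 [zfs] ",
    "[  103.780288]  metaslab_load+0x34c/0x8a0 [zfs] ",
    "[  103.780288]  ? range_tree_add_impl+0x754/0xe40 [zfs] ",
    "[  103.780288]  metaslab_activate+0x4c/0x280 [zfs] ",
]

frames_a = reliable_frames(debian_2_0_3)
frames_b = reliable_frames(hrmpf_2_1_2)
print(frames_a == frames_b)
print(" <- ".join(frames_a))
```

For these excerpts the reliable frames are identical on both systems: metaslab_activate calls metaslab_load, which walks a range tree and fails in range_tree_add_impl.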

Thank you!

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)
