Autopilot airgap update stuck in "schedulablewait" on controller+worker (hybrid) nodes #7303

@emosbaugh

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

$ uname -srvmo; cat /etc/os-release || lsb_release -a
Linux 5.15.161 #1 SMP Thu Nov 27 23:23:17 UTC 2025 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version

v1.35.1+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 7.8 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 43.4 GiB (pass)
Relative disk space available for /var/lib/k0s: 88% (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.15.161 (pass)
  Max. file descriptors per process: current: 1048575 / max: 1048576 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": unknown (warning: insufficient permissions, try with elevated permissions)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: built-in (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: built-in (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: built-in (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: built-in (pass)
      CONFIG_NF_NAT: built-in (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connection tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: built-in (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: built-in (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: built-in (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: built-in (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: built-in (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: built-in (pass)
      CONFIG_NF_DEFRAG_IPV4: built-in (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connection tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: unknown (warning)
        CONFIG_IP6_NF_FILTER: Packet filtering: unknown (warning)
        CONFIG_IP6_NF_MANGLE: Packet mangling: unknown (warning)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: unknown (warning)
      CONFIG_NF_DEFRAG_IPV6: built-in (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: built-in (pass)
      CONFIG_LLC: built-in (pass)
      CONFIG_STP: built-in (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

Since k0s v1.35, autopilot airgap update plans targeting hybrid controller+worker nodes get permanently stuck in the schedulablewait state and never complete.

Steps to reproduce

  1. Start a k0s cluster with at least one hybrid controller+worker node (k0s controller --enable-worker ...)
  2. Create an autopilot Plan with an airgapupdate command that targets the hybrid node as a worker:
apiVersion: autopilot.k0sproject.io/v1beta2
kind: Plan
metadata:
  name: autopilot
spec:
  id: id123
  timestamp: now
  commands:
    - airgapupdate:
        version: v1.35.2+k0s.0
        platforms:
          linux-amd64:
            url: http://<server>/bundle.tar
        workers:
          discovery:
            static:
              nodes:
                - controller0   # hybrid controller+worker node
  3. Observe that the plan never completes

Expected behavior

The airgap update plan should reach PlanCompleted.

Actual behavior

The plan loops in schedulablewait indefinitely, repeating every ~5 seconds:

time="2026-03-13 05:23:32" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
time="2026-03-13 05:23:32" level=info msg="No applicable transitions available, requesting retry" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait

Screenshots and logs

$ sudo k0s kubectl get plans -oyaml
apiVersion: v1
items:
- apiVersion: autopilot.k0sproject.io/v1beta2
  kind: Plan
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autopilot.k0sproject.io/v1beta2","kind":"Plan","metadata":{"annotations":{},"name":"autopilot"},"spec":{"commands":[{"airgapupdate":{"platforms":{"linux-amd64":{"url":"http://localhost:8081/bundle.tar"}},"version":"v1.35.2+k0s.0","workers":{"discovery":{"static":{"nodes":["0818fb90"]}}}}}],"id":"repro-1773768268","timestamp":"2026-03-17T17:24:28Z"}}
    creationTimestamp: "2026-03-17T17:24:28Z"
    generation: 1
    name: autopilot
    resourceVersion: "940"
    uid: 1d164b94-77fe-4b3f-b6a7-56856d0b8edf
  spec:
    commands:
    - airgapupdate:
        platforms:
          linux-amd64:
            url: http://localhost:8081/bundle.tar
        version: v1.35.2+k0s.0
        workers:
          discovery:
            static:
              nodes:
              - 0818fb90
          limits:
            concurrent: 1
    id: repro-1773768268
    timestamp: "2026-03-17T17:24:28Z"
  status:
    commands:
    - airgapupdate:
        workers:
        - lastUpdatedTimestamp: "2026-03-17T17:24:28Z"
          name: 0818fb90
          state: SignalSent
      id: 0
      state: SchedulableWait
    state: SchedulableWait
kind: List
metadata:
  resourceVersion: ""
$ tail -n 20 controller.log 
Mar 17 17:32:28 0818fb90 k0s[1554]: time="2026-03-17 17:32:28" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:28 0818fb90 k0s[1554]: time="2026-03-17 17:32:28" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:28 0818fb90 k0s[1554]: time="2026-03-17 17:32:28" level=info msg="No applicable transitions available, requesting retry" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:28 0818fb90 k0s[1554]: time="2026-03-17 17:32:28" level=info msg="Requeuing request due to explicit retry" component=autopilot controller=plans leadermode=true
Mar 17 17:32:33 0818fb90 k0s[1554]: time="2026-03-17 17:32:33" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:33 0818fb90 k0s[1554]: time="2026-03-17 17:32:33" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:33 0818fb90 k0s[1554]: time="2026-03-17 17:32:33" level=info msg="No applicable transitions available, requesting retry" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:33 0818fb90 k0s[1554]: time="2026-03-17 17:32:33" level=info msg="Requeuing request due to explicit retry" component=autopilot controller=plans leadermode=true
Mar 17 17:32:38 0818fb90 k0s[1554]: time="2026-03-17 17:32:38" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:38 0818fb90 k0s[1554]: time="2026-03-17 17:32:38" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:38 0818fb90 k0s[1554]: time="2026-03-17 17:32:38" level=info msg="No applicable transitions available, requesting retry" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:38 0818fb90 k0s[1554]: time="2026-03-17 17:32:38" level=info msg="Requeuing request due to explicit retry" component=autopilot controller=plans leadermode=true
Mar 17 17:32:43 0818fb90 k0s[1554]: time="2026-03-17 17:32:43" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:43 0818fb90 k0s[1554]: time="2026-03-17 17:32:43" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:43 0818fb90 k0s[1554]: time="2026-03-17 17:32:43" level=info msg="No applicable transitions available, requesting retry" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:43 0818fb90 k0s[1554]: time="2026-03-17 17:32:43" level=info msg="Requeuing request due to explicit retry" component=autopilot controller=plans leadermode=true
Mar 17 17:32:48 0818fb90 k0s[1554]: time="2026-03-17 17:32:48" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:48 0818fb90 k0s[1554]: time="2026-03-17 17:32:48" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:48 0818fb90 k0s[1554]: time="2026-03-17 17:32:48" level=info msg="No applicable transitions available, requesting retry" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Mar 17 17:32:48 0818fb90 k0s[1554]: time="2026-03-17 17:32:48" level=info msg="Requeuing request due to explicit retry" component=autopilot controller=plans leadermode=true

Additional context

Root Cause

PR #6849 ("Only start Autopilot worker component on pure worker nodes") introduced two related changes:

  1. cmd/worker/worker_unix.go: the autopilot worker component is no longer started on hybrid nodes:

    // Before:
    if !workerConfig.AutopilotDisabled {
    // After:
    if !workerConfig.AutopilotDisabled && controller == nil {

  2. pkg/autopilot/controller/plans/cmdprovider/k0supdate/newplan.go: hybrid nodes are filtered out of the workers list for k0supdate plans (correct: the controller autopilot component handles the k0s binary update via the ControlNode signal object).

However, pkg/autopilot/controller/plans/cmdprovider/airgapupdate/provider.go was not updated with the equivalent filter. Hybrid nodes are still included in the airgap workers list, but since their autopilot worker component is no longer running (change 1), the airgap signal written to the v1.Node annotation is never consumed.
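
For illustration only, here is a minimal sketch of what an "equivalent filter" could look like in isolation. It is not the k0s implementation: filterPureWorkers, the controllerNames set, and the worker0 node name are hypothetical, and it assumes a hybrid node can be recognized by sharing its name with a known controller. Whether filtering is the right remedy for airgap bundles on hybrid nodes is a separate question for the maintainers.

```go
package main

import "fmt"

// filterPureWorkers returns the worker names that are not also controllers.
// controllerNames is a hypothetical input: the set of node names known to
// run a controller (an assumption made for this sketch only).
func filterPureWorkers(workers []string, controllerNames map[string]struct{}) []string {
	pure := make([]string, 0, len(workers))
	for _, name := range workers {
		if _, isController := controllerNames[name]; isController {
			// Hybrid node: skip it, since no autopilot worker component
			// is running there to consume a worker signal.
			continue
		}
		pure = append(pure, name)
	}
	return pure
}

func main() {
	controllers := map[string]struct{}{"controller0": {}}
	// "worker0" is a made-up pure worker used only for this example.
	fmt.Println(filterPureWorkers([]string{"controller0", "worker0"}, controllers))
}
```

With a filter like this, a plan that lists only the hybrid node would end up with an empty workers list instead of a signal nobody reads.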

The resulting state machine deadlock:

  • Plan transitions PendingSignal → SignalSent (signal written to Node annotation)
  • schedulablewait sees pendingSignalCount == 0, signalingSentCount == 1, so canScheduleWorkers = false and IsCompleted = false
  • Plan loops in schedulablewait forever
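
The deadlock can be modeled with a small sketch. This is a simplified stand-in, not the actual autopilot code: the evaluate function, the workerState type, and the exact completion rule (no pending and no sent signals) are assumptions made to mirror the counts described above.

```go
package main

import "fmt"

// workerState mirrors the per-node states seen in the Plan status.
type workerState string

const (
	pendingSignal workerState = "PendingSignal"
	signalSent    workerState = "SignalSent"
	workerDone    workerState = "Done" // hypothetical terminal state for this sketch
)

// evaluate is a hypothetical stand-in for the schedulablewait decision.
// Assumption: more workers can be scheduled only while some are still
// pending, and the command completes only once nothing is pending or sent.
func evaluate(states []workerState) (canScheduleWorkers, isCompleted bool) {
	var pendingSignalCount, signalingSentCount int
	for _, s := range states {
		switch s {
		case pendingSignal:
			pendingSignalCount++
		case signalSent:
			signalingSentCount++
		}
	}
	canScheduleWorkers = pendingSignalCount > 0
	isCompleted = pendingSignalCount == 0 && signalingSentCount == 0
	return canScheduleWorkers, isCompleted
}

func main() {
	// The hybrid node stays in SignalSent because nothing consumes the signal.
	can, done := evaluate([]workerState{signalSent})
	fmt.Printf("canScheduleWorkers=%t isCompleted=%t\n", can, done)
}
```

Both results are false for a single node stuck in SignalSent, so under these assumptions no transition ever applies and the reconciler keeps requesting a retry.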
