
[BUG] serverless-init: concurrent map panic at process exit on Cloud Run Jobs (Go 1.26) #48308

@robbiet480

Description


Agent version

serverless-init-1.9.7 (digest sha256:321ad57120279db986f1358a60b97485f0afb75255cd6e2f287e66640f7fda43, built 2026-03-23)

Bug Report

serverless-init crashes with fatal error: concurrent map read and map write after the user process exits on Cloud Run Jobs. The crash happens in the trace flush pipeline. The user workload completes successfully (return_code:0), but serverless-init itself then exits non-zero, so Cloud Run marks the task as failed.
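To make the failure mode concrete: the exit status Cloud Run records is the wrapper's own, not the child's. The sketch below is a hypothetical wrapper (runChild is an illustrative name, not serverless-init's code) that captures a child's exit code with os/exec. If such a wrapper then hits a Go fatal error while flushing, the Go runtime terminates it with exit status 2, regardless of the child having returned 0.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runChild runs a command and returns the exit code the child reported.
// Hypothetical helper for illustration only; not serverless-init's implementation.
func runChild(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	err := cmd.Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The child started but exited with a non-zero status.
		return exitErr.ExitCode(), nil
	}
	// The child could not be started at all (e.g. binary not found).
	return -1, err
}

func main() {
	code, err := runChild("sh", "-c", "exit 0")
	fmt.Println("child exit code:", code, "err:", err)
	// A real wrapper would flush telemetry here and then call os.Exit(code).
	// If that flush dies with a Go fatal error, the process exits with
	// status 2 instead, and that is the status the platform sees.
}
```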

Reproduction Steps

  1. Use serverless-init as the entrypoint on a Cloud Run Job
  2. User process exits normally
  3. During the post-exit trace flush, CloudRunJobsSpanModifier.ModifySpan → traceutil.UpgradeTraceID → traceutil.GetMeta (reads span.Meta) races with serviceKeyCatalog.register (writes the catalog map)
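One general way to rule out this class of shutdown race is to ensure the goroutine that mutates shared maps only does so after the goroutines reading them have finished. The sketch below is a hypothetical model of a post-exit flush (span, flush and the "flushed" tag are illustrative names, not serverless-init's code), assuming the crash comes from the main goroutine and the trace workers overlapping during shutdown: workers only read span.Meta, and the caller closes the work channel and waits on a sync.WaitGroup before writing anything.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a hypothetical, trimmed-down stand-in for a trace span.
type span struct {
	Service string
	Meta    map[string]string
}

// flush models the post-exit flush. Workers only read span.Meta; the caller
// writes spans and the catalog only after every worker has returned.
func flush(spans []*span, workers int) (processed int, catalog map[string]int) {
	in := make(chan *span)
	var wg sync.WaitGroup
	var mu sync.Mutex // guards processed, which all workers increment

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range in {
				_ = s.Meta["env"] // read-only access, like a GetMeta lookup
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}

	for _, s := range spans {
		in <- s
	}
	close(in) // no more work: each worker's range loop ends
	wg.Wait() // wait until every reader has returned

	// Only now is it safe for this goroutine to write maps the workers read.
	catalog = make(map[string]int)
	for _, s := range spans {
		catalog[s.Service]++
		s.Meta["flushed"] = "true"
	}
	return processed, catalog
}

func main() {
	spans := []*span{
		{Service: "api", Meta: map[string]string{"env": "prod"}},
		{Service: "api", Meta: map[string]string{"env": "prod"}},
		{Service: "worker", Meta: map[string]string{"env": "prod"}},
	}
	n, catalog := flush(spans, 2)
	fmt.Println(n, catalog["api"], catalog["worker"]) // 3 2 1
}
```

Running this under go run -race should report nothing; moving the final write loop above wg.Wait() reintroduces exactly the read/write overlap described in step 3.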

Full stack trace:

```
fatal error: concurrent map read and map write

goroutine 126 [running]:
internal/runtime/maps.fatal({0x1fd2053?, 0x6?})
        /go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.8.linux-amd64/src/runtime/panic.go:1046 +0x18
github.com/DataDog/datadog-agent/pkg/trace/traceutil.GetMeta(...)
        /tmp/dd/datadog-agent/pkg/trace/traceutil/span.go:158
github.com/DataDog/datadog-agent/pkg/trace/traceutil.GetTraceIDHigh(...)
        /tmp/dd/datadog-agent/pkg/trace/traceutil/span.go:218
github.com/DataDog/datadog-agent/pkg/trace/traceutil.HasTraceIDHigh(...)
        /tmp/dd/datadog-agent/pkg/trace/traceutil/span.go:229
github.com/DataDog/datadog-agent/pkg/trace/traceutil.UpgradeTraceID(0xc0001f22a0, 0xc000754700)
        /tmp/dd/datadog-agent/pkg/trace/traceutil/span.go:243 +0x5a
github.com/DataDog/datadog-agent/cmd/serverless-init/trace.(*CloudRunJobsSpanModifier).ModifySpan(0xc0005806c0, 0xc00001bdd0?, 0xc000754700)
        /tmp/dd/datadog-agent/cmd/serverless-init/trace/span_modifier.go:71 +0x2a5
github.com/DataDog/datadog-agent/pkg/trace/agent.(*Agent).Process(0xc0000f0900, 0xc0002e3130)
        /tmp/dd/datadog-agent/pkg/trace/agent/agent.go:502 +0xbe4
github.com/DataDog/datadog-agent/pkg/trace/agent.(*Agent).work(0xc0000f0900)
        /tmp/dd/datadog-agent/pkg/trace/agent/agent.go:310 +0x5b
created by github.com/DataDog/datadog-agent/pkg/trace/agent.(*Agent).Run in goroutine 81
        /tmp/dd/datadog-agent/pkg/trace/agent/agent.go:257 +0x30e

goroutine 1 [running]:
  serviceKeyCatalog.register  (map write; frame truncated in Cloud Run log output)
```

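This is not the ModifySpan field race fixed in #44727; that fix is present. At the time of the crash, two goroutines were accessing maps during the flush without synchronization:

| Goroutine | Operation | Map |
|---|---|---|
| 1 | serviceKeyCatalog.register | catalog map (write) |
| 126 | traceutil.GetMeta via ModifySpan | span.Meta (read) |

Note that Go's runtime raises this particular error only when the same map is read while it is being written, so the write that tripped the check in goroutine 126 was a write to span.Meta. If serviceKeyCatalog.register is that writer, it must be reaching span.Meta (directly or through a shared reference); otherwise the actual writer is in the truncated part of the log.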
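If serviceKeyCatalog.register does run concurrently with the trace workers, the conventional fix for the catalog itself is to guard it with a lock. The sketch below is a hypothetical stand-in (serviceCatalog, register and lookup are illustrative names, not the agent's actual type or methods) that uses sync.RWMutex so lookups can run in parallel while registrations are serialized.

```go
package main

import (
	"fmt"
	"sync"
)

// serviceCatalog is a hypothetical, lock-guarded stand-in for a catalog
// like serviceKeyCatalog. Names are illustrative only.
type serviceCatalog struct {
	mu   sync.RWMutex
	keys map[string]int
}

func newServiceCatalog() *serviceCatalog {
	return &serviceCatalog{keys: make(map[string]int)}
}

// register records key under the exclusive lock, assigning the next id
// if the key is new and returning the existing id otherwise.
func (c *serviceCatalog) register(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.keys[key]; ok {
		return id
	}
	id := len(c.keys) + 1
	c.keys[key] = id
	return id
}

// lookup reads under the shared lock, so concurrent readers do not block
// each other but never overlap with a register call.
func (c *serviceCatalog) lookup(key string) (int, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	id, ok := c.keys[key]
	return id, ok
}

func main() {
	c := newServiceCatalog()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("svc-%d", i%3)
			c.register(key)
			c.lookup(key)
		}(i)
	}
	wg.Wait()
	fmt.Println(len(c.keys)) // 3
}
```

This only protects the catalog map. Because the fatal error was raised on span.Meta, whatever writes that map would still need its own synchronization or an ordering guarantee relative to ModifySpan.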

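Go 1.24 replaced the built-in map implementation with Swiss tables (internal/runtime/maps, visible in the top frame), and the toolchain path in the trace (go1.25.8) shows serverless-init itself was built with Go 1.25. The new implementation may surface this latent race more readily than the Go ≤1.23 maps did, although the runtime's detection of concurrent map misuse has always been best-effort. Our user binary is built with Go 1.26, but that is not a factor: the panic originates entirely within serverless-init's own goroutines.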
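For context on how the runtime check works: Go has had best-effort detection of concurrent map misuse since Go 1.6, and in both the pre-1.24 and Swiss-table implementations the "being written" state it checks is stored on each individual map. Goroutines that each write only their own map therefore never trip it, however concurrent they are; only a map shared between goroutines can. The sketch below (fillOwn is an illustrative name) shows the safe case.

```go
package main

import (
	"fmt"
	"sync"
)

// fillOwn has each goroutine write only to its own map. No map is shared,
// so this never triggers "concurrent map" fatal errors and is clean under -race.
func fillOwn(n int) (map[string]int, map[string]int) {
	a := make(map[string]int)
	b := make(map[string]int)
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			a[fmt.Sprint(i)] = i
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			b[fmt.Sprint(i)] = i * 2
		}
	}()
	wg.Wait() // both writers are done before the caller reads either map
	return a, b
}

func main() {
	a, b := fillOwn(1000)
	fmt.Println(len(a), len(b)) // 1000 1000
}
```

Since the runtime check is best-effort, building serverless-init with -race and replaying a flush is a more reliable way to find the actual writer of span.Meta than waiting for the fatal error to recur.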

Agent configuration

N/A (serverless-init default configuration, apart from DD_SERVERLESS_FLUSH_STRATEGY=end and DD_BIND_HOST=127.0.0.1)

Operating System

linux/amd64 (Alpine-based Cloud Run container)

Other environment details

Google Cloud Run Jobs, user binary built with Go 1.26


This issue was drafted by Claude (Anthropic) and reviewed by the submitter before filing.
