fix(sdk): resolve exporter deadlock on constrained tokio runtimes #3380
Open
bryantbiggs wants to merge 4 commits into open-telemetry:main from
Codecov Report

@@           Coverage Diff           @@
##            main    #3380    +/-  ##
======================================
  Coverage   83.2%    83.3%
======================================
  Files        128      128
  Lines      25045    25164   +119
======================================
+ Hits       20858    20965   +107
- Misses      4187     4199    +12
The default thread-based processors (BatchSpanProcessor, BatchLogProcessor, PeriodicReader) call futures_executor::block_on() on their dedicated worker threads. When the exporter uses tonic/gRPC, the export future depends on tokio tasks (e.g. tonic's Buffer worker) that can only be polled by tokio worker threads. If all tokio worker threads are blocked (single-threaded runtime, or multi-thread with 1 worker), this creates a circular wait.

Add BlockingStrategy that captures the tokio runtime handle at construction time and enters the runtime context via Handle::enter() before calling futures_executor::block_on(). This makes tokio types available on the dedicated background threads without taking ownership of the reactor. Falls back to plain futures_executor::block_on() without tokio.

Fixes: open-telemetry#2802
Add tests with TokioSpawn*Exporter mocks that call tokio::spawn() inside export(), simulating tonic/gRPC exporters. These prove that BlockingStrategy correctly provides tokio runtime context on the processor's dedicated OS thread, preventing deadlocks on constrained multi_thread(1) runtimes (open-telemetry#2802, open-telemetry#3356).
Summary
- Add `BlockingStrategy` that captures the tokio runtime handle at construction and enters the runtime context via `Handle::enter()` before calling `futures_executor::block_on()` on dedicated background threads
- Update `BatchSpanProcessor`, `BatchLogProcessor`, and `PeriodicReader` to use `BlockingStrategy` instead of bare `futures_executor::block_on()`
- Fall back to `futures_executor::block_on()` when no tokio runtime is available

Problem
The default thread-based processors call `futures_executor::block_on(exporter.export(...))` on their dedicated worker threads. When the exporter uses tonic/gRPC, the export future depends on tokio tasks (e.g. tonic's `Buffer` worker spawned via `tokio::spawn`) that can only be polled by tokio worker threads. If all tokio worker threads are blocked, which happens on a single-threaded runtime (`current_thread`) or on `multi_thread` with 1 worker (common in 1-vCPU k8s pods), this creates a circular wait: the worker thread waits for the export to complete, but the export can't complete because no tokio thread is available to poll the `Buffer` worker.

Reproduction and detailed analysis: #3356 (comment)
Minimal repro gist: https://gist.github.com/bryantbiggs/62737e105525fe341090d0ad97de2178
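For readers who want the shape of the circular wait without running the gist, here is a std-only toy model. It is not the SDK or tokio: `start_pool` and `flush_completes` are hypothetical names, the fixed-size pool stands in for the runtime's worker threads, and the 200 ms timeout is arbitrary. The "flush" job occupies a pool worker while it waits on a dedicated exporter thread, and the exporter in turn needs a helper job that only the pool can run.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

// Toy fixed-size pool standing in for the runtime's worker threads (assumption).
fn start_pool(workers: usize) -> Sender<Job> {
    let (tx, rx) = channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // The lock is held only while receiving, not while running the job.
            let job = rx.lock().unwrap().recv();
            match job {
                Ok(job) => job(),
                Err(_) => break, // all senders dropped
            }
        });
    }
    tx
}

// Returns true if the flush completed, false if it timed out.
fn flush_completes(workers: usize) -> bool {
    let pool = start_pool(workers);

    // Dedicated exporter thread: each export needs a helper job run by the pool.
    let (export_tx, export_rx) = channel::<Sender<()>>();
    let pool_for_exporter = pool.clone();
    thread::spawn(move || {
        for reply in export_rx {
            let (done_tx, done_rx) = channel::<()>();
            let helper: Job = Box::new(move || {
                let _ = done_tx.send(());
            });
            if pool_for_exporter.send(helper).is_err() {
                break;
            }
            // Wait for the helper, then report completion to the caller.
            if done_rx.recv().is_ok() {
                let _ = reply.send(());
            }
        }
    });

    // The flush job runs on a pool worker and blocks it while waiting.
    let (result_tx, result_rx) = channel::<bool>();
    let flush: Job = Box::new(move || {
        let (reply_tx, reply_rx) = channel::<()>();
        export_tx.send(reply_tx).unwrap();
        let ok = reply_rx.recv_timeout(Duration::from_millis(200)).is_ok();
        let _ = result_tx.send(ok);
    });
    pool.send(flush).unwrap();
    result_rx.recv().unwrap()
}

fn main() {
    println!("1 worker:  completed = {}", flush_completes(1));
    println!("2 workers: completed = {}", flush_completes(2));
}
```

With one worker the helper job sits in the queue behind the blocked flush, so the flush times out; with two workers the second worker runs the helper and the flush completes. The model only illustrates the dependency cycle; it says nothing about how tonic schedules its `Buffer` worker internally.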
Behavior of `force_flush()` and `shutdown()` by runtime:
- `current_thread`: `Err(Timeout(5s))`, worker thread stays stuck
- `multi_thread(1)` + `tokio::spawn`
- `multi_thread` (default workers)

Solution
`BlockingStrategy::new()` is called during processor construction (while inside the tokio runtime context). It calls `Handle::try_current()` to capture the runtime handle. On the dedicated background thread, `blocking_strategy.block_on(future)` enters the runtime context via `Handle::enter()` before calling `futures_executor::block_on()`. This makes tokio types (spawn, timers, IO) available without taking ownership of the reactor; IO continues to be driven by the runtime's own threads.

When no tokio runtime is available (e.g., non-tokio environments), it falls back to plain `futures_executor::block_on()`, preserving existing behavior.

Scope
This is the scoped-down version of #3356, containing only the bug fix as suggested by @scottgerring in #3356 (comment). The experimental async runtime removal is intentionally excluded and can be discussed separately.
Fixes #2802
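
As a reading aid for the Solution section above, the capture-at-construction, enter-on-the-worker pattern can be modeled with the standard library alone. This is a sketch under stated assumptions, not the PR's code: `RuntimeHandle`, `EnterGuard`, and the `CURRENT` thread-local are hypothetical stand-ins for tokio's ambient runtime context, and `run` takes a plain closure where the PR's `block_on` takes a future.

```rust
use std::cell::RefCell;
use std::thread;

// Hypothetical stand-in for a runtime handle; not a tokio type.
#[derive(Clone)]
struct RuntimeHandle {
    name: String,
}

thread_local! {
    // Per-thread "current runtime" slot, modeling an ambient runtime context.
    static CURRENT: RefCell<Option<RuntimeHandle>> = RefCell::new(None);
}

// RAII guard: restores the previous context when dropped.
struct EnterGuard {
    previous: Option<RuntimeHandle>,
}

impl RuntimeHandle {
    // Returns the handle installed on the calling thread, if any.
    fn try_current() -> Option<RuntimeHandle> {
        CURRENT.with(|c| c.borrow().clone())
    }

    // Installs this handle on the calling thread until the guard is dropped.
    fn enter(&self) -> EnterGuard {
        let previous = CURRENT.with(|c| c.replace(Some(self.clone())));
        EnterGuard { previous }
    }
}

impl Drop for EnterGuard {
    fn drop(&mut self) {
        let previous = self.previous.take();
        CURRENT.with(|c| {
            c.replace(previous);
        });
    }
}

// Illustrative model: captures the ambient handle once, re-enters it per call.
struct BlockingStrategy {
    handle: Option<RuntimeHandle>,
}

impl BlockingStrategy {
    fn new() -> Self {
        BlockingStrategy {
            handle: RuntimeHandle::try_current(),
        }
    }

    fn run<T>(&self, work: impl FnOnce() -> T) -> T {
        // Enter the captured context if there is one; otherwise run as-is.
        let _guard = self.handle.as_ref().map(|h| h.enter());
        work()
    }
}

// Reports which context the work observes on a separate OS thread.
fn context_seen_on_worker(strategy: BlockingStrategy) -> Option<String> {
    thread::spawn(move || strategy.run(|| RuntimeHandle::try_current().map(|h| h.name)))
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let rt = RuntimeHandle { name: "rt".to_string() };
    // Constructed while a context is entered, as during processor construction.
    let with_rt = {
        let _guard = rt.enter();
        BlockingStrategy::new()
    };
    // Constructed with no ambient context: the fallback path.
    let without_rt = BlockingStrategy::new();

    println!("{:?}", context_seen_on_worker(with_rt)); // prints Some("rt")
    println!("{:?}", context_seen_on_worker(without_rt)); // prints None
}
```

The point of the model is that the worker thread never had a context of its own; it sees one only because the strategy carried the handle across threads and entered it around the work. When nothing was captured, the work runs unchanged, which mirrors the fallback to plain `futures_executor::block_on()`.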
Test plan
- `cargo check -p opentelemetry_sdk --all-features`
- `cargo check -p opentelemetry_sdk --no-default-features`
- `cargo clippy -p opentelemetry_sdk --all-features -- -Dwarnings`
- `cargo test -p opentelemetry_sdk --features="testing"`: 295 passed, 0 failed, 3 ignored (pre-existing)
- `cargo check -p opentelemetry-otlp --all-features`