Spec update for callbacks on ConfigProvider to support runtime changes by jackshirazi · Pull Request #4900 · open-telemetry/opentelemetry-specification

jackshirazi · 2026-02-23T16:21:59Z

Changes

This PR extends specification/configuration/api.md to define a language-neutral
ConfigProvider change-listener contract for runtime declarative configuration updates.

Spec updates include:

* adding Add config change listener as a required ConfigProvider operation
* defining watched-path requirements (absolute declarative path, exact-match semantics)
* defining callback payload semantics (path + updated ConfigProperties)
* clarifying empty/unset behavior (newConfig is a valid instance representing an empty mapping node when unset/cleared)
* defining delivery semantics (coalescing allowed, ordering unspecified)
* defining lifecycle/concurrency behavior (idempotent close, post-close behavior, concurrency expectations)
* defining error/unsupported-provider behavior (listener failure isolation, no-op registration when notifications are unsupported)
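
To make the list above concrete, here is a minimal sketch of the contract in Python. It is not taken from the spec text or from any SDK: `SketchConfigProvider`, `Registration`, `add_change_listener`, and `notify` are hypothetical names, and the behavior shown (exact-match paths, an empty mapping when the node is unset, idempotent close, failure isolation) follows the description as originally proposed, parts of which are questioned later in the review.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional

# All names here are illustrative assumptions, not spec or SDK identifiers.
Callback = Callable[[str, Mapping[str, Any]], None]


class Registration:
    """Handle returned by add_change_listener; close() is idempotent."""

    def __init__(self, listeners: List[Callback], callback: Callback) -> None:
        self._listeners = listeners
        self._callback = callback
        self._closed = False

    def close(self) -> None:
        if self._closed:  # second and later calls are no-ops
            return
        self._closed = True
        self._listeners.remove(self._callback)


class SketchConfigProvider:
    def __init__(self) -> None:
        # Several listeners may be registered for the same path.
        self._listeners: Dict[str, List[Callback]] = {}

    def add_change_listener(self, path: str, callback: Callback) -> Registration:
        listeners = self._listeners.setdefault(path, [])
        listeners.append(callback)
        return Registration(listeners, callback)

    def notify(self, path: str, new_config: Optional[Mapping[str, Any]]) -> None:
        # Exact-match semantics: only listeners on this exact path fire.
        # An unset or cleared node is delivered as an empty mapping, never None.
        payload = dict(new_config) if new_config is not None else {}
        for callback in list(self._listeners.get(path, [])):
            try:
                callback(path, payload)
            except Exception:
                # Isolate the failure and keep notifying the other callbacks.
                continue
```

A provider that cannot deliver notifications could return a `Registration` whose `close()` does nothing, which would satisfy the no-op registration item without changing caller code.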

Related issues #, Making "methods" instrumentation dynamically updateable opentelemetry-java-instrumentation#15228
Related OTEP(s) [OTEP] Telemetry Policy #4738
Links to the prototypes (when adding or changing features): Add a ConfigProvider callback for runtime instrumentation option changes opentelemetry-java#8076
CHANGELOG.md file updated for non-trivial changes
Spec compliance matrix updated if necessary

trask · 2026-02-25T20:32:03Z

+* API implementations SHOULD document accepted path syntax in language-specific
+  docs and include examples such as `.instrumentation/development.general.http`
+  and `.instrumentation/development.java.methods`.


these could probably be standard across languages

worth noting whether traversing through arrays is supported

I gave a standard and language specific example, I'm fine with different examples.

I've added a line (just before this) about arrays, thanks!

oh, I meant about

> implementations SHOULD document accepted path syntax

were you thinking that, e.g. java might use .instrumentation/development.general.http path syntax, while another might use something else, e.g. instrumentation/development->general->http?

Yes, okay, I see what you mean: the whole thing should be standardized. Is it standardized in declarative config across languages? If so then yes, let's specify standard path syntax accordingly.

@open-telemetry/configuration-approvers what do you think? thanks

I think it would be good to standardize on something like JSONPath:

https://en.wikipedia.org/wiki/JSONPath

https://www.rfc-editor.org/rfc/rfc9535

Or maybe some sort of abbreviated / subset of the syntax which achieves the goal while keeping the implementation burden reasonable.

I think this is not specified in declarative config? So can't be specified here? Or are we proposing to specify it here?
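
For illustration only, the dot-separated syntax used in the examples above could be resolved against a parsed configuration tree roughly as follows. This is a sketch under assumptions: the thread has not settled on a syntax, `split_path` and `resolve` are hypothetical helpers, a leading `.` is taken to mean "absolute", and array traversal is deliberately not handled.

```python
from collections.abc import Mapping
from typing import Any, List, Optional


def split_path(path: str) -> List[str]:
    """Split an absolute dotted path such as '.a/b.c.d' into segments."""
    if not path.startswith("."):
        raise ValueError("path must be absolute (start with '.')")
    segments = path[1:].split(".")
    if any(segment == "" for segment in segments):
        raise ValueError("empty path segment")
    return segments


def resolve(root: Mapping, path: str) -> Optional[Any]:
    """Walk mapping nodes only; return None when any segment is missing."""
    node: Any = root
    for segment in split_path(path):
        if not isinstance(node, Mapping) or segment not in node:
            return None
        node = node[segment]
    return node
```

Note that `/` is treated as an ordinary character inside a segment, so `.instrumentation/development.general.http` yields three segments. A JSONPath subset (RFC 9535) would need a different tokenizer but could keep the same `resolve` shape.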


jack-berg · 2026-02-27T19:13:30Z

+* `newConfig` MUST be a valid [`ConfigProperties`](#configproperties) instance
+  (never null/nil/None).
+* If the watched node is unset or cleared, `newConfig` MUST represent an empty
+  mapping node (equivalent to `{}`).


Set and empty vs. unset turns out to be semantically meaningful in declarative config:

    # this is valid
    tracer_provider:
      - processors:
          simple:
            exporter:
              console:
    ---
    # this is invalid
    tracer_provider:
      - processors:
          simple:
            exporter:

I think we need to find some way to signal this difference to watchers.

Good point. I agree we should preserve declarative-config semantics. Do you think adding a DeclarativeConfigProperties.unset() (or missing()) is a good option, though it adds more API changes? Or add a constant ConfigChangeListener.UNSET which provides the exact situation, leaving the callback to make the check?
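
A rough sketch of the sentinel idea, to show what a callback would have to check. Everything here is an assumption for illustration: `UNSET`, `watched_value`, and `describe` are hypothetical, and the sketch treats a YAML key with no value (parsed as null) as "set and empty".

```python
from typing import Any, Mapping, Optional


class _Unset:
    """Marker type for a watched node that is absent from the config."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()  # hypothetical constant, compared by identity


def watched_value(parent: Optional[Mapping[str, Any]], key: str) -> Any:
    """Return UNSET if key is absent, {} if present but empty, else the value."""
    if parent is None or key not in parent:
        return UNSET
    value = parent[key]
    return {} if value is None else value


def describe(new_config: Any) -> str:
    """What a listener callback might do with the three possible states."""
    if new_config is UNSET:
        return "unset"
    if len(new_config) == 0:
        return "set but empty"
    return "set"
```

With the two YAML documents above, watching the `console` node would yield `{}` for the valid document (the key is present with no value) and `UNSET` for the invalid one (the `exporter` node has no children).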

jack-berg · 2026-02-27T19:15:11Z

+  operations.
+* Implementations MUST document callback concurrency guarantees. If they do not,
+  users MUST assume callbacks may be invoked concurrently.
+* Closing a registration handle MUST unregister the listener.


Need to indicate that a callback is required to have a close operation before specifying behavior for a close operation.

Good call. I've updated the return comment to specify that, commit 5b0d4d7

jack-berg · 2026-02-27T19:19:07Z

+
+* If callback execution throws an exception, implementations SHOULD isolate the
+  failure to that callback and SHOULD continue notifying other callbacks.
+* If a provider does not support change notifications, registration MUST still


This is defining the "noop" behavior of this operation. Elsewhere in the spec we have extracted dedicated noop documents (e.g. metrics noop). It may be time to do the same for the declarative config API.

I assume this is a callout for the declarative config API, not for this doc?

jack-berg · 2026-02-27T19:20:31Z

 The `ConfigProvider` MUST provide the following functions:

 * [Get instrumentation config](#get-instrumentation-config)
+* [Add config change listener](#add-config-change-listener)


#nit: could change listener be referring to something other than config? If not, consider dropping.

Suggested change

* [Add config change listener](#add-config-change-listener)

* [Add change listener](#add-config-change-listener)

applied in commit 9371403

jack-berg · 2026-02-27T19:26:23Z

As declarative config integrates more tightly into the otel java agent, and as we start looking towards dynamic config solutions like #4738, I think a capability to allow instrumentation to respond to changes in config is essential.

Based on #4889, only PHP and Java have implemented the ConfigProvider API. So curious if @Nevay / @brettmc have identified any need for this.

As for other declarative config implementers, @codeboten, @MikeGoldsmith, @maryliag, @Kielek, @ysolomchenko, @marcalff, even if you haven't implemented the ConfigProvider API yet, does this use case of listening for config changes resonate with you?

MikeGoldsmith · 2026-03-03T12:02:45Z

A way to watch a config and automatically reload would be welcome to remove the need to restart a service to pick up new changes. I don't think it would be a hard requirement though.

Kielek · 2026-03-03T12:51:27Z

@jack-berg, the hot reload functionality sounds great, but it should not be marked as required functionality. It should be up to the technology to decide whether it can be implemented or not.

I suppose that partial support can also be considered, with information returned to the configuration provider (OpAMP?) that some settings cannot be applied without a process restart.

pellared · 2026-03-03T15:12:28Z

Is there any prototype for this?

pellared · 2026-03-03T15:15:20Z

> @jack-berg, the hot reload functionality sounds great, but it should not be marked as required functionality. It should be up to the technology to decide whether it can be implemented or not.
>
> I suppose that partial support can also be considered, with information returned to the configuration provider (OpAMP?) that some settings cannot be applied without a process restart.

What is more, making (especially everything) "hot reload" will make the SDK less efficient because of required additional synchronization.

In my opinion, the prototype should include extensive benchmarks. I am worried that this is going to add more synchronization on the hot path.

jack-berg · 2026-03-03T19:09:31Z

> What is more, making (especially everything) "hot reload" will make the SDK less efficient because of required additional synchronization.

The hot reload proposed here is limited to ConfigProvider, the API portion of declarative config which instrumentations use for configuration. So it's not on the hot path of the SDK internals, but the synchronization would still be on the hot path of each individual instrumentation. I.e. if an HTTP instrumentation supports dynamic config, it would have to synchronize the logic that determines if / which HTTP request / response headers to capture (amongst other things).

This convo reminds me of the convo in #4645. I initially pushed back, favoring eventual visibility without guarantees for performance reasons, but was ultimately convinced that an additional 0.8 ns per record operation was low enough overhead not to worry about. I believe the same level of synchronization and overhead would occur here as a result of instrumentation config changing.

> Is there any prototype for this?

@jackshirazi has been sketching out the API here. Notably, there is no SDK implementation, nor proposed SDK spec here. I think that needs to change.

> @jack-berg, the hot reload functionality sounds great, but it should not be marked as required functionality.

Yeah we should talk about this. Besides the potential performance overhead from runtime changes to instrumentation config, there's also the additional complexity required. Even if every language supported the ability to watch for changes, we can't force every instrumentation to call those watch APIs (although we could encourage, similar to how we don't force semantic conventions but encourage). What does it mean for the UX if only some instrumentation is written to be responsive to runtime changes?


jackshirazi · 2026-03-04T14:53:19Z

Runtime changes are for few and select components. The TelemetryPolicy that this aligns to does not at all expect reload of all components, nor even that they be enabled to do so. The intention is that IF some component is enabled to handle runtime changes, THEN there is a mechanism for it to receive those changes. For Java, there are maybe a dozen components that will be implemented to adapt to runtime configuration changes, and at the moment only one instrumentation that is proposed to adapt to runtime changes. This is very targeted.

* Only the components that are interested in runtime config changes will add a callback for the path they are interested in; this is always likely to be a small set.
* The TelemetryPolicy pipeline that handles runtime config changes will only accept changes that are configured to be implemented.
* For an SDK, these are expected to be rare events (you only occasionally reconfigure the agent; e.g. for the most common example of changing the sampling rate, you might change it at most a few times a day).
* The nature of config changes is that they are not expected to be applied instantaneously, especially since the main impetus is for a remote central config to provide changes. An eventually consistent approach is fine.
* The biggest overhead that I can see is where there is a mismatch in level between the path that is registered for a callback and the path that is used to make a config change. If we specify the path to be a standardized string path with dot separators per the example, it becomes a substring match, which eliminates that overhead.

github-actions · 2026-03-19T04:02:45Z

This PR was marked stale. It will be closed in 14 days without additional activity.

jack-berg · 2026-03-19T15:09:02Z

@pellared - thoughts on @jackshirazi's response?

@jackshirazi please respond to the other active threads if you plan on continuing working on this. Thanks!

jackshirazi · 2026-03-19T17:17:05Z

I can get back to this next week

reyang · 2026-03-25T14:44:39Z

+* Implementations MAY coalesce rapid successive updates for the same watched
+  path. If coalescing is performed, callback delivery MUST use the latest
+  configuration state.
+* Ordering of callback delivery is not specified, including for updates touching


Trying to understand what this means.

If I make a change "foo=a", then I make another change "foo=b" - is it possible that from the callback I will get "foo=b" first, then later I'll get "foo=a"?

Yes, especially if those changes are concurrent. I would expect changes to generally be occasional events rather than many close together, so mostly this shouldn't matter, but if changes are made close together, this doesn't insist on ordering (which could be a pain to implement in some languages).

Got it, thanks @jackshirazi!

@jack-berg WDYT?

I can imagine the following options:

1. We don't guarantee ordering, and there is no way for clients to reliably determine whether they are getting the latest configuration or using some old/stale version due to a race condition.

2. We don't send a portion of the configuration snapshot to the callback; instead, we just notify the listener "there are some changes which you might be interested in", then we expect the listener to go and check the configuration.

3. In addition to the existing arguments that we pass to the listener callback, we also pass something like a sequence number. In the original example, "foo=a" would have sequence number = 1, and "foo=b" would have sequence number = 2; the listener can then decide to drop a late-arriving notification if its sequence number is smaller than what the listener has already seen.
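
The sequence-number option can be sketched from the listener's side as follows. This is only an illustration of the idea: the extra `seq` argument, the `LatestWinsListener` class, and its method names are assumptions and do not appear in the proposed API.

```python
import threading
from typing import Any, Mapping, Optional


class LatestWinsListener:
    """Keeps the config with the highest sequence number seen so far."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_seq = -1
        self.current: Optional[Mapping[str, Any]] = None

    def on_change(self, path: str, new_config: Mapping[str, Any], seq: int) -> bool:
        """Apply the update unless a newer one already arrived.

        Returns True if the update was applied, False if it was dropped.
        """
        with self._lock:
            if seq <= self._last_seq:
                return False  # late arrival: a newer state is already applied
            self._last_seq = seq
            self.current = new_config
            return True
```

If "foo=b" (sequence 2) is delivered before "foo=a" (sequence 1), the second call is dropped and the listener stays on "foo=b". The provider would have to assign sequence numbers at the point where the change is made, not at delivery time.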

reyang · 2026-03-25T14:46:34Z

+
+Concurrency and lifecycle requirements:
+
+* Callback implementations SHOULD be reentrant and SHOULD avoid blocking


Trying to understand the thinking behind this - would the configuration component create new threads / execution context for reentrant calls?

Or keep them quick. But the point here is not to force anything on the component; this tells the component to handle multiple calls as best it can, as a "nice" citizen, so that the callback mechanism isn't expected to add extra overhead to handle components.

Quick doesn't have a direct relationship to threading/concurrency model.
We can make it quick and sequential.

I guess my main question is - why do we want this to be reentrant? Reentrancy is always more difficult, could be slightly more difficult or significantly more difficult. I want to understand what's the gain/loss by having or not having reentrancy.

Quick kind of does. If it's quick you can use an exclusive block to do the update and not worry that it's causing problems, which makes the concurrency handling simple.

But it's a SHOULD rather than a MUST. It keeps the change listener implementation simpler. There are likely to be few instrumentations or components that will be adapted to handle callbacks, and most that do are likely to be able to make a simple state update that applies when the instrumentation/component is next applied. So with that expectation, the simpler change listener seems reasonable

> It keeps the change listener implementation simpler.

Sorry I'm confused. I thought it'll be simpler for the listener if we say "callback will only be invoked sequentially, there is no need for the listener to worry about reentrancy or concurrency". Are we on the same page?

The simplest change listener implementation is to respond directly to a change in the config and send that directly to the callback. This could be on any thread

Something on any thread -> synchronously changes the config on a path -> synchronously checks for any callbacks on that path -> synchronously does the callback -> instrumentation/component callback implementation handles the callback.

So the change listener here doesn't worry about reentrancy or concurrency and is very simple, and it's all happening on the "any thread" thread. The instrumentation/component callback implementation DOES need to worry about reentrancy and concurrency because there can be more than one "Something on different threads" initiating that. This change listener would document it could execute on any thread and could be calling a change implementation concurrently

A "nicer" but more complex change listener implementation would add every change into a queue, and have a dedicated thread process the queue and apply each callback sequentially. That would document that, and in this case instrumentation/component callback implementations can potentially be simpler (assuming they had additional complexity if they were handling concurrency and re-entrancy).
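
The queue-based variant described above might look roughly like this. It is a sketch, not the prototype's implementation: `SerializedDispatcher`, `submit`, and `shutdown` are hypothetical names, and a single daemon worker thread is assumed.

```python
import queue
import threading
from typing import Any, Callable, Mapping

Callback = Callable[[str, Mapping[str, Any]], None]


class SerializedDispatcher:
    """Runs every callback on one worker thread, one at a time."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, callback: Callback, path: str, new_config: Mapping[str, Any]) -> None:
        # Safe to call from any thread; the change is only enqueued here.
        self._queue.put((callback, path, new_config))

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # shutdown marker
                return
            callback, path, new_config = item
            try:
                callback(path, new_config)
            except Exception:
                pass  # isolate listener failures from each other

    def shutdown(self) -> None:
        """Drain everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

The trade-off matches the comment above: callbacks no longer need to be reentrant, but the provider now owns a thread and a queue, and a slow callback delays every later notification.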

reyang · 2026-03-25T14:48:15Z

+* Implementations MUST document callback concurrency guarantees. If they do not,
+  users MUST assume callbacks may be invoked concurrently.


Who are the users and how would they assume? (trying to understand if this is actionable or not)

The user is the implementor of the integration that uses the ConfigProvider to register a listener, i.e. mostly instrumentation/component authors. So when that instrumentation or component is adapted to register for callbacks, it understands what to expect. E.g. "callbacks are serialized on one thread" would be nice for instrumentation/component authors, making their job easier; otherwise they have to assume concurrent callbacks, which is a pain.

Thanks! I think now I understand your intention better, trying to rephrase and confirm my understanding:

SDK authors MUST document the reentrancy expectations (not guarantees) for listener callbacks.

The instrumentation/component authors MUST handle reentrancy properly (do not support at all, partially support, or fully support), based on the expectations set for the SDK which they are targeting. If there is no clear expectation set by the SDK authors, the instrumentation/component authors MUST support reentrant callbacks.
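
For the case where callbacks may arrive concurrently, one simple pattern for an instrumentation is to build an immutable snapshot in the callback and swap a single reference, so the request path never takes a lock. The sketch below assumes a hypothetical `request_headers` key and class name; it also relies on reference assignment being atomic, which holds in CPython, while other runtimes would need their own equivalent such as an atomic or volatile reference.

```python
import threading
from types import MappingProxyType
from typing import Any, Mapping, Tuple


class HeaderCaptureSettings:
    """Hypothetical settings holder for an HTTP instrumentation."""

    def __init__(self) -> None:
        self._write_lock = threading.Lock()
        self._snapshot = MappingProxyType({"request_headers": ()})

    def on_change(self, path: str, new_config: Mapping[str, Any]) -> None:
        # Build the new read-only snapshot outside the lock, then swap it in.
        fresh = MappingProxyType(
            {"request_headers": tuple(new_config.get("request_headers", ()))}
        )
        with self._write_lock:  # short exclusive block, safe for concurrent callbacks
            self._snapshot = fresh

    def headers_to_capture(self) -> Tuple[str, ...]:
        # Hot path: one reference read, no lock.
        return self._snapshot["request_headers"]
```

Because the callback only replaces a reference, it stays quick, and the new settings take effect the next time the instrumentation runs.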


jackshirazi · 2026-03-25T17:31:24Z

> Notably, there is no SDK implementation, nor proposed SDK spec here. I think that needs to change.

@jack-berg I can start working on an SDK implementation, I wasn't sure we had reached that stage. I'll work against the proposed API

github-actions · 2026-04-09T04:04:55Z

This PR was marked stale. It will be closed in 14 days without additional activity.

robsunday · 2026-04-09T12:52:15Z

+
+Path requirements:
+
+* `path` MUST be an absolute declarative configuration path.


I think it would be good to explicitly specify if multiple listeners are allowed for the same path.

Thanks, good point. Added as the first point in the next Callback requirements

Commit a5acd3a: spec for callbacks on ConfigProvider to support runtume changes

jackshirazi requested review from a team as code owners February 23, 2026 16:22

jackshirazi mentioned this pull request Feb 23, 2026: Add a ConfigProvider callback for runtime instrumentation option changes open-telemetry/opentelemetry-java#8076 (Open)

jack-berg self-assigned this Feb 25, 2026

trask reviewed Feb 25, 2026

Commit 70d429d: Update specification/configuration/api.md (Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>)

jackshirazi commented Feb 25, 2026

jackshirazi and others added 2 commits February 25, 2026 23:10

Commit d5911b6: Apply suggestion from @jackshirazi

Commit 6fe3a42: Update specification/configuration/api.md (Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>)

jack-berg reviewed Feb 27, 2026

jack-berg reviewed Feb 27, 2026

Commit f9665b4: Update specification/configuration/api.md (Co-authored-by: Jack Berg <34418638+jack-berg@users.noreply.github.com>)

github-actions bot added the Stale label Mar 19, 2026

github-actions bot removed the Stale label Mar 20, 2026

Commit 69a8b63: Merge branch 'main' into config-provider-callback

reyang reviewed Mar 25, 2026

jackshirazi commented Mar 25, 2026

jackshirazi and others added 2 commits March 25, 2026 14:52

Commit 5b0d4d7: Apply suggestion from @jackshirazi

Commit 9371403: update from feedback (jack-berg feedback)

github-actions bot added the Stale label Apr 9, 2026

robsunday reviewed Apr 9, 2026

github-actions bot removed the Stale label Apr 10, 2026
