
Entities: information from Dash0 on how we solved some merging issues with our entities-like system #5021

@mmanciop

Description


What are you trying to achieve?

Provide some additional context on lessons learned at Dash0 while implementing a system very similar to Entities, especially on how to reconcile conflicting information in semantic convention namespaces.

Context

I joined a recent call of the Entities SIG to find out what the plans are wrt merging Entities with potentially conflicting information in a backend. I was asked to post in an issue how we solved similar problems in Dash0. The following is a snippet from an internal PRD that defines rules for when to merge, and when not to merge, resource information across resources that share identifying information, like k8s.pod.uid, but differ in significant details. Such differences arise because, e.g., different SDKs run in different processes, or different agents collect telemetry at different levels of the infrastructure.

The rules

Rule 0: Do not merge conflicting namespaces or subnamespaces

If the resource R_1 of a piece of telemetry contains a key <namespace>.<key> with value A, and another resource R_2 with the same dash0.resource.id [Note: this is effectively an entity identifier] contains <namespace>.<key> with value B, no keys prefixed by <namespace>. will be imported from R_2 into R_1.

Note: Conflicts disqualify the entire top-level namespace. For example, if the resource R_1 has process.runtime.name=java and R_2 has process.runtime.name=nodejs, no key starting with process., such as process.pid, will be copied from R_2 to R_1.

Note: Some specific namespaces in the OpenTelemetry semantic conventions for resources may have special rules in the remainder.
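A minimal Python sketch of Rule 0 may help make the behavior concrete. It is not Dash0's implementation: the function names are hypothetical, resources are modeled as plain dictionaries, and a "namespace" is assumed to be the part of the key before the first dot.

```python
def top_level_namespace(key: str) -> str:
    """Return the part of an attribute key before the first dot (assumed definition)."""
    return key.split(".", 1)[0]


def conflicting_namespaces(r1: dict, r2: dict) -> set:
    """Top-level namespaces in which r1 and r2 disagree on a key they both have."""
    return {
        top_level_namespace(key)
        for key, value in r1.items()
        if key in r2 and r2[key] != value
    }


def merge_rule_0(r1: dict, r2: dict) -> dict:
    """Copy into r1 only those keys of r2 whose namespace has no conflict."""
    blocked = conflicting_namespaces(r1, r2)
    merged = dict(r1)
    for key, value in r2.items():
        if top_level_namespace(key) not in blocked:
            merged.setdefault(key, value)  # never overwrite values r1 already has
    return merged


# Both resources are assumed to share the same dash0.resource.id.
r1 = {"k8s.pod.uid": "abc", "process.runtime.name": "java"}
r2 = {
    "k8s.pod.uid": "abc",
    "process.runtime.name": "nodejs",
    "process.pid": 42,
    "host.name": "node-1",
}
merged = merge_rule_0(r1, r2)
```

Here host.name is imported from R_2, while process.pid is not, because the process namespace conflicts on process.runtime.name.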

Rule 1: Uncertainty principle

If the resource R_1 of a piece of telemetry contains no key from a given namespace, and resources R_2 and R_3 with the same dash0.resource.id contain conflicting values for that namespace, we should treat R_1 as having EITHER the entirety of the values from R_2 OR the entirety of the values from R_3, but not a mix of the two.

The intuition behind this rule is “if we are not sure which service it is, it can be any of the known ones in this resource”.

Note: Some specific namespaces in the OpenTelemetry semantic conventions for resources may have special rules in the remainder.
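One way to picture Rule 1 is as a function that returns a list of alternatives instead of a single merged value. The sketch below is an illustration under assumptions, not the Dash0 implementation: the helper names are hypothetical, resources are plain dictionaries, and every distinct set of values found in the other resources is treated as its own alternative (a fuller version would first combine alternatives that do not conflict).

```python
def namespace_view(resource: dict, namespace: str) -> dict:
    """All attributes of `resource` whose key starts with `namespace.`."""
    prefix = namespace + "."
    return {k: v for k, v in resource.items() if k.startswith(prefix)}


def candidate_views(r1: dict, others: list, namespace: str) -> list:
    """Possible values of `namespace` for r1, each taken whole from one resource."""
    own = namespace_view(r1, namespace)
    if own:
        return [own]  # r1 reports the namespace itself, so there is no uncertainty
    candidates = []
    for other in others:
        view = namespace_view(other, namespace)
        if view and view not in candidates:
            candidates.append(view)  # keep each alternative intact, never mix keys
    return candidates


# All three resources are assumed to share the same dash0.resource.id.
r1 = {"k8s.pod.uid": "abc"}
r2 = {"k8s.pod.uid": "abc", "service.name": "checkout", "service.version": "1.2"}
r3 = {"k8s.pod.uid": "abc", "service.name": "cart"}
candidates = candidate_views(r1, [r2, r3], "service")
```

With these inputs, R_1 may be either the checkout service at version 1.2 or the cart service, but never cart with version 1.2.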

Rule 2: Subordination to telemetry.*

A conflict in the telemetry.* namespace, which is set by OpenTelemetry SDKs, implies a conflict in the process.* namespace.
The rationale is the following: each SDK implements the telemetry.sdk.* attributes out of the box, but not all SDKs implement the process.* attributes out of the box. If an SDK lacks support for the process.* attributes, the process.* attributes from another SDK in, say, the same pod would leak into its resource. Instead, we assume that each process contains at most one OpenTelemetry SDK, and that allows us to prevent the leaking of the process.* attributes.

Rule 3: Subordination to container.* and k8s.container.*

A conflict in the container.* namespace or the k8s.container.* namespace implies a conflict in the telemetry.*, k8s.container.* and process.* namespaces. That is, if we know it’s a different container, we also know it cannot be the same process and, thus, the same OTel SDK instance.

Rule 4: Subordination to os.*

A conflict in the os.* namespace implies a conflict in the telemetry.* namespace, process.* namespace and container.* namespace. That is, if we know it’s a different operating system, we also know it cannot be the same process and container instance and, thus, the same OTel SDK instance.

Note: It could be the case that the OS reported by one resource is the host’s, and the other is the container userland. This rule may lead to some false negatives, but given that os.* is, in our experience, seldom reported outside of host-related monitoring, it seems a fair bet.

Rule 5: Subordination to system.*

A conflict in the system.* namespace implies a conflict in the os.* namespace, telemetry.* namespace, process.* namespace and container.* namespace. That is, if we know it’s a different system, we also know it cannot be the same operating system, process and container instance and, thus, the same OTel SDK instance.
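Rules 2 to 5 can be read as a small implication table: a conflict detected directly in one namespace marks other namespaces as conflicting too. The following Python sketch encodes the table as stated in the rules above. The names are hypothetical, and applying the rules transitively (e.g., os.* implies container.*, which in turn implies k8s.container.*) is an assumption of this sketch rather than something the rules spell out.

```python
# Namespace -> namespaces that are also considered conflicting (Rules 2 to 5).
IMPLIED_CONFLICTS = {
    "telemetry": {"process"},                                # Rule 2
    "container": {"telemetry", "k8s.container", "process"},  # Rule 3
    "k8s.container": {"telemetry", "process"},               # Rule 3
    "os": {"telemetry", "process", "container"},             # Rule 4
    "system": {"os", "telemetry", "process", "container"},   # Rule 5
}


def expand_conflicts(direct: set) -> set:
    """Return the directly conflicting namespaces plus everything they imply."""
    expanded = set(direct)
    pending = list(direct)
    while pending:
        namespace = pending.pop()
        for implied in IMPLIED_CONFLICTS.get(namespace, ()):
            if implied not in expanded:
                expanded.add(implied)
                pending.append(implied)  # assumption: implications chain transitively
    return expanded


sdk_conflict = expand_conflicts({"telemetry"})
os_conflict = expand_conflicts({"os"})
```

The expanded set is what a Rule 0 style merge would then treat as blocked namespaces. Namespaces without an entry in the table, such as host, only block themselves.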
