fix(target-allocator): sort labels in processTargetGroups to prevent hash collisions#4967
Draft
gracewehner wants to merge 4 commits intoopen-telemetry:mainfrom
Draft
fix(target-allocator): sort labels in processTargetGroups to prevent hash collisions#4967gracewehner wants to merge 4 commits intoopen-telemetry:mainfrom
gracewehner wants to merge 4 commits intoopen-telemetry:mainfrom
Conversation
… fix silent target loss
processTargetGroups used a ScratchBuilder struct-copy optimization that
produced labels with two independently sorted sublists (group labels +
target labels) but not globally sorted. labels.Labels.Get() uses binary
search assuming globally sorted data. When group labels sort after target
labels (e.g. 'vendor' > '__address__'), Get('__address__') silently returns
empty string, causing relabel rules reading source_labels: [__address__] to
produce empty results.
Both targets then get identical post-relabel labels, creating hash collisions
in the allocator's target map which silently drops one target.
The fix materializes group labels once, then creates a fresh ScratchBuilder
per target and calls Sort() after all labels are added, ensuring globally
sorted Labels that work correctly with binary search.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename groupBuilder to targetBuilder via pointer alias in the per-target loop to clarify that the builder is being reused for target label merging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hash collisions ScratchBuilder.Labels() serializes labels in insertion order without sorting. When group labels (e.g. vendor=nginx) sort after target labels (e.g. __address__), Labels.Get() hits a sorted-order early-termination check and returns empty for __address__. This causes relabel rules using source_labels: [__address__] to produce identical output for all targets in the group, creating hash collisions that silently drop targets in the allocator's buildTargetMap. Fix: replace the struct-copy pattern with a merge-sort that interleaves sorted group labels and sorted target labels, keeping output sorted. Uses ScratchBuilder.Overwrite for zero-allocation after the first iteration (~0.3% benchmark overhead vs buggy baseline). Add regression test and workaround validation tests confirming: - The fix produces sorted labels and unique hashes - The buggy code collapses targets via hash collision - Workaround: removing group labels avoids the collision - One-target-per-group does NOT avoid the collision Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
When
processTargetGroupsbuilds labels for each target, it uses a struct-copy of the groupScratchBuilderand appends target labels. The resulting label set has two independently sorted sublists (group labels, then target labels) but is not globally sorted.labels.Labels.Get()(in thestringlabelsbuild, the default) uses a linear scan with sorted-order early termination: if it encounters a label name whose first byte is greater than the search key, it stops immediately and returns empty. When a group label likevendor(first bytev=118) precedes__address__(first byte_=95),Get("__address__")sees 118 > 95 and breaks early — returning "".This causes
relabel_configsthat readsource_labels: [__address__]to produce identical output for all targets in the group, creating hash collisions in the allocator'smap[ItemHash]*Item. The last target silently overwrites all others.Fix
Replace the struct-copy pattern with a merge-sort that interleaves the sorted group labels and sorted target labels into globally sorted output. Uses
ScratchBuilder.Overwrite()to reuse internal buffers, achieving ~0% overhead vs the buggy baseline in benchmarks.Testing
TestSortedLabelsBlackboxRelabelingthat reproduces the exact customer scenario (blackbox exporter withvendorgroup label + 2 targets)make precommitclean — fmt, vet, lint, 1955 tests, ensure-update-is-noop)Customer Impact
This bug silently drops scrape targets when:
static_configshas group-level labels AND multiple targetsrelabel_configsreads__address__(which most configs do)__address__(any label starting with a-z)Workaround: Remove group labels from
static_configsand add them viametric_relabel_configsinstead.Link to tracking issue
Fixes open-telemetry/opentelemetry-operator#XXXX