
SingleBlockProviderFulu with soft and hard limit for sampled blocks #10533

Open
gfukushima wants to merge 13 commits into Consensys:master from gfukushima:single-block-provider-fulu

Conversation

Contributor

@gfukushima gfukushima commented Mar 26, 2026

PR Description

Fixed Issue(s)

Part of #10208

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

Note

Medium Risk
Touches sync/DAS sampling coordination and block lookup during initialization; mistakes could cause missed block fetches or premature tracker eviction under load, though changes are mostly additive and bounded by new limits.

Overview
Improves Fulu-era data-availability sampling integration by refactoring DasSamplerBasic into an interface and moving the implementation to new DasSamplerBasicImpl, adding bounded tracker storage with soft eviction of completed entries and a hard cap to prevent unbounded growth, plus a new gauge (das_recently_sampled_blocks_size).

Wires the sampler into sync/block fetching: RecentBlocksFetchService now avoids fetching blocks already tracked by the DAS sampler, and DefaultSyncServiceFactory/BeaconChainController pass the sampler through (defaulting to DasSamplerBasic.NOOP) while introducing a SingleBlockProviderResolver used by StorageBackedRecentChainData initialization.

Adds configuration for the tracker bound via SyncConfig#getMaxRecentlySampledBlocks and hidden CLI flag --Xp2p-max-recently-sampled-blocks, and updates tests/reference executors to construct DasSamplerBasicImpl with metrics and the new limit.

Reviewed by Cursor Bugbot for commit ba1926a. Bugbot is set up for automated code reviews on this repo. Configure here.

Signed-off-by: Gabriel Fukushima <gabrielfukushima@gmail.com>
@gfukushima gfukushima marked this pull request as ready for review March 26, 2026 09:06
Comment thread beacon/sync/src/main/java/tech/pegasys/teku/beacon/sync/SyncConfig.java Outdated

@Override
public Optional<SignedBeaconBlock> getBlock(final Bytes32 blockRoot) {
return blockProvider.getBlock(blockRoot).or(() -> blockProviderFallback.getBlock(blockRoot));
SingleBlockProviderResolver DAS check is effectively a no-op

Medium Severity

The SingleBlockProviderResolver always produces the same result as the old code. The blockProvider returns the block from the pool only if dasSamplerBasic.containsBlock is true, but the blockProviderFallback unconditionally returns the block from the same pool. Since getBlock chains them with .or(), the DAS sampler check never actually gates anything — every block in the pool is always returned regardless.

Additional Locations (1)
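The reported no-op can be reduced to a minimal `Optional` chain (the names below are illustrative, not the actual Teku code): when both providers read from the same pool, the `.or()` fallback always restores whatever the gated primary filtered out.

```java
import java.util.Map;
import java.util.Optional;

// Minimal reproduction of the reported issue: both providers read from the
// same pool, so the sampler gate on the primary provider has no effect.
public class OrChainNoOp {
  public static void main(String[] args) {
    final Map<String, String> pool = Map.of("root1", "block1");
    final boolean samplerContainsBlock = false; // sampler does NOT know the block

    // Primary: gated on the sampler, so it returns empty here.
    final Optional<String> primary =
        samplerContainsBlock ? Optional.ofNullable(pool.get("root1")) : Optional.empty();

    // Fallback: unconditionally reads the same pool.
    final Optional<String> result =
        primary.or(() -> Optional.ofNullable(pool.get("root1")));

    // The gate filtered the block out, but .or() restores it anyway.
    System.out.println(result.get()); // prints "block1"
  }
}
```

The result is always identical to reading the pool directly, which is exactly the "zero gating effect" the review describes.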

(blockRoot) ->
    dasSamplerBasic.containsBlock(blockRoot)
        ? blockBlobSidecarsTrackersPool.getBlock(blockRoot)
        : Optional.empty());
SingleBlockProviderResolver DAS check is effectively a no-op

Medium Severity

The SingleBlockProviderResolver has a primary provider that gates on dasSamplerBasic.containsBlock(blockRoot) and returns blockBlobSidecarsTrackersPool.getBlock(blockRoot) only if the sampler knows the block. However, the fallback provider unconditionally returns blockBlobSidecarsTrackersPool.getBlock(blockRoot) from the exact same pool. Since the resolver does blockProvider.getBlock(blockRoot).or(() -> blockProviderFallback.getBlock(blockRoot)), when the primary returns Optional.empty() (sampler doesn't contain the block), the fallback always provides the block anyway. The containsBlock check provides zero gating effect — the result is always identical to calling blockBlobSidecarsTrackersPool.getBlock(blockRoot) directly.

Additional Locations (1)

Contributor


@gfukushima is this something to be fixed in the next PRs?

…t we could get with the default config. TLDR: DEFAULT_MAX_RECENTLY_SAMPLED_BLOCKS = DEFAULT_FORWARD_SYNC_BATCH_SIZE * DEFAULT_FORWARD_SYNC_MAX_PENDING_BATCHES; // 125
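The arithmetic behind the default multiplies out as follows. The factor values 25 and 5 are assumptions chosen only so the product matches the `// 125` comment; they are not verified against Teku's SyncConfig:

```java
public class DefaultLimit {
  // Hypothetical values chosen so the product matches the "// 125" comment.
  static final int DEFAULT_FORWARD_SYNC_BATCH_SIZE = 25;
  static final int DEFAULT_FORWARD_SYNC_MAX_PENDING_BATCHES = 5;

  // The tracker bound covers one full window of in-flight forward-sync blocks.
  static final int DEFAULT_MAX_RECENTLY_SAMPLED_BLOCKS =
      DEFAULT_FORWARD_SYNC_BATCH_SIZE * DEFAULT_FORWARD_SYNC_MAX_PENDING_BATCHES;

  public static void main(String[] args) {
    System.out.println(DEFAULT_MAX_RECENTLY_SAMPLED_BLOCKS); // prints 125
  }
}
```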

Contributor

@zilm13 zilm13 left a comment


I'm not sure all these changes are required, especially with losing concurrency safety traded for synchronization. Maybe just adjust makeNewRoomForTrackers in the old code according to the new requirements? I guess we don't care much about blocks removed in removeAllForBlock because those are imported/rejected.

// We already have this block, waiting for blobs
return;
}
if (dasBasicSampler.containsBlock(blockRoot)) {
Contributor


Shouldn't we put containsBlock in the BlockEventsListener interface and check it for both dataColumnSidecars and blobSidecars in the router? Maybe add an extra interface for clarity, or rename this one, but it looks like we are doing the same job here, aren't we?

this.recentChainData = recentChainData;
this.halfColumnsSamplingCompletionEnabled = halfColumnsSamplingCompletionEnabled;
this.maxRecentlySampledBlocks = maxRecentlySampledBlocks;
this.recentlySampledColumnsByRoot = new LinkedHashMap<>(maxRecentlySampledBlocks);
Contributor


it's not concurrency-safe

Contributor Author


Chosen so we can make use of the FIFO order during eviction; that's why I added some synchronization where this map is modified.
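The FIFO argument rests on LinkedHashMap's insertion-order iteration, which ConcurrentHashMap does not guarantee; a minimal sketch of the property being relied on:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FifoEviction {
  public static void main(String[] args) {
    final Map<String, Integer> map = new LinkedHashMap<>();
    map.put("first", 1);
    map.put("second", 2);
    map.put("third", 3);

    // LinkedHashMap iterates in insertion order, so the eldest entry comes
    // first; this is what makes FIFO eviction via the iterator possible.
    // A ConcurrentHashMap makes no ordering guarantee at all.
    final String eldest = map.keySet().iterator().next();
    System.out.println(eldest); // prints "first"
  }
}
```

This is the trade-off discussed in the thread: insertion order for cheap FIFO eviction, at the cost of needing external synchronization.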

"das_recently_sampled_blocks_size",
"DAS recently sampled blocks size",
() -> {
synchronized (this) {
Contributor


You are adding synchronization to this method. Is it really necessary? We are trying to avoid adding synchronization wherever possible in new classes, and to remove it from old code, because synchronization is slow and can be a reason for locks/stalling. It's a step backwards.

Contributor Author


Well, in this case we want to make sure the LinkedHashMap is handled atomically. For the metrics I don't think this is strictly necessary, but we also want metrics that provide accurate data, don't we?

Contributor


I mean in general; it's just the first of several synchronization blocks. If we go with a concurrency-safe setup without locks, we could (rarely) get a false negative from containsBlock and just redownload the block again. That's not a big trade-off compared to locking everything here, I think.
Maybe also check it in recentChainData at the end of the checks in RecentBlocksFetchService? Maybe it's already there? It should be fast to check there.

.values()
.removeIf(
tracker -> {
if (tracker.slot().isLessThan(firstNonFinalizedSlot)
Contributor


There were several useful comments here; I'd restore them.

Contributor Author


restored

@gfukushima gfukushima requested a review from a team as a code owner April 17, 2026 02:31

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).



"das_recently_sampled_blocks_size",
"DAS recently sampled blocks size",
recentlySampledColumnsByRoot::size
);


Unsynchronized gauge reads non-thread-safe LinkedHashMap

Low Severity

The metrics gauge callback recentlySampledColumnsByRoot::size is invoked from the metrics-scraping thread without any synchronization. The map was changed from a thread-safe ConcurrentHashMap to a plain LinkedHashMap, where all other accesses are protected by synchronized (this). This gauge callback bypasses that synchronization, creating a data race on the non-thread-safe LinkedHashMap.
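One minimal fix, assuming the surrounding class already synchronizes on `this` for all map mutations, is to hand the metrics system a supplier that takes the same monitor instead of the bare method reference. The class below is a self-contained sketch; gauge registration and Teku's actual metrics API are elided:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Sketch: route the gauge read through the same lock that guards map
// mutations, so the metrics-scraping thread never races the LinkedHashMap.
public class SynchronizedGauge {
  private final Map<String, Object> recentlySampledColumnsByRoot = new LinkedHashMap<>();

  // Instead of recentlySampledColumnsByRoot::size, register a supplier that
  // synchronizes on the same monitor as every other access to the map.
  final IntSupplier gaugeSupplier =
      () -> {
        synchronized (this) {
          return recentlySampledColumnsByRoot.size();
        }
      };

  public static void main(String[] args) {
    final SynchronizedGauge g = new SynchronizedGauge();
    g.recentlySampledColumnsByRoot.put("root", new Object());
    System.out.println(g.gaugeSupplier.getAsInt()); // prints 1
  }
}
```

Alternatively, reverting the map to a ConcurrentHashMap would make the bare `::size` reference safe again, at the cost of the FIFO ordering discussed above.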


