SingleBlockProviderFulu with soft and hard limit for sampled blocks #10533
gfukushima wants to merge 13 commits into Consensys:master from
Conversation
Signed-off-by: Gabriel Fukushima <gabrielfukushima@gmail.com>
```java
@Override
public Optional<SignedBeaconBlock> getBlock(final Bytes32 blockRoot) {
  return blockProvider.getBlock(blockRoot).or(() -> blockProviderFallback.getBlock(blockRoot));
}
```
SingleBlockProviderResolver DAS check is effectively a no-op
Medium Severity
The SingleBlockProviderResolver always produces the same result as the old code. The blockProvider returns the block from the pool only if dasSamplerBasic.containsBlock is true, but the blockProviderFallback unconditionally returns the block from the same pool. Since getBlock chains them with .or(), the DAS sampler check never actually gates anything — every block in the pool is always returned regardless.
Additional Locations (1)
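The no-op the report describes is easy to reproduce in miniature. Below is a minimal sketch (class and variable names are illustrative, not Teku's actual types): the primary provider gates on a sampler flag, but because the fallback reads the same pool unconditionally, the `.or()` chain always yields the pooled block.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Minimal sketch of the reported no-op (names are illustrative, not Teku's actual classes).
class OrChainDemo {
  static Optional<String> resolve(
      final String root,
      final Function<String, Optional<String>> primary,
      final Function<String, Optional<String>> fallback) {
    // Mirrors blockProvider.getBlock(root).or(() -> blockProviderFallback.getBlock(root))
    return primary.apply(root).or(() -> fallback.apply(root));
  }

  static Optional<String> lookup(final String root) {
    final Map<String, String> pool = Map.of("0xabc", "block-abc");
    final boolean samplerContainsBlock = false; // sampler does NOT know the block

    final Function<String, Optional<String>> gated =
        r -> samplerContainsBlock ? Optional.ofNullable(pool.get(r)) : Optional.empty();
    // Fallback reads the SAME pool unconditionally, so the gate never matters.
    final Function<String, Optional<String>> fallback = r -> Optional.ofNullable(pool.get(r));

    return resolve(root, gated, fallback);
  }
}
```

Even with the sampler reporting `false`, `lookup` still returns the pooled block, which is exactly the behavior the report flags.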
Signed-off-by: Gabriel Fukushima <gabrielfukushima@gmail.com>
```java
(blockRoot) ->
    dasSamplerBasic.containsBlock(blockRoot)
        ? blockBlobSidecarsTrackersPool.getBlock(blockRoot)
        : Optional.empty());
```
SingleBlockProviderResolver DAS check is effectively a no-op
Medium Severity
The SingleBlockProviderResolver has a primary provider that gates on dasSamplerBasic.containsBlock(blockRoot) and returns blockBlobSidecarsTrackersPool.getBlock(blockRoot) only if the sampler knows the block. However, the fallback provider unconditionally returns blockBlobSidecarsTrackersPool.getBlock(blockRoot) from the exact same pool. Since the resolver does blockProvider.getBlock(blockRoot).or(() -> blockProviderFallback.getBlock(blockRoot)), when the primary returns Optional.empty() (sampler doesn't contain the block), the fallback always provides the block anyway. The containsBlock check provides zero gating effect — the result is always identical to calling blockBlobSidecarsTrackersPool.getBlock(blockRoot) directly.
Additional Locations (1)
@gfukushima is this something to be fixed in the next PRs?
…t we could get with the default config. TL;DR: DEFAULT_MAX_RECENTLY_SAMPLED_BLOCKS = DEFAULT_FORWARD_SYNC_BATCH_SIZE * DEFAULT_FORWARD_SYNC_MAX_PENDING_BATCHES; // 125 Signed-off-by: Gabriel Fukushima <gabrielfukushima@gmail.com>
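The commit message derives the default bound as a product of two forward-sync defaults. The sketch below spells out that arithmetic; the values 25 and 5 are assumptions used for illustration (only the product, 125, is stated above).

```java
// Sketch of the default derivation described in the commit message.
// The constants 25 and 5 are assumed values for illustration;
// only the product (125) is stated in the source.
class RecentlySampledDefaults {
  static final int DEFAULT_FORWARD_SYNC_BATCH_SIZE = 25; // assumed
  static final int DEFAULT_FORWARD_SYNC_MAX_PENDING_BATCHES = 5; // assumed
  static final int DEFAULT_MAX_RECENTLY_SAMPLED_BLOCKS =
      DEFAULT_FORWARD_SYNC_BATCH_SIZE * DEFAULT_FORWARD_SYNC_MAX_PENDING_BATCHES; // 125
}
```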
Signed-off-by: Gabriel Fukushima <gabrielfukushima@gmail.com>
Signed-off-by: Gabriel Fukushima <gabrielfukushima@gmail.com>
zilm13 left a comment
I'm not sure all these changes are required, especially with losing concurrency safety traded for synchronization. Maybe just adjust makeNewRoomForTrackers in the old code according to the new requirements? I guess we don't care much about blocks removed in removeAllForBlock because those are imported/rejected.
```java
  // We already have this block, waiting for blobs
  return;
}
if (dasBasicSampler.containsBlock(blockRoot)) {
```
Shouldn't we put containsBlock in the BlockEventsListener interface and check it both for dataColumnSidecars and blobSidecars in the router? Maybe add an extra interface for clarity, or rename this one, but it looks like we are doing the same job here, aren't we?
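One possible shape for this suggestion, sketched below: expose `containsBlock` on a small shared interface so the router can perform the same check for both sidecar types. All names here are hypothetical; this is not Teku's actual BlockEventsListener API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface capturing the reviewer's suggestion: one
// containsBlock check shared by the dataColumnSidecars and blobSidecars paths.
interface BlockAvailabilityChecker {
  boolean containsBlock(String blockRoot);
}

// Illustrative in-memory implementation backed by a concurrent set.
class InMemoryAvailabilityChecker implements BlockAvailabilityChecker {
  private final Set<String> knownRoots = ConcurrentHashMap.newKeySet();

  void onBlock(final String blockRoot) {
    knownRoots.add(blockRoot);
  }

  @Override
  public boolean containsBlock(final String blockRoot) {
    return knownRoots.contains(blockRoot);
  }
}
```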
```java
this.recentChainData = recentChainData;
this.halfColumnsSamplingCompletionEnabled = halfColumnsSamplingCompletionEnabled;
this.maxRecentlySampledBlocks = maxRecentlySampledBlocks;
this.recentlySampledColumnsByRoot = new LinkedHashMap<>(maxRecentlySampledBlocks);
```
LinkedHashMap was chosen so we can make use of the FIFO ordering during eviction; that's why I added some synchronization when this map is modified.
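For readers unfamiliar with this property of `LinkedHashMap`: it preserves insertion order, and its `removeEldestEntry` hook gives FIFO eviction almost for free once a hard cap is hit. A minimal sketch of that pattern (illustrative names, not the PR's actual class):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of FIFO eviction via LinkedHashMap: insertion order is preserved,
// so the eldest entry is the first eviction candidate once the cap is reached.
class BoundedFifoMap<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  BoundedFifoMap(final int maxEntries) {
    super(maxEntries);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
    // Evict the oldest insertion once we exceed the hard cap.
    return size() > maxEntries;
  }
}
```

Note that `LinkedHashMap` itself is not thread-safe, which is why the PR wraps modifications in synchronization.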
```java
"das_recently_sampled_blocks_size",
"DAS recently sampled blocks size",
() -> {
  synchronized (this) {
```
You are adding synchronization to this method. Is it really necessary? We are trying to avoid adding synchronization whenever possible in new classes, and to update old code accordingly, because synchronization is slow and can cause locks/stalling. It's a step backwards.
Well, in this case we want to make sure that the LinkedHashMap is handled atomically. For the metrics I don't think this would be extremely necessary, but we also want metrics that provide accurate data, don't we?
I mean in general. It's just the first synchronized block of several. If we go with a concurrency-safe setup without locks, we could (rarely) get a false "not contains" and we would just redownload the block again. That's not a big trade-off compared to locking everything here, I think.
Maybe also check recentChainData at the end of the checks in RecentBlocksFetchService; maybe it's already there? Does that make sense? It should be fast to check there.
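The lock-free alternative the reviewer alludes to could look roughly like the sketch below: a concurrent set for membership plus a queue that remembers insertion order for eviction. A racing reader may occasionally miss an entry and trigger a redundant re-download, which is exactly the accepted trade-off described above. All names are illustrative, not the PR's actual classes.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of a lock-free bounded FIFO membership structure: concurrent set for
// containsBlock, queue for insertion order, best-effort eviction at the cap.
class LockFreeRecentBlocks {
  private final Set<String> roots = ConcurrentHashMap.newKeySet();
  private final Queue<String> insertionOrder = new ConcurrentLinkedQueue<>();
  private final int maxSize;

  LockFreeRecentBlocks(final int maxSize) {
    this.maxSize = maxSize;
  }

  void add(final String blockRoot) {
    if (roots.add(blockRoot)) {
      insertionOrder.add(blockRoot);
      // Best-effort eviction; slight overshoot under contention is acceptable.
      while (roots.size() > maxSize) {
        final String eldest = insertionOrder.poll();
        if (eldest == null) {
          break;
        }
        roots.remove(eldest);
      }
    }
  }

  boolean containsBlock(final String blockRoot) {
    return roots.contains(blockRoot);
  }

  int size() {
    return roots.size();
  }
}
```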
```java
.values()
.removeIf(
    tracker -> {
      if (tracker.slot().isLessThan(firstNonFinalizedSlot)
```
There were several useful comments here; I'd restore them.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit ba1926a.
```java
"das_recently_sampled_blocks_size",
"DAS recently sampled blocks size",
recentlySampledColumnsByRoot::size);
```
Unsynchronized gauge reads non-thread-safe LinkedHashMap
Low Severity
The metrics gauge callback recentlySampledColumnsByRoot::size is invoked from the metrics-scraping thread without any synchronization. The map was changed from a thread-safe ConcurrentHashMap to a plain LinkedHashMap, where all other accesses are protected by synchronized (this). This gauge callback bypasses that synchronization, creating a data race on the non-thread-safe LinkedHashMap.
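The fix this report implies is for the gauge callback to take the same monitor as the map's writers. A minimal sketch (illustrative names, not Teku's actual metrics API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Sketch: the gauge supplier must synchronize on the same monitor as the
// writers, because LinkedHashMap is not thread-safe.
class SampledBlocksMetrics {
  private final Map<String, Integer> recentlySampledColumnsByRoot = new LinkedHashMap<>();

  synchronized void record(final String root, final int columns) {
    recentlySampledColumnsByRoot.put(root, columns);
  }

  // Gauge supplier that holds the lock for the duration of the read.
  IntSupplier sizeGauge() {
    return () -> {
      synchronized (this) {
        return recentlySampledColumnsByRoot.size();
      }
    };
  }
}
```

An alternative that avoids the lock entirely is keeping the size in a separate `AtomicInteger` updated alongside the map, at the cost of maintaining two pieces of state.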
PR Description
Fixed Issue(s)
Part of #10208
Documentation
Add the doc-change-required label to this PR if updates are required.
Changelog
Note
Medium Risk
Touches sync/DAS sampling coordination and block lookup during initialization; mistakes could cause missed block fetches or premature tracker eviction under load, though changes are mostly additive and bounded by new limits.
Overview
Improves Fulu-era data-availability sampling integration by refactoring DasSamplerBasic into an interface and moving the implementation to a new DasSamplerBasicImpl, adding bounded tracker storage with soft eviction of completed entries and a hard cap to prevent unbounded growth, plus a new gauge (das_recently_sampled_blocks_size).

Wires the sampler into sync/block fetching: RecentBlocksFetchService now avoids fetching blocks already tracked by the DAS sampler, and DefaultSyncServiceFactory/BeaconChainController pass the sampler through (defaulting to DasSamplerBasic.NOOP) while introducing a SingleBlockProviderResolver used by StorageBackedRecentChainData initialization.

Adds configuration for the tracker bound via SyncConfig#getMaxRecentlySampledBlocks and the hidden CLI flag --Xp2p-max-recently-sampled-blocks, and updates tests/reference executors to construct DasSamplerBasicImpl with metrics and the new limit.

Reviewed by Cursor Bugbot for commit ba1926a. Bugbot is set up for automated code reviews on this repo.