Parallel Batch Fetch for Chain-Indexer Catch-up Phase 2 #1049

@cosmir17

Background

Suggested by Giles in DM during PR #1038 review (16 April).

After PR #1038 (hybrid catch-up: forward-by-height for older blocks, backward-by-hash for last 400 blocks near tip), the backward phase still does serial fetches. Each block requires:

  1. block_at(parent_hash).await — RPC to fetch the block
  2. block_header(&parent).await — RPC to fetch the header (for next parent_hash)

That's 2 RPCs per block, run sequentially because each step needs the previous block's parent_hash. The forward yield phase then fetches each block again with block_at before make_block issues its own RPCs. Total: ~4 RPCs per Phase 2 block, all serial.

Giles's principle: "It's not enough to be async, things have to be able to happen in parallel."

Reference: midnight-node PR #1263

midnight-node#1263 (merged) added batch block-hash fetching in the toolkit. Key change in util/toolkit/src/fetcher/fetch_task.rs:

let hashes: Vec<H256> = client
    .rpc_client
    .request("chain_getBlockHash", rpc_params![block_numbers])
    .await?;

Substrate's chain_getBlockHash JSON-RPC method natively supports passing an array of block numbers and returns an array of hashes in one round-trip.

Current Phase 2 implementation

In chain-indexer/src/infra/subxt_node.rs (after PR #1038):

let mut hashes = Vec::with_capacity(FINALIZATION_SAFETY_MARGIN as usize);
let mut parent_hash = first_block.header().parent_hash;
while parent_hash != stop_hash && parent_hash != genesis_parent_hash {
    let parent = self.block_at(parent_hash).await?;       // RPC 1
    parent_hash = block_header(&parent).await?.parent_hash; // RPC 2
    hashes.push(parent.block_hash());
}

for hash in hashes.into_iter().rev() {
    let block = self.block_at(hash).await?;                // RPC 3
    yield self.make_block(&mut authorities, block).await?; // RPC 4+ (header + transactions etc.)
}

For 400 blocks: ~1600 serial RPCs.
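
A back-of-the-envelope sketch of that count, plus a rough lower bound on how many sequential waits the block-fetch stage would need once fetches overlap. The function names are hypothetical, the 4-RPCs-per-block figure is the approximation from above, and the wave count ignores make_block and the single batch hash call.

```rust
/// Approximate RPCs per Phase 2 block in the current serial flow:
/// two in the backward walk, two or more while yielding.
const SERIAL_RPCS_PER_BLOCK: u64 = 4;

/// Total RPCs issued one after another for `blocks` Phase 2 blocks.
fn serial_rpc_count(blocks: u64) -> u64 {
    blocks * SERIAL_RPCS_PER_BLOCK
}

/// Number of sequential "waves" needed to fetch `blocks` blocks when up to
/// `concurrency` fetches are in flight at once (ceiling division).
fn parallel_fetch_waves(blocks: u64, concurrency: u64) -> u64 {
    let concurrency = concurrency.max(1);
    (blocks + concurrency - 1) / concurrency
}
```

With 400 blocks this gives 1600 serial RPCs today, against 50 fetch waves at a concurrency of 8 or 25 at a concurrency of 16.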

Proposed approach

  1. One batch RPC for all hashes:

    • Compute the height range: (end_height - FINALIZATION_SAFETY_MARGIN)..end_height
    • Call chain_getBlockHash([h1, h2, ..., h400]) → returns Vec<H256> in one round-trip
  2. Parallel block fetches with throttling:

    • Use futures::stream::iter(hashes).map(|hash| self.block_at(hash)).buffered(N) or similar; buffered yields results in input order, which the verification step relies on
    • Throttle to a reasonable concurrency (e.g. 8-16) to avoid overwhelming the node
  3. Verification with hash-based fallback:

    • After fetching, verify backwards: block_N.parent_hash == hash_at_height_N_minus_1 (from batch)
    • If verification fails for a specific block (rare), fall back to the proven by-hash fetch chain for that block onwards
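
Step 3 could look roughly like the sketch below. It is dependency-free for illustration: Hash stands in for H256, and FetchedBlock, ChainBreak and verify_backwards are hypothetical names, not existing chain-indexer types. It assumes the fetched blocks are ordered by ascending height and that the caller passes the parent_hash of first_block as the starting expectation.

```rust
/// Stand-in for H256 so the sketch has no external dependencies.
type Hash = [u8; 32];

/// Hypothetical minimal view of a fetched block.
#[derive(Debug, PartialEq)]
struct FetchedBlock {
    hash: Hash,
    parent_hash: Hash,
}

/// Where the parent-hash chain broke while walking back from the tip.
#[derive(Debug, PartialEq)]
struct ChainBreak {
    /// Index (in ascending-height order) of the first block that did not match.
    index: usize,
    /// Hash the chain expected at that position; the by-hash fallback
    /// would resume its walk from here.
    expected_hash: Hash,
}

/// Walks `blocks` from highest to lowest and checks that each block's hash
/// equals the parent hash recorded by the block above it.
fn verify_backwards(blocks: &[FetchedBlock], tip_parent_hash: Hash) -> Result<(), ChainBreak> {
    let mut expected_hash = tip_parent_hash;
    for (index, block) in blocks.iter().enumerate().rev() {
        if block.hash != expected_hash {
            return Err(ChainBreak { index, expected_hash });
        }
        expected_hash = block.parent_hash;
    }
    Ok(())
}
```

On a ChainBreak, blocks above index are already verified and can be kept; the blocks at index and below would be re-fetched by hash, starting from expected_hash.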

Subxt 0.50 raw RPC pattern

use subxt::rpcs::rpc_params;

let block_numbers: Vec<u64> = (start..end).collect();
let hashes: Vec<H256> = self
    .rpc_client
    .request("chain_getBlockHash", rpc_params![block_numbers])
    .await
    .map_err(|error| SubxtNodeError::BatchHashFetch(error.into()))?;

self.rpc_client is the ReconnectingRpcClient that SubxtNode already holds.
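
One detail worth checking before relying on Vec<H256>: as far as I know, Substrate's chain_getBlockHash returns null for any height the node has no block at, so the array form may be safer to decode as Vec<Option<H256>>. The sketch below shows one way to post-process such a response. It uses a plain byte array in place of H256, and BatchHashError and pair_heights_with_hashes are hypothetical names rather than existing code.

```rust
use std::ops::Range;

/// Stand-in for H256 so the sketch has no external dependencies.
type Hash = [u8; 32];

/// Hypothetical error cases for a batch hash response.
#[derive(Debug, PartialEq)]
enum BatchHashError {
    /// The node returned a different number of entries than requested.
    LengthMismatch { expected: usize, actual: usize },
    /// The node returned null for this height.
    MissingHash { height: u64 },
}

/// Pairs each requested height with its hash, failing on the first gap.
fn pair_heights_with_hashes(
    heights: Range<u64>,
    hashes: Vec<Option<Hash>>,
) -> Result<Vec<(u64, Hash)>, BatchHashError> {
    let expected = heights.end.saturating_sub(heights.start) as usize;
    if hashes.len() != expected {
        return Err(BatchHashError::LengthMismatch {
            expected,
            actual: hashes.len(),
        });
    }
    heights
        .zip(hashes)
        .map(|(height, hash)| {
            hash.map(|hash| (height, hash))
                .ok_or(BatchHashError::MissingHash { height })
        })
        .collect()
}
```

Keeping the height next to each hash also makes the later parent-hash verification and any warning logs easier to write.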

Files to modify

  • chain-indexer/src/infra/subxt_node.rs
    • Add fetch_block_hashes_batch(start: u64, end: u64) -> Result<Vec<H256>, SubxtNodeError> helper
    • Replace Phase 2 backward walk with batch fetch + parallel block fetches + verification
    • Add BatchHashFetch error variant
  • Possibly chain-indexer/src/infra/subxt_node/parallel.rs for parallel fetch utilities (if extracted)
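
To illustrate the shape such a parallel fetch utility might take, here is a standard-library-only sketch of order-preserving, bounded concurrency. It deliberately swaps in scoped OS threads for the async buffered(N) combinator mentioned above so that it runs without any crates; the real helper would stay async. fetch_bounded is a hypothetical name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Applies `fetch` to every input with at most `limit` calls in flight,
/// returning results in the same order as `inputs`.
fn fetch_bounded<T, R, F>(inputs: &[T], limit: usize, fetch: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Shared cursor: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    // One slot per input so results land at their original position.
    let slots: Vec<Mutex<Option<R>>> = inputs.iter().map(|_| Mutex::new(None)).collect();
    let workers = limit.max(1).min(inputs.len());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                let index = next.fetch_add(1, Ordering::Relaxed);
                let Some(input) = inputs.get(index) else { break };
                let result = fetch(input);
                *slots[index].lock().unwrap() = Some(result);
            });
        }
    });

    // All workers have joined here, so every slot has been filled.
    slots
        .into_iter()
        .map(|slot| slot.into_inner().unwrap().expect("slot filled by a worker"))
        .collect()
}
```

The worker count doubles as the throttle, which matches the "start conservative, benchmark before increasing" approach: only the limit argument changes between runs.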

Edge cases

  • Empty range (no Phase 2 blocks): skip batch RPC, go straight to yielding first_block
  • Verification failure: log a warning, then fall back to the hash-based walk from the failing block onwards
  • Concurrency limit tuning: start conservative (8?), benchmark before increasing
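
The empty-range case can be folded into the height-range computation itself. The sketch below assumes a hypothetical next_height parameter (the first height the forward phase has not yet indexed) and a hypothetical function name; saturating_sub guards against underflow on chains shorter than the safety margin.

```rust
use std::ops::Range;

/// Returns the heights Phase 2 should fetch, or None when there is nothing
/// to fetch and the batch RPC can be skipped entirely.
fn phase2_height_range(next_height: u64, end_height: u64, margin: u64) -> Option<Range<u64>> {
    // Never reach further back than the margin, and never re-fetch heights
    // the forward phase has already covered.
    let start = end_height.saturating_sub(margin).max(next_height);
    if start >= end_height {
        None
    } else {
        Some(start..end_height)
    }
}
```

Returning an Option keeps the "skip batch RPC" decision at the call site explicit instead of sending an empty array to the node.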

Tests

  • Unit tests for the batch hash fetch helper (mock RPC)
  • Integration test verifying Phase 2 still produces correct blocks under parallel fetch
  • Stress test (optional) for throughput improvement

GH issue title

perf(chain-indexer): use batch chain_getBlockHash and parallel fetch in catch-up Phase 2

GH issue body

Phase 2 of the catch-up flow (backward-by-hash for the last FINALIZATION_SAFETY_MARGIN blocks near the finalized tip) currently does ~1600 serial RPCs for a full 400-block backward walk.

Following the pattern in midnight-node#1263, we can:

  1. Batch all block-hash lookups into one chain_getBlockHash([N-1, N-2, ..., N-400]) RPC
  2. Fetch the blocks in parallel with throttled concurrency
  3. Verify each block's parent_hash matches the previous block's hash from the batch (sanity check)
  4. Fall back to by-hash fetching for any block that fails verification

This complements the application-level subscription quota work and the WAF rate limiting (midnight-security#85 / shielded-sre#142) by reducing RPC pressure on the node during catch-up.

Related:

Estimated effort

Medium. New batch RPC pattern, parallel fetch logic, verification with fallback, tests.

Priority

Lower than wallet sync work (per Giles 16 April: "speeding up wallet syncing is priority 1"). Pick up after:

  • midnight-indexer#1048 (dustGenerationMerkleTreeUpdate query)
  • Umbrella tracking issue in midnight-security
