feat: adopt storage-core gc-v1 to reclaim ledger_db_nodes disk space #1047

@cosmir17

Context

storage-core 1.2.0 (used by ledger 8.1) introduces the gc-v1 feature, which enables incremental mark-and-sweep garbage collection of Sp nodes. The node (1.0.0-rc.2) already enables both layout-v2 and gc-v1. The indexer currently enables only layout-v2, so ledger_db_nodes grows without bound: historical state becomes unreachable but is never reclaimed.

Thomas Kerber confirmed in Slack that this is not exclusive to ParityDB and the indexer should adopt it: https://shielded.slack.com/archives/C080DP0F58U/p1776335501713349?thread_ts=1776254200.830079&cid=C080DP0F58U (gc-v1 thread, 16 Apr)

Design (guidance from Thomas)

  1. Persist pattern: persist the latest ledger_state_key as a GC root, unpersist older keys as chain-indexer advances blocks, and let gc() clean up orphaned Sp nodes.
  2. Invocation: small time-bounded gc() calls in chain-indexer (e.g. every block), rather than a large scheduled sweep.
  3. Time bound guidance:
    • Must exceed full in-memory cache traversal time
    • Must be significantly greater than a single DB read/write
    • Must be large enough, and invoked frequently enough, for GC to actually make progress (hard to measure precisely)
    • Thomas's internal test reference: 500ms per block (he called this overly conservative)
    • Tune based on observed cache size and production behaviour
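
To make the design above concrete, here is a minimal, self-contained sketch of the root rotation plus a time-bounded gc() step. It is a toy model, not the storage-core API: ToyStore, advance_block, gc_bound_from and the GC_TIME_BOUND_MS variable name are all hypothetical, and the toy gc() does a full mark followed by a deadline-checked sweep, whereas gc-v1 is described as incremental. The 500 ms default mirrors Thomas's test reference.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

type Key = u64;

/// Toy stand-in for the node table: key -> child keys.
/// Hypothetical; this is NOT the storage-core API.
#[derive(Default)]
struct ToyStore {
    nodes: HashMap<Key, Vec<Key>>,
    roots: HashSet<Key>,
}

impl ToyStore {
    fn persist(&mut self, key: Key) {
        self.roots.insert(key);
    }

    fn unpersist(&mut self, key: Key) {
        self.roots.remove(&key);
    }

    /// Marks everything reachable from the roots, then deletes unreachable
    /// nodes until `bound` has elapsed. Returns the number of nodes culled.
    fn gc(&mut self, bound: Duration) -> usize {
        let deadline = Instant::now() + bound;

        // Mark phase: depth-first walk from every root.
        let mut marked: HashSet<Key> = HashSet::new();
        let mut stack: Vec<Key> = self.roots.iter().copied().collect();
        while let Some(key) = stack.pop() {
            if marked.insert(key) {
                if let Some(children) = self.nodes.get(&key) {
                    stack.extend(children.iter().copied());
                }
            }
        }

        // Sweep phase: stop as soon as the time budget is spent.
        let dead: Vec<Key> = self
            .nodes
            .keys()
            .copied()
            .filter(|key| !marked.contains(key))
            .collect();
        let mut culled = 0;
        for key in dead {
            if Instant::now() >= deadline {
                break;
            }
            self.nodes.remove(&key);
            culled += 1;
        }
        culled
    }
}

/// Parses a bound in milliseconds (e.g. the value of a hypothetical
/// `GC_TIME_BOUND_MS` variable); falls back to 500 ms when absent or invalid.
fn gc_bound_from(raw: Option<&str>) -> Duration {
    let millis = raw.and_then(|s| s.trim().parse::<u64>().ok()).unwrap_or(500);
    Duration::from_millis(millis)
}

/// Roots the new state before unrooting the old one, so nodes shared by both
/// states always stay reachable, then runs one bounded GC step.
fn advance_block(store: &mut ToyStore, previous: Option<Key>, latest: Key, bound: Duration) -> usize {
    store.persist(latest);
    if let Some(prev) = previous {
        store.unpersist(prev);
    }
    store.gc(bound)
}

/// Two consecutive states (roots 1 and 2) that share node 11.
fn demo() -> (usize, usize, Vec<Key>) {
    let mut store = ToyStore::default();
    let bound = gc_bound_from(None);

    store.nodes.insert(1, vec![10, 11]);
    store.nodes.insert(10, vec![]);
    store.nodes.insert(11, vec![]);
    let culled_first = advance_block(&mut store, None, 1, bound);

    store.nodes.insert(2, vec![11, 12]);
    store.nodes.insert(12, vec![]);
    let culled_second = advance_block(&mut store, Some(1), 2, bound);

    let mut remaining: Vec<Key> = store.nodes.keys().copied().collect();
    remaining.sort_unstable();
    (culled_first, culled_second, remaining)
}
```

In this toy, the old root and its private child are culled after the second block, while the shared node survives. The bound could be read with something like gc_bound_from(std::env::var("GC_TIME_BOUND_MS").ok().as_deref()), again assuming that variable name.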

Implementation scope

  • Enable gc-v1 feature on midnight-storage-core in Cargo.toml
  • Implement DB::scan and DB::ScanResumeHandle on SQL-backed LedgerDb (indexer-common/src/infra/ledger_db/v1_1.rs) with paginated SELECT key, object FROM ledger_db_nodes (cursor-based resume)
  • Call persist(latest_ledger_state_key) + unpersist(previous_ledger_state_key) at block advance in chain-indexer
  • Periodic gc(bound) invocation in chain-indexer (between blocks), with time bound configurable via env var
  • Metrics: gc_runs_total, gc_nodes_culled_total, gc_duration_seconds
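
As an illustration of the cursor-based resume for scan, the sketch below pages through an ordered in-memory map that stands in for ledger_db_nodes. The names ResumeHandle, scan_page and demo_scan are hypothetical and do not reflect the actual DB::scan or DB::ScanResumeHandle signatures; the SQL in the comment is one plausible keyset-pagination form, assuming key has a total order in the backend.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// (key, object) pair, as in `SELECT key, object FROM ledger_db_nodes`.
type Row = (Vec<u8>, Vec<u8>);

/// Hypothetical resume handle: the last key returned by the previous page.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ResumeHandle(Vec<u8>);

/// Returns up to `batch_size` rows strictly after the resume key, plus a
/// handle for the next page (None once the table is exhausted).
///
/// Assumed SQL equivalent (keyset pagination):
///   SELECT key, object FROM ledger_db_nodes
///   WHERE key > $1 ORDER BY key LIMIT $2
fn scan_page(
    table: &BTreeMap<Vec<u8>, Vec<u8>>,
    resume: Option<&ResumeHandle>,
    batch_size: usize,
) -> (Vec<Row>, Option<ResumeHandle>) {
    let lower: Bound<Vec<u8>> = match resume {
        Some(handle) => Bound::Excluded(handle.0.clone()),
        None => Bound::Unbounded,
    };
    let bounds: (Bound<Vec<u8>>, Bound<Vec<u8>>) = (lower, Bound::Unbounded);

    let rows: Vec<Row> = table
        .range::<Vec<u8>, _>(bounds)
        .take(batch_size)
        .map(|(key, object)| (key.clone(), object.clone()))
        .collect();

    // A full page may have more rows behind it; a short page is the last one.
    let next = if rows.len() == batch_size {
        rows.last().map(|(key, _)| ResumeHandle(key.clone()))
    } else {
        None
    };
    (rows, next)
}

/// Scans a five-row table in pages of two; returns the keys seen and the
/// number of pages fetched.
fn demo_scan() -> (Vec<Vec<u8>>, usize) {
    let table: BTreeMap<Vec<u8>, Vec<u8>> =
        (1u8..=5).map(|i| (vec![i], vec![i * 10])).collect();

    let mut keys = Vec::new();
    let mut pages = 0;
    let mut resume: Option<ResumeHandle> = None;
    loop {
        let (rows, next) = scan_page(&table, resume.as_ref(), 2);
        pages += 1;
        keys.extend(rows.into_iter().map(|(key, _)| key));
        match next {
            Some(handle) => resume = Some(handle),
            None => break,
        }
    }
    (keys, pages)
}
```

Resuming from the last key rather than an OFFSET keeps each page independent of how many rows were already read, which matters if GC deletes rows between pages.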

Out of scope

  • Whitelist-based GC (gc_override_gc_roots) — unpersist/persist pattern is simpler
  • Historical ledger state queries (indexer doesn't expose these)

Testing

  • Unit test for scan() on Postgres + SQLite backends
  • Integration test: index blocks, verify ledger_db_nodes shrinks after GC
  • Perf test: ensure GC does not cause chain-indexer to fall behind

Metadata

Status

Backlog
