Context
storage-core 1.2.0 (used by ledger 8.1) introduces the gc-v1 feature, which enables incremental mark-and-sweep garbage collection of Sp nodes. The node (1.0.0-rc.2) already enables both layout-v2 and gc-v1. The indexer currently only uses layout-v2, so ledger_db_nodes grows unboundedly as historical state becomes unreachable.
Thomas Kerber confirmed in Slack that this is not exclusive to ParityDB and the indexer should adopt it: https://shielded.slack.com/archives/C080DP0F58U/p1776335501713349?thread_ts=1776254200.830079&cid=C080DP0F58U (gc-v1 thread, 16 Apr)
Design (guidance from Thomas)
- Persist pattern: persist the latest ledger_state_key as a GC root, unpersist old ones as chain-indexer advances blocks, and let gc() clean up orphaned Sp nodes.
- Invocation: small time-bounded gc() calls in chain-indexer (e.g. every block), rather than a large scheduled sweep.
- Time bound guidance:
  - Must exceed full in-memory cache traversal time
  - Must be significantly greater than a single DB read/write
  - Must be large/frequent enough for GC to actually make progress (hard to measure precisely)
  - Thomas's internal test reference: 500ms per block (he called this overly conservative)
  - Tune based on observed cache size and production behaviour
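
The root rotation and the time-bounded call described above can be illustrated with a toy in-memory store. This is only a sketch under stated assumptions: ToyStore, advance_block, and the u64 keys are hypothetical and do not reflect the storage-core API. The toy also runs a full mark phase on every call and applies the time bound only to the sweep, whereas gc-v1 is described as incremental.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Toy stand-in for the node store (hypothetical, NOT the storage-core API).
#[derive(Default)]
pub struct ToyStore {
    /// Node key -> keys of its children.
    nodes: HashMap<u64, Vec<u64>>,
    /// Keys currently persisted as GC roots.
    roots: HashSet<u64>,
}

impl ToyStore {
    pub fn insert(&mut self, key: u64, children: Vec<u64>) {
        self.nodes.insert(key, children);
    }

    pub fn persist(&mut self, key: u64) {
        self.roots.insert(key);
    }

    pub fn unpersist(&mut self, key: u64) {
        self.roots.remove(&key);
    }

    pub fn len(&self) -> usize {
        self.nodes.len()
    }

    /// Marks everything reachable from the roots, then removes unreachable
    /// nodes until `bound` has elapsed. Returns the number of nodes culled.
    pub fn gc(&mut self, bound: Duration) -> usize {
        let deadline = Instant::now() + bound;

        // Mark phase: depth-first traversal from every root.
        let mut live: HashSet<u64> = HashSet::new();
        let mut stack: Vec<u64> = self.roots.iter().copied().collect();
        while let Some(key) = stack.pop() {
            if live.insert(key) {
                if let Some(children) = self.nodes.get(&key) {
                    stack.extend(children.iter().copied());
                }
            }
        }

        // Sweep phase: stop as soon as the time budget is used up; whatever
        // is left over is picked up by a later call.
        let dead: Vec<u64> = self
            .nodes
            .keys()
            .copied()
            .filter(|key| !live.contains(key))
            .collect();
        let mut culled = 0;
        for key in dead {
            if Instant::now() >= deadline {
                break;
            }
            self.nodes.remove(&key);
            culled += 1;
        }
        culled
    }
}

/// Per-block step: root the new state first, then drop the old root, then
/// give the collector a small time budget.
pub fn advance_block(
    store: &mut ToyStore,
    previous: Option<u64>,
    latest: u64,
    bound: Duration,
) -> usize {
    store.persist(latest);
    if let Some(prev) = previous {
        store.unpersist(prev);
    }
    store.gc(bound)
}
```

Persisting the new root before unpersisting the old one keeps nodes shared between consecutive states rooted at all times. A call whose budget runs out before the sweep finishes simply leaves the remaining orphans for the next block.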
Implementation scope
- Enable the gc-v1 feature on midnight-storage-core in Cargo.toml
- Implement DB::scan and DB::ScanResumeHandle on SQL-backed LedgerDb (indexer-common/src/infra/ledger_db/v1_1.rs) with paginated SELECT key, object FROM ledger_db_nodes (cursor-based resume)
- Call persist(latest_ledger_state_key) + unpersist(previous_ledger_state_key) at block advance in chain-indexer
- Periodic gc(bound) invocation in chain-indexer (between blocks), with time bound configurable via env var
- Metrics: gc_runs_total, gc_nodes_culled_total, gc_duration_seconds
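
The cursor-based resume for the paginated scan could look roughly like the sketch below. It models ledger_db_nodes as an in-memory BTreeMap so the example runs without a database. ResumeAfter and scan_page are hypothetical names, the real DB::scan and DB::ScanResumeHandle signatures are not shown in this ticket, and the assumed query shape (`WHERE key > $1 ORDER BY key LIMIT $2`) is one possible way to page, not a confirmed design.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

pub type Key = Vec<u8>;
pub type Object = Vec<u8>;

/// Hypothetical resume handle: the last key returned by the previous page.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResumeAfter(pub Key);

/// Returns up to `limit` rows with keys strictly greater than the resume key,
/// in key order, plus a handle for the next page (None once exhausted).
///
/// Assumed SQL equivalent:
///   SELECT key, object FROM ledger_db_nodes
///   WHERE key > $1 ORDER BY key LIMIT $2
pub fn scan_page(
    table: &BTreeMap<Key, Object>,
    resume: Option<&ResumeAfter>,
    limit: usize,
) -> (Vec<(Key, Object)>, Option<ResumeAfter>) {
    // Start strictly after the last key seen, or at the beginning.
    let lower: Bound<Key> = match resume {
        Some(ResumeAfter(last)) => Bound::Excluded(last.clone()),
        None => Bound::Unbounded,
    };
    let upper: Bound<Key> = Bound::Unbounded;

    let page: Vec<(Key, Object)> = table
        .range::<Key, _>((lower, upper))
        .take(limit)
        .map(|(key, object)| (key.clone(), object.clone()))
        .collect();

    // A full page may have more rows behind it; a short page means the scan
    // reached the end of the table.
    let next = if page.len() == limit {
        page.last().map(|(key, _)| ResumeAfter(key.clone()))
    } else {
        None
    };
    (page, next)
}
```

Resuming from the last key rather than a row offset keeps each page query cheap on an indexed key column and tolerates rows being deleted between pages, which matters when the scan is interleaved with GC itself.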
Out of scope
- Whitelist-based GC (gc_override_gc_roots): the unpersist/persist pattern is simpler
- Historical ledger state queries (indexer doesn't expose these)
Testing
- Unit test for scan() on Postgres + SQLite backends
- Integration test: index blocks, verify ledger_db_nodes shrinks after GC
- Perf test: ensure GC does not cause chain-indexer to fall behind