Skip to content

Trace preserved-order writes at the recovery boundary#14588

Open
xingbowang wants to merge 3 commits intofacebook:mainfrom
xingbowang:2026_04_08_T262433986
Open

Trace preserved-order writes at the recovery boundary#14588
xingbowang wants to merge 3 commits intofacebook:mainfrom
xingbowang:2026_04_08_T262433986

Conversation

@xingbowang
Copy link
Copy Markdown
Contributor

Summary

Preserve-write-order tracing regressed after ec832ff moved trace emission earlier to reduce the WAL/trace gap. That placement is correct for WAL-backed writes because a successful WAL update is the recovery boundary, but it is wrong for disableWAL writes, which are not recoverable until the memtable path succeeds. A disableWAL WriteBatchWithIndex ingest can still fail in SwitchMemtable, leaving the trace with a write that never became visible in the DB.

Fix this by encoding tracing as a stage policy instead of scattering mode checks through each write path. Introduce stage-named helpers for tracing after WAL and after memtable, keep early tracing only for successful WAL-backed writes, delay disableWAL tracing until successful memtable publish, and skip non-memtable writers on the late tracing path. Also move unordered disableWAL tracing into UnorderedWriteMemtable so it actually runs after memtable success, and require the WAL status to be OK before the after-WAL helper emits anything.

Add a deterministic regression test for disableWAL IngestWriteBatchWithIndex that injects a WAL-creation failure in SwitchMemtable, verifies the failed write never becomes visible in the live DB, replays the trace into a fresh DB, and verifies replay also skips the failed write.

Test Plan

  • make db_test2 -j192
  • timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite:DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest'
  • COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest' --gtest_repeat=5
  • COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite' --gtest_repeat=5

Preserve-write-order tracing regressed after ec832ff moved trace emission earlier to reduce the WAL/trace gap. That placement is correct for WAL-backed writes because a successful WAL update is the recovery boundary, but it is wrong for disableWAL writes, which are not recoverable until the memtable path succeeds. A disableWAL WriteBatchWithIndex ingest can still fail in SwitchMemtable, leaving the trace with a write that never became visible in the DB.

Fix this by encoding tracing as a stage policy instead of scattering mode checks through each write path. Introduce stage-named helpers for tracing after WAL and after memtable, keep early tracing only for successful WAL-backed writes, delay disableWAL tracing until successful memtable publish, and skip non-memtable writers on the late tracing path. Also move unordered disableWAL tracing into UnorderedWriteMemtable so it actually runs after memtable success, and require the WAL status to be OK before the after-WAL helper emits anything.

Add a deterministic regression test for disableWAL IngestWriteBatchWithIndex that injects a WAL-creation failure in SwitchMemtable, verifies the failed write never becomes visible in the live DB, replays the trace into a fresh DB, and verifies replay also skips the failed write.

Tested:\n- make db_test2 -j192\n- timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite:DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest'\n- COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest' --gtest_repeat=5\n- COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite' --gtest_repeat=5
@meta-cla meta-cla bot added the CLA Signed label Apr 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

✅ clang-tidy: No findings on changed lines

Completed in 329.0s.

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Apr 8, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D100056251.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit a5d6148


Code Review: Trace Preserved-Order Writes at the Recovery Boundary

Verdict: APPROVE with suggestions

Executive Summary

This PR correctly fixes a regression from ec832ff where disableWAL writes could be traced before memtable insertion succeeded, causing inconsistent trace replay. The two-stage tracing model (AfterWAL / AfterMemtable) is sound and aligns tracing with the correct recovery boundary for each write type.

After multi-agent review (9 agents), debate, and manual adjudication: no critical correctness bugs found.

Findings (7 total: 0 Critical, 0 High, 2 Medium, 5 Low)

ID Severity Finding
F1 LOW Parameter wal_status is misleading — it actually carries WAL+sync+manifest status. Rename to write_status or add clarifying comment.
F2 LOW WritePreparedTxnDB empty batch (seq allocation only) is no longer traced — this is correct, but worth noting as behavioral change.
F3 LOW Add LIKELY/UNLIKELY branch hints in the new helpers for the hot write path.
F4 MEDIUM No positive test verifying successful disableWAL writes ARE traced. Add a test for successful disableWAL WBWI ingest with replay verification.
F5 MEDIUM No test coverage for the new tracing in UnorderedWriteMemtable. Add test with unordered_write=true + disableWAL=true.
F6 LOW Test dedup opportunity — extract common trace→replay→verify pattern into helper.
F7 LOW Dropping "Maybe" prefix from core helper breaks codebase naming convention. Consider keeping it.

Verified Safe (dismissed after debate)

  • Mixed write groups: impossible per EnterAsBatchGroupLeader (write_thread.cc:515)
  • Double tracing in parallel path: CompleteParallelMemTableWriter ensures exactly one thread exits
  • WAL-backed writes double-traced in UnorderedWriteMemtable: AfterMemtable returns early for !disableWAL
  • Pipelined ordering violation: memtable groups are serialized

Full review written to review-findings.md.


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant