Trace preserved-order writes at the recovery boundary by xingbowang · Pull Request #14588 · facebook/rocksdb

xingbowang · 2026-04-08T19:35:28Z

Summary

Preserve-write-order tracing regressed after ec832ff moved trace emission earlier to reduce the WAL/trace gap. That placement is correct for WAL-backed writes because a successful WAL update is the recovery boundary, but it is wrong for disableWAL writes, which are not recoverable until the memtable path succeeds. A disableWAL WriteBatchWithIndex ingest can still fail in SwitchMemtable, leaving the trace with a write that never became visible in the DB.

Fix this by encoding tracing as a stage policy instead of scattering mode checks through each write path. Introduce stage-named helpers for tracing after WAL and after memtable, keep early tracing only for successful WAL-backed writes, delay disableWAL tracing until successful memtable publish, and skip non-memtable writers on the late tracing path. Also move unordered disableWAL tracing into UnorderedWriteMemtable so it actually runs after memtable success, and require the WAL status to be OK before the after-WAL helper emits anything.

Add a deterministic regression test for disableWAL IngestWriteBatchWithIndex that injects a WAL-creation failure in SwitchMemtable, verifies the failed write never becomes visible in the live DB, replays the trace into a fresh DB, and verifies replay also skips the failed write.

Test Plan

make db_test2 -j192
timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite:DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest'
COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest' --gtest_repeat=5
COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite' --gtest_repeat=5

Preserve-write-order tracing regressed after ec832ff moved trace emission earlier to reduce the WAL/trace gap. That placement is correct for WAL-backed writes because a successful WAL update is the recovery boundary, but it is wrong for disableWAL writes, which are not recoverable until the memtable path succeeds. A disableWAL WriteBatchWithIndex ingest can still fail in SwitchMemtable, leaving the trace with a write that never became visible in the DB. Fix this by encoding tracing as a stage policy instead of scattering mode checks through each write path. Introduce stage-named helpers for tracing after WAL and after memtable, keep early tracing only for successful WAL-backed writes, delay disableWAL tracing until successful memtable publish, and skip non-memtable writers on the late tracing path. Also move unordered disableWAL tracing into UnorderedWriteMemtable so it actually runs after memtable success, and require the WAL status to be OK before the after-WAL helper emits anything. Add a deterministic regression test for disableWAL IngestWriteBatchWithIndex that injects a WAL-creation failure in SwitchMemtable, verifies the failed write never becomes visible in the live DB, replays the trace into a fresh DB, and verifies replay also skips the failed write. Tested:\n- make db_test2 -j192\n- timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite:DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest'\n- COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedDisableWALWBWIIngest' --gtest_repeat=5\n- COERCE_CONTEXT_SWITCH=1 timeout 60s ./db_test2 --gtest_filter='DBTest2.TracePreserveWriteOrderSkipsFailedWrite' --gtest_repeat=5

github-actions · 2026-04-08T19:42:48Z

✅ clang-tidy: No findings on changed lines

Completed in 329.0s.

meta-codesync · 2026-04-08T20:21:14Z

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D100056251.

github-actions · 2026-04-08T21:08:32Z

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit a5d6148

Code Review: Trace Preserved-Order Writes at the Recovery Boundary

Verdict: APPROVE with suggestions

Executive Summary

This PR correctly fixes a regression from ec832ff where disableWAL writes could be traced before memtable insertion succeeded, causing inconsistent trace replay. The two-stage tracing model (AfterWAL / AfterMemtable) is sound and aligns tracing with the correct recovery boundary for each write type.

After multi-agent review (9 agents), debate, and manual adjudication: no critical correctness bugs found.

Findings (7 total: 0 Critical, 0 High, 2 Medium, 5 Low)

ID	Severity	Finding
F1	LOW	Parameter `wal_status` is misleading — it actually carries WAL+sync+manifest status. Rename to `write_status` or add clarifying comment.
F2	LOW	WritePreparedTxnDB empty batch (seq allocation only) is no longer traced — this is correct, but worth noting as behavioral change.
F3	LOW	Add `LIKELY`/`UNLIKELY` branch hints in the new helpers for the hot write path.
F4	MEDIUM	No positive test verifying successful disableWAL writes ARE traced. Add a test for successful disableWAL WBWI ingest with replay verification.
F5	MEDIUM	No test coverage for the new tracing in `UnorderedWriteMemtable`. Add test with `unordered_write=true` + `disableWAL=true`.
F6	LOW	Test dedup opportunity — extract common trace→replay→verify pattern into helper.
F7	LOW	Dropping "Maybe" prefix from core helper breaks codebase naming convention. Consider keeping it.

Verified Safe (dismissed after debate)

Mixed write groups: impossible per EnterAsBatchGroupLeader (write_thread.cc:515)
Double tracing in parallel path: CompleteParallelMemTableWriter ensures exactly one thread exits
WAL-backed writes double-traced in UnorderedWriteMemtable: AfterMemtable returns early for !disableWAL
Pipelined ordering violation: memtable groups are serialized

Full review written to review-findings.md.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

Claude may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/claude-review [context] — Request a code review
/claude-query <question> — Ask about the PR or codebase

# Conflicts: # db/db_impl/db_impl_write.cc

meta-cla bot added the CLA Signed label Apr 8, 2026

fix status check

829e0fb

Merge remote-tracking branch 'upstream/main' into 2026_04_08_T262433986

a5d6148

# Conflicts: # db/db_impl/db_impl_write.cc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trace preserved-order writes at the recovery boundary#14588

Trace preserved-order writes at the recovery boundary#14588
xingbowang wants to merge 3 commits intofacebook:mainfrom
xingbowang:2026_04_08_T262433986

xingbowang commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 8, 2026 •

edited

Loading

Uh oh!

meta-codesync bot commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xingbowang commented Apr 8, 2026

Summary

Test Plan

Uh oh!

github-actions bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ clang-tidy: No findings on changed lines

Uh oh!

meta-codesync bot commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Claude Code Review

Code Review: Trace Preserved-Order Writes at the Recovery Boundary

Executive Summary

Findings (7 total: 0 Critical, 0 High, 2 Medium, 5 Low)

Verified Safe (dismissed after debate)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Apr 8, 2026 •

edited

Loading

github-actions bot commented Apr 8, 2026 •

edited

Loading