Skip to content

Add semi-automated code generation for RocksDB C API bindings#14572

Open
xingbowang wants to merge 10 commits intofacebook:mainfrom
xingbowang:c_api_gen
Open

Add semi-automated code generation for RocksDB C API bindings#14572
xingbowang wants to merge 10 commits intofacebook:mainfrom
xingbowang:c_api_gen

Conversation

@xingbowang
Copy link
Copy Markdown
Contributor

Summary

RocksDB's C API (include/rocksdb/c.h / db/c.cc) is large and hand-maintained, with hundreds of repetitive setter/getter wrappers that follow strict naming and type conventions. Adding new C++ fields requires manually adding matching C wrappers, which is tedious and error-prone.

This PR introduces a semi-automated code generation system for the mechanical parts of the C API while keeping complex wrappers (callbacks, ownership-transfer, multi-get families) hand-written.

Two complementary generators

1. Auto-discovery (auto_simple_bindings.py)

Scans selected C++ public option/metadata structs and auto-generates getter/setter wrappers for simple scalar, enum, string, and chrono fields. Generated .inc files are checked in and compiled into c.h/c.cc via #include. Fields that cannot be auto-generated must be explicitly blocklisted in auto_simple_bindings_blocklist.json with a policy and reason.

2. Spec-driven generator (generate_c_api.py)

Takes spec.json as input and emits consistent boilerplate for method-style wrappers whose C shape cannot be inferred from the C++ header alone (e.g. rocksdb_put, rocksdb_transaction_commit, WriteBatch methods).

Coverage improvement

This PR adds 725 new C API functions, growing the public C API surface from 1,053 to 1,778 exported functions — a 68.9% increase. The bulk of the new coverage (436 functions) comes from auto-discovered option struct setters/getters that were previously missing.

Family New functions
Option structs (auto-discovered) 436
Metadata structs (auto-discovered) 89
ReadOptions (auto-discovered) 46
JobInfo structs (auto-discovered) 46
Spec-driven (subset wrappers) 56
DB simple operations 27
Transaction/TransactionDB/WriteBatch 29

The generated code emits the same style as today's hand-written wrappers (SaveError, Slice(), malloc-owned buffers, unsigned char booleans) and is organized in clearly marked generated sections within c.h/c.cc.

What stays manual

Complex wrapper families remain hand-written:

  • Comparators, merge operators, compaction filters, event listeners
  • MultiGet / batched MultiGet families
  • Openers and constructors involving vectors/maps of descriptors
  • Wrappers involving std::shared_ptr / std::unique_ptr ownership transfer
  • C APIs whose behavior intentionally differs from the C++ API shape

Test Plan

  • Existing db_test.c C API tests pass (1,743 lines of tests extended/verified)
  • python3 tools/c_api_gen/regen_all.py && git diff --exit-code -- include/rocksdb/c_api_gen db/c_api_gen verifies generated output is stable
  • python3 tools/c_api_gen/validate_generated_equivalence.py --ref HEAD verifies equivalence with reference hand-written wrappers

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

⚠️ clang-tidy: 20 warning(s) on changed lines

Completed in 81.9s.

Summary by check

Check Count
cppcoreguidelines-pro-type-member-init 1
cppcoreguidelines-special-member-functions 1
modernize-make-shared 3
readability-braces-around-statements 13
readability-isolate-declaration 2
Total 20

Details

db/c.cc (20 warning(s))
db/c.cc:313:21: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:1066:22: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5300:28: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5303:26: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5311:30: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5314:28: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5327:28: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5330:26: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5338:30: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5341:28: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5427:28: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5430:26: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:5438:47: warning: statement should be inside braces [readability-braces-around-statements]
db/c.cc:6082:29: warning: use std::make_shared instead [modernize-make-shared]
db/c.cc:7354:12: warning: use std::make_shared instead [modernize-make-shared]
db/c.cc:7361:12: warning: use std::make_shared instead [modernize-make-shared]
db/c.cc:8036:8: warning: constructor does not initialize these fields: rep_ [cppcoreguidelines-pro-type-member-init]
db/c.cc:8036:8: warning: class 'SliceTransformWrapper' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
db/c.cc:8438:3: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
db/c.cc:8451:3: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]

@xingbowang xingbowang force-pushed the c_api_gen branch 8 times, most recently from ffb1471 to 2232f34 Compare April 4, 2026 23:05
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 5, 2026

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 673d7a2


Code Review: Add Semi-Automated Code Generation for RocksDB C API Bindings

PR: Add semi-automated code generation for RocksDB C API bindings
Author: xingbowang
Changes: 51 files changed, 41,614 insertions(+), 3,035 deletions(-)
Review Note: The PR diff was truncated; generated .inc files and Python generators were not fully visible.


Executive Summary

This PR introduces a semi-automated code generation system for the RocksDB C API, adding 725 new C API functions (68.9% increase). The architecture is sound and addresses a real maintenance burden. Approve with required changes.


HIGH Severity Findings

H1: Verify All Removed Hand-Written Functions Have Generated Replacements

The PR removes ~35 hand-written implementations (backup_engine_options, block_based_options, restore_options) from db/c.cc while declarations remain in include/rocksdb/c.h. The visible generated .inc files contain only NEW functions, not replacements. If auto-generated replacements are missing, this causes link-time errors for all C consumers. Verify the complete .inc files provide identical-semantics replacements.

H2: CopyStringVector Missing malloc Null Check

If malloc returns null, result[i] = strdup(...) segfaults. Add if (!result) return nullptr; after the malloc call.

H3: C_API_CODEGEN_STAMP Not Defined in Makefile

The Makefile references $(C_API_CODEGEN_STAMP) as a dependency but the variable definition is not visible in the diff. If undefined, Make silently treats it as empty, making the dependency a no-op.

H4: GetMapEntryAt O(n²) Pattern

O(n) per access / O(n²) for full iteration on std::map. Consider providing an iterator-style API for maps with potentially many entries, or document O(n) complexity prominently in headers.

MEDIUM Severity Findings

M1: WalFilter — Use Move Semantics

*new_batch = c_new_batch.rep copies; use std::move(c_new_batch.rep) since the local is about to be destroyed.

M2: Fragile Lambda Capture in exclude_files_callback

Currently safe (callback is synchronous), but capturing options raw pointer is fragile. Prefer capturing cb and state by value, matching the safer pattern used for progress_callback.

M3: Missing alignof in Static Assertions

Add alignof checks alongside existing sizeof/offsetof for complete platform portability.

M4: SyncColumnFamilyMetadataOptionsRange Stale Data Risk

OptSlice references std::string without owning it. If key strings are modified without re-syncing, OptSlice points to stale data.

M5: rocksdb_reset_status Removed — Verify Replacement Exists

LOW: CI downloads shell script from remote URL; rocksdb_wal_file_type returns int vs uint32_t.

Positive Observations

  1. Static assert checks for reinterpret_cast safety — excellent addition
  2. RAII BuildRanges eliminates leak potential
  3. .reserve() calls reduce reallocations
  4. Generated code markers aid maintainability
  5. Blocklist approach documents intentional coverage decisions

Key Recommendations

  1. Verify all removed implementations have auto-generated replacements (H1)
  2. Fix CopyStringVector malloc null check (H2)
  3. Ensure C_API_CODEGEN_STAMP is defined (H3)
  4. Add alignof static_asserts (M3)
  5. Use move semantics in WalFilter (M1)
  6. Capture by value in exclude_files_callback (M2)
  7. Add test coverage for new callback APIs (WalFilter, table_filter, exclude_files)

Full review written to review-findings.md.


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Apr 5, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D99609629.

@xingbowang
Copy link
Copy Markdown
Contributor Author

Regarding the F1 finding (ABI risk on rocksdb_restore_options_set_keep_log_files):

The auto-generator correctly emits unsigned char to match the C++ bool field type. The concern is whether changing from int is a real binary ABI break.

In practice: the value domain is 0/1. On all common calling conventions (x86-64 SysV, Windows x64, ARM64), passing a small integer as int vs unsigned char will correctly transmit the same value for typical usage. The risk would only materialize for dynamically-linked consumers (distro packages like Ubuntu librocksdb.so) that were compiled against the old header and call this specific function — a very narrow audience for a niche restore option.

After weighing the options, we've decided to keep the generated unsigned char signature (the semantically correct one) rather than blocklisting it. Distributions that ship librocksdb.so would need to recompile against the new headers, which is standard practice for a new major C API revision anyway.

— via Navi on behalf of xingbowang

@zaidoon1
Copy link
Copy Markdown
Contributor

zaidoon1 commented Apr 6, 2026

This is great! I've been wanting this for a long time! Thanks for working on this. Just a note:

c.h is no longer self-contained so its breaks bindgen consumers
Prior to this PR, c.h only used angle-bracket includes for standard library headers (<stddef.h>, <stdint.h>, <stdbool.h>). It could be parsed by any tool given just its file path.

This PR adds 11 quoted #include directives:

#include "rocksdb/c_api_gen/c_generated_db_simple_subset.h.inc"
#include "rocksdb/c_api_gen/c_generated_writebatch_subset.h.inc"
// ... 9 more

In rust-rocksdb, bindgen is invoked as:

bindgen::Builder::default()
    .header("rocksdb/include/rocksdb/c.h")
    // No -I flags configured
    .generate()

When libclang encounters #include "rocksdb/c_api_gen/...", it first searches relative to c.h's directory (rocksdb/include/rocksdb/), looking for rocksdb/include/rocksdb/rocksdb/c_api_gen/... which doesn't exist. Then it falls back to the -I include path list but none are configured in the bindgen builder so bindgen will fail to parse c.h

This breaks any downstream consumer that processes c.h directly without setting up include paths (bindgen, FFI generators for other languages, IDEs parsing the header, etc.).

on the rust-rocksdb side, it's an easy fix for me:

let include_dir = rocksdb_include_dir();
bindgen::Builder::default()
    .header(format!("{include_dir}/rocksdb/c.h"))
    .clang_arg(format!("-I{include_dir}"))  // NEW: needed for .h.inc resolution

For me, it's not a big deal, but I want to flag just in case because:

  1. It's a contract change c.h previously had zero dependencies on include paths
  2. Other language binding projects (Python, Java/JNI, Go, Node.js, etc.) may hit the same issue
  3. Maybe we should document this explicitly? For example in the c.h header comment block at the top

@xingbowang
Copy link
Copy Markdown
Contributor Author

Thanks @zaidoon1! Good catch on the self-containedness contract.

We explored a few options and landed on the following approach:

  • Introduced include/rocksdb/c_base.h as the hand-written source (what c.h was before this PR, with #include directives for the generated fragments).
  • c.h is now fully self-contained — all generated fragments are inlined directly, and the only includes are <stdbool.h>, <stddef.h>, and <stdint.h>. No -I flag required.
  • tools/c_api_gen/regen_all.py generates c.h from c_base.h + the .h.inc fragments, and verify_generated_up_to_date.py checks it in CI.

So bindgen invoked as bindgen::Builder::default().header("rocksdb/include/rocksdb/c.h").generate() with no -I flags should work without any changes on the rust-rocksdb side.

We verified this end-to-end against the rust-rocksdb repo.

— via Navi on behalf of xingbowang

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Apr 6, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D99609629.

3 similar comments
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Apr 6, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D99609629.

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Apr 6, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D99609629.

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Apr 6, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D99609629.

meta-codesync bot pushed a commit that referenced this pull request Apr 6, 2026
Summary:
Upgrade the Ubuntu 24 CI Docker image to include clang-21 (from clang-18), and bump all workflow references to the new image tags. Also adds ccache to both images and renames all clang-18 job references to clang-21.

This is mainly for upgrading the clang version later used for generating C API automatically. See PR #14572

### Changes

**`build_tools/ubuntu24_image/Dockerfile`**
- Add clang-21 installation from LLVM snapshot repo (`apt.llvm.org/noble/llvm-toolchain-noble-21`)
- Add ccache
- Add comment pointing Meta employees to internal devvm build guide

**`build_tools/ubuntu22_image/Dockerfile`**
- Add ccache
- Add comment pointing Meta employees to internal devvm build guide

**`.github/workflows/pr-jobs.yml`**
- Rename `build-linux-clang-18-no_test_run` → `build-linux-clang-21-no_test_run`
- Rename `build-linux-clang18-asan-ubsan` → `build-linux-clang21-asan-ubsan`
- Rename `build-linux-clang18-mini-tsan` → `build-linux-clang21-mini-tsan`
- Update all `clang-18`/`clang++-18` references to `clang-21`/`clang++-21`
- Update ccache key prefixes: `clang18-asan-ubsan` → `clang21-asan-ubsan`, `clang18-tsan` → `clang21-tsan`
- Bump `rocksdb_ubuntu:24.0` → `rocksdb_ubuntu:24.1`

**`.github/workflows/nightly.yml`**
- Rename `build-linux-clang-18-asan-ubsan-with-folly` → `build-linux-clang-21-asan-ubsan-with-folly`
- Update clang-18 → clang-21 compiler references
- Bump `rocksdb_ubuntu:24.0` → `rocksdb_ubuntu:24.1`

**`.github/workflows/clang-tidy.yml`**
- Update clang-18 → clang-21
- Bump `rocksdb_ubuntu:24.0` → `rocksdb_ubuntu:24.1`

### New Docker Images

Both images have been built and tested locally. They will be pushed to `ghcr.io/facebook/rocksdb_ubuntu` before this PR is merged.

- `ghcr.io/facebook/rocksdb_ubuntu:22.2` — adds ccache, adds devvm build note
- `ghcr.io/facebook/rocksdb_ubuntu:24.1` — adds clang-21 from LLVM snapshot repo, adds ccache

Pull Request resolved: #14576

Test Plan: Built both images locally and verified CI job names/compiler flags are consistent.

Reviewed By: joshkang97

Differential Revision: D99694293

Pulled By: xingbowang

fbshipit-source-id: c23c27f5cf870fb2c8b4e3d1cba281d0ce63f9d6
Summary:
RocksDB's C API (include/rocksdb/c.h / db/c.cc) is large and
hand-maintained, with hundreds of repetitive setter/getter wrappers that
follow strict naming and type conventions. Adding new C++ fields requires
manually adding matching C wrappers, which is tedious and error-prone.

This PR introduces a semi-automated code generation system for the
mechanical parts of the C API while keeping complex wrappers (callbacks,
ownership-transfer, multi-get families) hand-written.

The system has two complementary generators:

1. Auto-discovery (auto_simple_bindings.py): Scans selected C++
   public option/metadata structs and auto-generates getter/setter wrappers
   for simple scalar, enum, string, and chrono fields. Generated .inc
   files are checked in and compiled into c.h/c.cc via #include.
   Fields that cannot be auto-generated must be explicitly blocklisted in
   auto_simple_bindings_blocklist.json with a policy and reason.

2. Spec-driven generator (generate_c_api.py): Takes spec.json as
   input and emits consistent boilerplate for method-style wrappers whose
   C shape cannot be inferred from the C++ header alone (e.g. rocksdb_put,
   rocksdb_transaction_commit, WriteBatch methods).

Coverage improvement: This PR adds 725 new C API functions, growing the
public C API surface from 1,053 to 1,778 exported functions -- a 68.9%
increase. The bulk of the new coverage (436 functions) comes from
auto-discovered option struct setters/getters that were previously missing.

Coverage breakdown by family:
- Option structs (auto-discovered):     436 new functions
- Metadata structs (auto-discovered):    89 new functions
- ReadOptions (auto-discovered):         46 new functions
- JobInfo structs (auto-discovered):     46 new functions
- Spec-driven (subset wrappers):         56 new functions
- DB simple operations:                  27 new functions
- Transaction/TransactionDB/WriteBatch:  29 new functions

The generated code emits the same style as today's hand-written wrappers
(SaveError, Slice(), malloc-owned buffers, unsigned char booleans) and is
organized in clearly marked generated sections within c.h/c.cc.

Test Plan:
- Existing db_test.c C API tests pass (1743 lines of tests extended/verified)
- python3 tools/c_api_gen/regen_all.py && git diff --exit-code -- include/rocksdb/c_api_gen db/c_api_gen verifies generated output is stable
- python3 tools/c_api_gen/validate_generated_equivalence.py --ref HEAD verifies equivalence with reference hand-written wrappers
doxtop pushed a commit to flyingw/rocksdb that referenced this pull request Apr 7, 2026
…book#14576)

Summary:
Upgrade the Ubuntu 24 CI Docker image to include clang-21 (from clang-18), and bump all workflow references to the new image tags. Also adds ccache to both images and renames all clang-18 job references to clang-21.

This is mainly for upgrading the clang version later used for generating C API automatically. See PR facebook#14572

### Changes

**`build_tools/ubuntu24_image/Dockerfile`**
- Add clang-21 installation from LLVM snapshot repo (`apt.llvm.org/noble/llvm-toolchain-noble-21`)
- Add ccache
- Add comment pointing Meta employees to internal devvm build guide

**`build_tools/ubuntu22_image/Dockerfile`**
- Add ccache
- Add comment pointing Meta employees to internal devvm build guide

**`.github/workflows/pr-jobs.yml`**
- Rename `build-linux-clang-18-no_test_run` → `build-linux-clang-21-no_test_run`
- Rename `build-linux-clang18-asan-ubsan` → `build-linux-clang21-asan-ubsan`
- Rename `build-linux-clang18-mini-tsan` → `build-linux-clang21-mini-tsan`
- Update all `clang-18`/`clang++-18` references to `clang-21`/`clang++-21`
- Update ccache key prefixes: `clang18-asan-ubsan` → `clang21-asan-ubsan`, `clang18-tsan` → `clang21-tsan`
- Bump `rocksdb_ubuntu:24.0` → `rocksdb_ubuntu:24.1`

**`.github/workflows/nightly.yml`**
- Rename `build-linux-clang-18-asan-ubsan-with-folly` → `build-linux-clang-21-asan-ubsan-with-folly`
- Update clang-18 → clang-21 compiler references
- Bump `rocksdb_ubuntu:24.0` → `rocksdb_ubuntu:24.1`

**`.github/workflows/clang-tidy.yml`**
- Update clang-18 → clang-21
- Bump `rocksdb_ubuntu:24.0` → `rocksdb_ubuntu:24.1`

### New Docker Images

Both images have been built and tested locally. They will be pushed to `ghcr.io/facebook/rocksdb_ubuntu` before this PR is merged.

- `ghcr.io/facebook/rocksdb_ubuntu:22.2` — adds ccache, adds devvm build note
- `ghcr.io/facebook/rocksdb_ubuntu:24.1` — adds clang-21 from LLVM snapshot repo, adds ccache

Pull Request resolved: facebook#14576

Test Plan: Built both images locally and verified CI job names/compiler flags are consistent.

Reviewed By: joshkang97

Differential Revision: D99694293

Pulled By: xingbowang

fbshipit-source-id: c23c27f5cf870fb2c8b4e3d1cba281d0ce63f9d6
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Apr 9, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D99609629.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants