
fix: C2 — /v1/chat/completions Rust handler + OpenAI live-zone #354

Open
chopratejas wants to merge 1 commit into main from realign-C2-rust-chat-completions

Conversation

@chopratejas
Owner

Summary

Phase C, PR 2 of 5. Ports /v1/chat/completions to Rust with byte-faithful forwarding, OpenAI live-zone compression dispatch, and streaming via C1's state machines.

What this changes

New: crates/headroom-proxy/src/handlers/chat_completions.rs — POST handler.

  • n > 1 → passthrough (no compression — with multiple completions there is no cheap way to guarantee per-completion determinism).
  • stream: true → byte-passthrough to client; ChunkState (from C1) parses in parallel for telemetry.
  • tool_choice change → passthrough; tools array never mutated.
  • All compression decisions emit structured tracing::info! with event=..., request_id, byte counts.

New: crates/headroom-proxy/src/compression/live_zone_openai.rs — OpenAI Chat live-zone dispatcher.

  • Live zone: latest tool role message's content, latest user role message's text content. Earlier tool/user messages are frozen.
  • Per-block content-type detection via transforms::detect_content_type (Magika).
  • Reuses SmartCrusher / LogCompressor / SearchCompressor / DiffCompressor from headroom-core — no new compressors.
  • Reuses the same per-content-type byte thresholds the Anthropic side uses (THRESHOLD_JSON_ARRAY=1024, BUILD_OUTPUT=512, SEARCH_RESULTS=1024, GIT_DIFF=1024). One source of truth.
  • Per-block token-validation gate: if a compressed block has more tokens than the original, revert that block (not the whole request — same per-block semantics as Anthropic B4).

Forwarding integration: the handler delegates to forward_http, which now classifies endpoints via CompressibleEndpoint rather than embedding provider logic. SSE/header/request-id plumbing stays single-source.
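A plausible shape for that classification, as a sketch — the variant set beyond AnthropicMessages/OpenAiChatCompletions and the function name are assumptions; the real enum lives in crates/headroom-proxy/src/compression/mod.rs.

```rust
// Illustrative sketch of CompressibleEndpoint classification.
#[derive(Debug, PartialEq, Eq)]
enum CompressibleEndpoint {
    AnthropicMessages,
    OpenAiChatCompletions,
    None, // unrecognized path: forward bytes untouched
}

fn classify(path: &str) -> CompressibleEndpoint {
    match path {
        "/v1/messages" => CompressibleEndpoint::AnthropicMessages,
        "/v1/chat/completions" => CompressibleEndpoint::OpenAiChatCompletions,
        _ => CompressibleEndpoint::None,
    }
}

fn main() {
    assert_eq!(
        classify("/v1/chat/completions"),
        CompressibleEndpoint::OpenAiChatCompletions
    );
    assert_eq!(classify("/v1/embeddings"), CompressibleEndpoint::None);
}
```

Keeping the provider knowledge in one enum means forward_http never needs per-provider branches for SSE, headers, or request-id plumbing.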

Differences from Anthropic side (intentional)

  • No frozen_message_count parameter — OpenAI has no provider-level cache_control scheme. Cache safety enforced purely by "latest tool / latest user only" walker.
  • n > 1 is a hard skip on OpenAI — there's no equivalent on Anthropic.

Test plan

  • cargo fmt --check clean
  • cargo clippy --all-targets --all-features -- -D warnings clean
  • cargo test --workspace — 975 passed, 0 failed (was 953 after C1; +22 from C2)
  • 7 integration tests in tests/integration_chat_completions.rs:
    • passthrough_no_compression_byte_equal
    • tool_message_compressed
    • n_greater_than_one_passthrough
    • stream_options_include_usage_preserved
    • tool_choice_change_passthrough_no_mutation
    • refusal_field_in_response_handled
    • streaming_tool_call_argument_accumulation
  • 13 unit tests for the dispatcher + walker
  • make ci-precheck PASSED end-to-end

Wire-format observations (RUST_DEV.md follow-up)

  • OpenAI tool message content is usually a JSON string, but newer multimodal-tool models accept arrays. C2 walks only the string shape; array-shape falls through to passthrough. Document asymmetry vs Anthropic tool_result.content.
  • user content-array image_url parts are skipped (not mutated, not compressed). C2 only plans replacements for {type: "text", text} parts.
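Both observations can be modeled with a small content enum — types and names here are hypothetical; the real code inspects JSON values. Tool messages are walked only in the string shape; for user content arrays, only text parts are planned.

```rust
// Hypothetical model of the two wire shapes of message content C2 sees.
enum Content {
    Text(String),     // string shape: walked for compression
    Parts(Vec<Part>), // array shape
}

enum Part {
    Text(String),     // {type: "text", text}: replacement planned (user messages)
    ImageUrl(String), // image_url parts: skipped, never mutated or compressed
}

/// Text spans the user-message walker would consider; image_url parts fall
/// through untouched. (Tool messages take only the Content::Text arm.)
fn compressible_texts(c: &Content) -> Vec<&str> {
    match c {
        Content::Text(s) => vec![s.as_str()],
        Content::Parts(parts) => parts
            .iter()
            .filter_map(|p| match p {
                Part::Text(t) => Some(t.as_str()),
                Part::ImageUrl(_) => None,
            })
            .collect(),
    }
}

fn main() {
    let user = Content::Parts(vec![
        Part::Text("hello".into()),
        Part::ImageUrl("https://example.com/a.png".into()),
    ]);
    assert_eq!(compressible_texts(&user), vec!["hello"]);
}
```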

Plan reference

REALIGNMENT/05-phase-C-rust-proxy.md PR-C2 (lines 103-156). C3 (/v1/responses HTTP handler) is next.

Adds a POST handler for /v1/chat/completions and a sibling live-zone
dispatcher for the OpenAI Chat Completions request shape. Same
compressor backend as Anthropic (SmartCrusher / LogCompressor /
SearchCompressor / DiffCompressor), same per-content-type byte
thresholds, same tokenizer-validated rejection gate, same byte-range
surgery for cache-stable rewrite.
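The tokenizer-validated rejection gate reduces to a per-block comparison. In this sketch, count_tokens is a crude whitespace stand-in for the real tokenizer in headroom-core; only the revert-one-block-not-the-request shape is the point.

```rust
// Per-block token-validation gate (sketch).
fn count_tokens(s: &str) -> usize {
    s.split_whitespace().count() // placeholder heuristic, not the real tokenizer
}

/// Keep the compressed block only if it does not cost more tokens than the
/// original; otherwise revert this block alone, leaving every other block's
/// compression decision intact.
fn validate_block<'a>(original: &'a str, compressed: &'a str) -> &'a str {
    if count_tokens(compressed) > count_tokens(original) {
        original
    } else {
        compressed
    }
}

fn main() {
    assert_eq!(validate_block("a b c d", "a b"), "a b");
    assert_eq!(validate_block("ab", "a b c"), "ab"); // compression backfired: revert
}
```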

Live zone for Chat Completions: the latest role=tool message's
content AND the latest role=user message's text content. Earlier
tool/user messages are part of the cache hot zone; never touched.
tools[] and tool_choice are never read or rewritten — they
round-trip byte-equal as a side effect of byte-range surgery.
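Byte-range surgery, in sketch form: each planned replacement is spliced into the original buffer, so every untouched byte (including tools[] and tool_choice) round-trips exactly. The helper name and the toy JSON below are illustrative.

```rust
use std::ops::Range;

// Sketch of byte-range surgery: splice a replacement into the original body
// so untouched regions round-trip byte-equal. Assumes a valid in-bounds range.
fn splice(body: &[u8], range: Range<usize>, replacement: &[u8]) -> Vec<u8> {
    let mut out =
        Vec::with_capacity(body.len() - range.len() + replacement.len());
    out.extend_from_slice(&body[..range.start]);
    out.extend_from_slice(replacement);
    out.extend_from_slice(&body[range.end..]);
    out
}

fn main() {
    let body = br#"{"messages":[BIG],"tool_choice":"auto"}"#;
    let start = body.iter().position(|&b| b == b'[').unwrap() + 1;
    let end = body.iter().position(|&b| b == b']').unwrap();
    let out = splice(body, start..end, b"small");
    assert_eq!(out, br#"{"messages":[small],"tool_choice":"auto"}"#.to_vec());
}
```

Because the body is never re-serialized, fields outside the spliced ranges cannot be reordered or reformatted, which is what makes the rewrite cache-stable.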

Behaviours:
- n > 1 → passthrough (multiple completions imply non-determinism;
  the proxy gate skips dispatch and forwards original bytes).
- stream: true → pass through to forward_http's existing C1 SSE
  parser tee (ChunkState).
- tool_choice change → never mutated.
- mode == Off → passthrough with structured 'mode_off' log.
- Body not JSON / no messages → passthrough; the dispatcher logs
  the decision and forwards original bytes.
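The gate ordering in the behaviours above can be sketched as follows; the Gate fields and reason strings are hypothetical, and the real handler reads them from the JSON body and emits them via the structured logs.

```rust
// Sketch of the passthrough gate ordering from the behaviours above.
#[derive(Debug, PartialEq)]
enum Decision {
    Passthrough(&'static str), // reason mirrors the structured log event
    Compress,
}

struct Gate {
    mode_off: bool,
    body_is_json_with_messages: bool,
    n: u64,
}

fn decide(g: &Gate) -> Decision {
    if g.mode_off {
        return Decision::Passthrough("mode_off");
    }
    if !g.body_is_json_with_messages {
        return Decision::Passthrough("unparseable_or_no_messages");
    }
    if g.n > 1 {
        return Decision::Passthrough("n_gt_1");
    }
    Decision::Compress
}

fn main() {
    let ok = Gate { mode_off: false, body_is_json_with_messages: true, n: 1 };
    assert_eq!(decide(&ok), Decision::Compress);
    let multi = Gate { n: 3, ..ok };
    assert_eq!(decide(&multi), Decision::Passthrough("n_gt_1"));
}
```

Every Passthrough arm forwards the original bytes untouched; only Compress hands the body to the live-zone dispatcher.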

The handler is wired as an explicit POST route on /v1/chat/completions,
buffers the body into Bytes, and re-injects it into the shared
forward_http function. forward_http's compression gate now classifies
the path (AnthropicMessages vs OpenAiChatCompletions) and dispatches
to the right module (compress_anthropic_request /
compress_openai_chat_request). Single forwarding code path keeps
SSE telemetry, header stripping, and request-id plumbing
single-source.

Files added:
- crates/headroom-proxy/src/handlers/chat_completions.rs
- crates/headroom-proxy/src/handlers/mod.rs
- crates/headroom-proxy/src/compression/live_zone_openai.rs
- crates/headroom-proxy/tests/integration_chat_completions.rs

Files modified:
- crates/headroom-core/src/transforms/live_zone.rs
  (+compress_openai_chat_live_zone, +helpers)
- crates/headroom-core/src/transforms/mod.rs (re-export)
- crates/headroom-proxy/src/compression/mod.rs (+CompressibleEndpoint
  classification, expose live_zone_openai)
- crates/headroom-proxy/src/lib.rs (expose handlers module)
- crates/headroom-proxy/src/proxy.rs (route + dispatch)

Tests:
- 7 integration tests in tests/integration_chat_completions.rs
  covering passthrough byte-equality, tool message compression
  (≥40% reduction on 1500-row JSON-array fodder), n>1 passthrough,
  stream_options round-trip, tool_choice non-mutation, refusal
  delta handling via ChunkState, and tool_call argument
  accumulation across three streaming chunks.
- Unit tests on compress_openai_chat_live_zone (6) and
  compress_openai_chat_request (7) cover the dispatcher and proxy
  shim independently.

Workspace test count: 953 (after C1) → 975. fmt clean. clippy
--all-targets --all-features -D warnings clean. make ci-precheck
PASSED.

Plugin marketplace versions auto-bumped by the sync-plugin-versions
pre-commit hook.

Per-PR-C2 plan: REALIGNMENT/05-phase-C-rust-proxy.md.