
fix: C2 — /v1/chat/completions Rust handler + OpenAI live-zone #354

Open
chopratejas wants to merge 1 commit into main from realign-C2-rust-chat-completions

Conversation

@chopratejas
Owner

Summary

Phase C, PR 2 of 5. Ports /v1/chat/completions to Rust with byte-faithful forwarding, OpenAI live-zone compression dispatch, and streaming via C1's state machines.

What this changes

New: crates/headroom-proxy/src/handlers/chat_completions.rs — POST handler.

  • n > 1 → passthrough (no compression — with multiple completions there is no cheap way to guarantee per-completion determinism).
  • stream: true → byte-passthrough to client; ChunkState (from C1) parses in parallel for telemetry.
  • tool_choice change → passthrough; tools array never mutated.
  • All compression decisions emit structured tracing::info! with event=..., request_id, byte counts.

New: crates/headroom-proxy/src/compression/live_zone_openai.rs — OpenAI Chat live-zone dispatcher.

  • Live zone: latest tool role message's content, latest user role message's text content. Earlier tool/user messages are frozen.
  • Per-block content-type detection via transforms::detect_content_type (Magika).
  • Reuses SmartCrusher / LogCompressor / SearchCompressor / DiffCompressor from headroom-core — no new compressors.
  • Reuses the same per-content-type byte thresholds the Anthropic side uses (THRESHOLD_JSON_ARRAY=1024, BUILD_OUTPUT=512, SEARCH_RESULTS=1024, GIT_DIFF=1024). One source of truth.
  • Per-block token-validation gate: if a compressed block has more tokens than the original, revert that block (not the whole request — same per-block semantics as Anthropic B4).

Forwarding integration: the handler delegates to forward_http, which now classifies endpoints via CompressibleEndpoint rather than embedding provider logic. SSE/header/request-id plumbing stays single-source.
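A plausible shape for that classification, as a sketch — the variant set beyond AnthropicMessages/OpenAiChatCompletions and the function name are assumptions; the real enum lives in crates/headroom-proxy/src/compression/mod.rs.

```rust
// Illustrative sketch of CompressibleEndpoint classification.
#[derive(Debug, PartialEq, Eq)]
enum CompressibleEndpoint {
    AnthropicMessages,
    OpenAiChatCompletions,
    None, // unrecognized path: forward bytes untouched
}

fn classify(path: &str) -> CompressibleEndpoint {
    match path {
        "/v1/messages" => CompressibleEndpoint::AnthropicMessages,
        "/v1/chat/completions" => CompressibleEndpoint::OpenAiChatCompletions,
        _ => CompressibleEndpoint::None,
    }
}

fn main() {
    assert_eq!(
        classify("/v1/chat/completions"),
        CompressibleEndpoint::OpenAiChatCompletions
    );
    assert_eq!(classify("/v1/embeddings"), CompressibleEndpoint::None);
}
```

Keeping the provider knowledge in one enum means forward_http never needs per-provider branches for SSE, headers, or request-id plumbing.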

Differences from Anthropic side (intentional)

  • No frozen_message_count parameter — OpenAI has no provider-level cache_control scheme. Cache safety enforced purely by "latest tool / latest user only" walker.
  • n > 1 is a hard skip on OpenAI — there's no equivalent on Anthropic.

Test plan

  • cargo fmt --check clean
  • cargo clippy --all-targets --all-features -- -D warnings clean
  • cargo test --workspace — 975 passed, 0 failed (was 953 after C1; +22 from C2)
  • 7 integration tests in tests/integration_chat_completions.rs:
    • passthrough_no_compression_byte_equal
    • tool_message_compressed
    • n_greater_than_one_passthrough
    • stream_options_include_usage_preserved
    • tool_choice_change_passthrough_no_mutation
    • refusal_field_in_response_handled
    • streaming_tool_call_argument_accumulation
  • 13 unit tests for the dispatcher + walker
  • make ci-precheck PASSED end-to-end

Wire-format observations (RUST_DEV.md follow-up)

  • OpenAI tool message content is usually a JSON string, but newer multimodal-tool models accept arrays. C2 walks only the string shape; array-shape falls through to passthrough. Document asymmetry vs Anthropic tool_result.content.
  • user content-array image_url parts are skipped (not mutated, not compressed). C2 only plans replacements for {type: "text", text} parts.
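Both observations can be modeled with a small content enum — types and names here are hypothetical; the real code inspects JSON values. Tool messages are walked only in the string shape; for user content arrays, only text parts are planned.

```rust
// Hypothetical model of the two wire shapes of message content C2 sees.
enum Content {
    Text(String),     // string shape: walked for compression
    Parts(Vec<Part>), // array shape
}

enum Part {
    Text(String),     // {type: "text", text}: replacement planned (user messages)
    ImageUrl(String), // image_url parts: skipped, never mutated or compressed
}

/// Text spans the user-message walker would consider; image_url parts fall
/// through untouched. (Tool messages take only the Content::Text arm.)
fn compressible_texts(c: &Content) -> Vec<&str> {
    match c {
        Content::Text(s) => vec![s.as_str()],
        Content::Parts(parts) => parts
            .iter()
            .filter_map(|p| match p {
                Part::Text(t) => Some(t.as_str()),
                Part::ImageUrl(_) => None,
            })
            .collect(),
    }
}

fn main() {
    let user = Content::Parts(vec![
        Part::Text("hello".into()),
        Part::ImageUrl("https://example.com/a.png".into()),
    ]);
    assert_eq!(compressible_texts(&user), vec!["hello"]);
}
```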

Plan reference

REALIGNMENT/05-phase-C-rust-proxy.md PR-C2 (lines 103-156). C3 (/v1/responses HTTP handler) is next.

Adds a POST handler for /v1/chat/completions and a sibling live-zone
dispatcher for the OpenAI Chat Completions request shape. Same
compressor backend as Anthropic (SmartCrusher / LogCompressor /
SearchCompressor / DiffCompressor), same per-content-type byte
thresholds, same tokenizer-validated rejection gate, same byte-range
surgery for cache-stable rewrite.
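The tokenizer-validated rejection gate reduces to a per-block comparison. In this sketch, count_tokens is a crude whitespace stand-in for the real tokenizer in headroom-core; only the revert-one-block-not-the-request shape is the point.

```rust
// Per-block token-validation gate (sketch).
fn count_tokens(s: &str) -> usize {
    s.split_whitespace().count() // placeholder heuristic, not the real tokenizer
}

/// Keep the compressed block only if it does not cost more tokens than the
/// original; otherwise revert this block alone, leaving every other block's
/// compression decision intact.
fn validate_block<'a>(original: &'a str, compressed: &'a str) -> &'a str {
    if count_tokens(compressed) > count_tokens(original) {
        original
    } else {
        compressed
    }
}

fn main() {
    assert_eq!(validate_block("a b c d", "a b"), "a b");
    assert_eq!(validate_block("ab", "a b c"), "ab"); // compression backfired: revert
}
```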

Live zone for Chat Completions: the latest role=tool message's
content AND the latest role=user message's text content. Earlier
tool/user messages are part of the cache hot zone; never touched.
tools[] and tool_choice are never read or rewritten — they
round-trip byte-equal as a side effect of byte-range surgery.
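Byte-range surgery, in sketch form: each planned replacement is spliced into the original buffer, so every untouched byte (including tools[] and tool_choice) round-trips exactly. The helper name and the toy JSON below are illustrative.

```rust
use std::ops::Range;

// Sketch of byte-range surgery: splice a replacement into the original body
// so untouched regions round-trip byte-equal. Assumes a valid in-bounds range.
fn splice(body: &[u8], range: Range<usize>, replacement: &[u8]) -> Vec<u8> {
    let mut out =
        Vec::with_capacity(body.len() - range.len() + replacement.len());
    out.extend_from_slice(&body[..range.start]);
    out.extend_from_slice(replacement);
    out.extend_from_slice(&body[range.end..]);
    out
}

fn main() {
    let body = br#"{"messages":[BIG],"tool_choice":"auto"}"#;
    let start = body.iter().position(|&b| b == b'[').unwrap() + 1;
    let end = body.iter().position(|&b| b == b']').unwrap();
    let out = splice(body, start..end, b"small");
    assert_eq!(out, br#"{"messages":[small],"tool_choice":"auto"}"#.to_vec());
}
```

Because the body is never re-serialized, fields outside the spliced ranges cannot be reordered or reformatted, which is what makes the rewrite cache-stable.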

Behaviours:
- n > 1 → passthrough (multiple completions imply non-determinism;
  the proxy gate skips dispatch and forwards original bytes).
- stream: true → pass through to forward_http's existing C1 SSE
  parser tee (ChunkState).
- tool_choice change → never mutated.
- mode == Off → passthrough with structured 'mode_off' log.
- Body not JSON / no messages → passthrough; the dispatcher logs
  the decision and forwards original bytes.
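The gate ordering in the behaviours above can be sketched as follows; the Gate fields and reason strings are hypothetical, and the real handler reads them from the JSON body and emits them via the structured logs.

```rust
// Sketch of the passthrough gate ordering from the behaviours above.
#[derive(Debug, PartialEq)]
enum Decision {
    Passthrough(&'static str), // reason mirrors the structured log event
    Compress,
}

struct Gate {
    mode_off: bool,
    body_is_json_with_messages: bool,
    n: u64,
}

fn decide(g: &Gate) -> Decision {
    if g.mode_off {
        return Decision::Passthrough("mode_off");
    }
    if !g.body_is_json_with_messages {
        return Decision::Passthrough("unparseable_or_no_messages");
    }
    if g.n > 1 {
        return Decision::Passthrough("n_gt_1");
    }
    Decision::Compress
}

fn main() {
    let ok = Gate { mode_off: false, body_is_json_with_messages: true, n: 1 };
    assert_eq!(decide(&ok), Decision::Compress);
    let multi = Gate { n: 3, ..ok };
    assert_eq!(decide(&multi), Decision::Passthrough("n_gt_1"));
}
```

Every Passthrough arm forwards the original bytes untouched; only Compress hands the body to the live-zone dispatcher.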

The handler is wired as an explicit POST route on /v1/chat/completions,
buffers the body into Bytes, and re-injects it into the shared
forward_http function. forward_http's compression gate now classifies
the path (AnthropicMessages vs OpenAiChatCompletions) and dispatches
to the right module (compress_anthropic_request /
compress_openai_chat_request). Single forwarding code path keeps
SSE telemetry, header stripping, and request-id plumbing
single-source.

Files added:
- crates/headroom-proxy/src/handlers/chat_completions.rs
- crates/headroom-proxy/src/handlers/mod.rs
- crates/headroom-proxy/src/compression/live_zone_openai.rs
- crates/headroom-proxy/tests/integration_chat_completions.rs

Files modified:
- crates/headroom-core/src/transforms/live_zone.rs
  (+compress_openai_chat_live_zone, +helpers)
- crates/headroom-core/src/transforms/mod.rs (re-export)
- crates/headroom-proxy/src/compression/mod.rs (+CompressibleEndpoint
  classification, expose live_zone_openai)
- crates/headroom-proxy/src/lib.rs (expose handlers module)
- crates/headroom-proxy/src/proxy.rs (route + dispatch)

Tests:
- 7 integration tests in tests/integration_chat_completions.rs
  covering passthrough byte-equality, tool message compression
  (≥40% reduction on 1500-row JSON-array fodder), n>1 passthrough,
  stream_options round-trip, tool_choice non-mutation, refusal
  delta handling via ChunkState, and tool_call argument
  accumulation across three streaming chunks.
- Unit tests on compress_openai_chat_live_zone (6) and
  compress_openai_chat_request (7) cover the dispatcher and proxy
  shim independently.

Workspace test count: 953 (after C1) → 975. fmt clean. clippy
--all-targets --all-features -D warnings clean. make ci-precheck
PASSED.

Plugin marketplace versions auto-bumped by the sync-plugin-versions
pre-commit hook.

Per-PR-C2 plan: REALIGNMENT/05-phase-C-rust-proxy.md.