fix: C2 — /v1/chat/completions Rust handler + OpenAI live-zone #354
Open
chopratejas wants to merge 1 commit into main from
Adds a POST handler for /v1/chat/completions and a sibling live-zone dispatcher for the OpenAI Chat Completions request shape. Same compressor backend as Anthropic (SmartCrusher / LogCompressor / SearchCompressor / DiffCompressor), same per-content-type byte thresholds, same tokenizer-validated rejection gate, same byte-range surgery for cache-stable rewrite.

Live zone for Chat Completions: the latest role=tool message's content AND the latest role=user message's text content. Earlier tool/user messages are part of the cache hot zone and are never touched. tools[] and tool_choice are never read or rewritten; they round-trip byte-equal as a side effect of byte-range surgery.

Behaviours:
- n > 1 → passthrough (multiple completions imply non-determinism; the proxy gate skips dispatch and forwards the original bytes).
- stream: true → pass through to forward_http's existing C1 SSE parser tee (ChunkState).
- tool_choice change → never mutated.
- mode == Off → passthrough with a structured 'mode_off' log.
- Body not JSON / no messages → passthrough; the dispatcher logs the decision and forwards the original bytes.

The handler is wired as an explicit POST route on /v1/chat/completions, buffers the body into Bytes, and re-injects it into the shared forward_http function. forward_http's compression gate now classifies the path (AnthropicMessages vs OpenAiChatCompletions) and dispatches to the right module (compress_anthropic_request / compress_openai_chat_request). A single forwarding code path keeps SSE telemetry, header stripping, and request-id plumbing single-source.
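Byte-range surgery is what lets tools[] and tool_choice round-trip byte-equal: only the live-zone spans are replaced, and every other byte of the buffered body is copied through unchanged. A minimal sketch of that splice step, assuming the dispatcher has already located non-overlapping byte ranges and produced JSON-escaped replacement text (splice_ranges is a hypothetical name, not the headroom-core API):

```rust
use std::ops::Range;

/// Hypothetical helper (not the headroom-core API): apply non-overlapping
/// byte-range replacements to a buffered request body. Bytes outside every
/// range are copied verbatim, so fields the dispatcher never targets
/// (e.g. `tools`, `tool_choice`) stay byte-equal.
/// Assumes each replacement is already valid, JSON-escaped string content.
fn splice_ranges(
    original: &[u8],
    mut edits: Vec<(Range<usize>, Vec<u8>)>,
) -> Result<Vec<u8>, String> {
    // Process edits left to right so a single cursor tracks the copy point.
    edits.sort_by_key(|edit| edit.0.start);

    let mut out = Vec::with_capacity(original.len());
    let mut cursor = 0;
    for (range, replacement) in edits {
        // Reject overlapping, reversed, or out-of-bounds ranges instead of
        // silently producing a corrupted body.
        if range.start < cursor || range.start > range.end || range.end > original.len() {
            return Err(format!("invalid or overlapping range {:?}", range));
        }
        out.extend_from_slice(&original[cursor..range.start]); // untouched bytes
        out.extend_from_slice(&replacement); // rewritten live-zone span
        cursor = range.end;
    }
    out.extend_from_slice(&original[cursor..]); // untouched tail
    Ok(out)
}
```

Returning an error on a bad plan, rather than merging ranges, is a choice made for this sketch: a caller can then fall back to forwarding the original bytes instead of emitting a body it cannot vouch for.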
Files added:
- crates/headroom-proxy/src/handlers/chat_completions.rs
- crates/headroom-proxy/src/handlers/mod.rs
- crates/headroom-proxy/src/compression/live_zone_openai.rs
- crates/headroom-proxy/tests/integration_chat_completions.rs

Files modified:
- crates/headroom-core/src/transforms/live_zone.rs (+compress_openai_chat_live_zone, +helpers)
- crates/headroom-core/src/transforms/mod.rs (re-export)
- crates/headroom-proxy/src/compression/mod.rs (+CompressibleEndpoint classification, expose live_zone_openai)
- crates/headroom-proxy/src/lib.rs (expose handlers module)
- crates/headroom-proxy/src/proxy.rs (route + dispatch)

Tests:
- 7 integration tests in tests/integration_chat_completions.rs covering passthrough byte-equality, tool message compression (≥40% reduction on 1500-row JSON-array fodder), n > 1 passthrough, stream_options round-trip, tool_choice non-mutation, refusal delta handling via ChunkState, and tool_call argument accumulation across three streaming chunks.
- Unit tests on compress_openai_chat_live_zone (6) and compress_openai_chat_request (7) cover the dispatcher and the proxy shim independently.

Workspace test count: 953 (after C1) → 975. fmt clean. cargo clippy --all-targets --all-features -- -D warnings clean. make ci-precheck PASSED. Plugin marketplace versions auto-bumped by the sync-plugin-versions pre-commit hook.

Per-PR-C2 plan: REALIGNMENT/05-phase-C-rust-proxy.md.
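The endpoint classification and passthrough rules described in this commit can be sketched as a single gate function. This is a simplified, std-only sketch under several assumptions: GateInput, GateDecision, classify, and gate are hypothetical names; the request is taken as already parsed; the /v1/messages path for AnthropicMessages is assumed; and only the 'mode_off' reason string comes from the PR. In the PR the n > 1 skip lives in the proxy gate while the non-JSON / no-messages check happens in the dispatcher; the sketch folds both into one function, and the order of the checks is assumed.

```rust
/// Variant names come from the PR; the path mapping in `classify` is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompressibleEndpoint {
    AnthropicMessages,
    OpenAiChatCompletions,
}

/// Hypothetical, already-parsed view of a request. The real handler works on
/// buffered `Bytes`; parsing is elided to keep this sketch std-only.
struct GateInput<'a> {
    path: &'a str,
    mode_off: bool,
    body_is_json: bool,
    has_messages: bool,
    n: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
enum GateDecision {
    /// Forward the original bytes untouched; the reason feeds a structured log.
    Passthrough(&'static str),
    /// Hand the body to the provider-specific live-zone dispatcher.
    Compress(CompressibleEndpoint),
}

fn classify(path: &str) -> Option<CompressibleEndpoint> {
    match path {
        "/v1/messages" => Some(CompressibleEndpoint::AnthropicMessages), // assumed path
        "/v1/chat/completions" => Some(CompressibleEndpoint::OpenAiChatCompletions),
        _ => None,
    }
}

fn gate(input: &GateInput) -> GateDecision {
    let Some(endpoint) = classify(input.path) else {
        return GateDecision::Passthrough("not_compressible"); // hypothetical reason
    };
    if input.mode_off {
        return GateDecision::Passthrough("mode_off");
    }
    if !input.body_is_json || !input.has_messages {
        return GateDecision::Passthrough("no_messages"); // hypothetical reason
    }
    // n > 1 is an OpenAI-only hard skip; Anthropic has no equivalent field.
    if endpoint == CompressibleEndpoint::OpenAiChatCompletions && input.n.unwrap_or(1) > 1 {
        return GateDecision::Passthrough("n_greater_than_one"); // hypothetical reason
    }
    GateDecision::Compress(endpoint)
}
```

Keying the n > 1 check on the classified endpoint keeps provider-specific rules out of the shared forwarding path, which is the separation the CompressibleEndpoint classification is meant to provide.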
Summary
Phase C, PR 2 of 5. Ports /v1/chat/completions to Rust with byte-faithful forwarding, OpenAI live-zone compression dispatch, and streaming via C1's state machines.

What this changes
New:
crates/headroom-proxy/src/handlers/chat_completions.rs: POST handler.
- n > 1 → passthrough (no compression; multiple completions imply scenarios where per-completion determinism isn't free).
- stream: true → byte passthrough to the client; ChunkState (from C1) parses in parallel for telemetry.
- tool_choice change → passthrough; the tools array is never mutated.
- Logging: tracing::info! with event=..., request_id, and byte counts.

New:
crates/headroom-proxy/src/compression/live_zone_openai.rs: OpenAI Chat live-zone dispatcher.
- Live zone: the latest tool role message's content and the latest user role message's text content. Earlier tool/user messages are frozen.
- Content-type detection: transforms::detect_content_type (Magika).
- Compressors: SmartCrusher / LogCompressor / SearchCompressor / DiffCompressor from headroom-core; no new compressors.
- Thresholds: the same per-content-type byte thresholds as the Anthropic side (THRESHOLD_JSON_ARRAY=1024, BUILD_OUTPUT=512, SEARCH_RESULTS=1024, GIT_DIFF=1024). One source of truth.

Forwarding integration: the handler delegates to forward_http, which now classifies endpoints via CompressibleEndpoint rather than embedding provider logic. SSE, header, and request-id plumbing stays single-source.

Differences from the Anthropic side (intentional)
- No frozen_message_count parameter: OpenAI has no provider-level cache_control scheme, so cache safety is enforced purely by the "latest tool / latest user only" walker.
- n > 1 is a hard skip on OpenAI; there is no equivalent on Anthropic.

Test plan
- cargo fmt --check clean
- cargo clippy --all-targets --all-features -- -D warnings clean
- cargo test --workspace: 975 passed, 0 failed (was 953 after C1; +22 from C2)
- tests/integration_chat_completions.rs:
  - passthrough_no_compression_byte_equal
  - tool_message_compressed
  - n_greater_than_one_passthrough
  - stream_options_include_usage_preserved
  - tool_choice_change_passthrough_no_mutation
  - refusal_field_in_response_handled
  - streaming_tool_call_argument_accumulation
- make ci-precheck PASSED end-to-end

Wire-format observations (RUST_DEV.md follow-up)
- A tool message's content is usually a JSON string, but newer multimodal-tool models accept arrays. C2 walks only the string shape; the array shape falls through to passthrough. Document the asymmetry vs Anthropic tool_result.content.
- In a user message's content array, image_url parts are skipped (not mutated, not compressed). C2 only plans replacements for {type: "text", text} parts.

Plan reference
REALIGNMENT/05-phase-C-rust-proxy.md, PR-C2 (lines 103-156). C3 (/v1/responses HTTP handler) is next.
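The live-zone rule and the wire-format observations above can be sketched as a walker over an already-parsed message list. Message, Content, Part, Candidate, latest_with_role, and live_zone_candidates are hypothetical types and names; the real dispatcher works on the raw JSON and plans byte ranges, and the per-content-type threshold check is left out here (the sketch only reports each candidate's byte size).

```rust
/// Simplified message model (hypothetical); the real dispatcher walks raw JSON.
enum Content {
    Text(String),     // `content` given as a plain string
    Parts(Vec<Part>), // `content` given as an array of typed parts
}

enum Part {
    Text(String), // {type: "text", text}
    NonText,      // e.g. {type: "image_url", ...}: never touched
}

struct Message {
    role: String,
    content: Content,
}

/// A live-zone candidate: which message, which part, and how many bytes.
#[derive(Debug, PartialEq, Eq)]
struct Candidate {
    message_index: usize,
    part_index: Option<usize>, // None when `content` is a plain string
    bytes: usize,              // size a per-content-type threshold would check
}

/// Index and message of the last message with the given role, if any.
fn latest_with_role<'a>(messages: &'a [Message], role: &str) -> Option<(usize, &'a Message)> {
    messages.iter().enumerate().rev().find(|(_, m)| m.role == role)
}

fn live_zone_candidates(messages: &[Message]) -> Vec<Candidate> {
    let mut out = Vec::new();

    // Latest role=tool message only. Only the string shape is planned;
    // array-shaped tool content falls through to passthrough.
    if let Some((i, msg)) = latest_with_role(messages, "tool") {
        if let Content::Text(text) = &msg.content {
            out.push(Candidate { message_index: i, part_index: None, bytes: text.len() });
        }
    }

    // Latest role=user message: the whole string, or only its text parts.
    if let Some((i, msg)) = latest_with_role(messages, "user") {
        match &msg.content {
            Content::Text(text) => {
                out.push(Candidate { message_index: i, part_index: None, bytes: text.len() })
            }
            Content::Parts(parts) => {
                for (j, part) in parts.iter().enumerate() {
                    // Non-text parts such as image_url are skipped, never mutated.
                    if let Part::Text(text) = part {
                        out.push(Candidate { message_index: i, part_index: Some(j), bytes: text.len() });
                    }
                }
            }
        }
    }
    out
}
```

Scanning from the end and stopping at the first match per role is what keeps earlier tool/user messages, which sit in the cache hot zone, out of the plan entirely.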