CLI: align PI runtime SDK stream contract #3274

Draft
chubes4 wants to merge 2 commits into unify-ai-runtimes-on-pi from pi-runtime-eval-timings

chubes4 commented Apr 28, 2026

Summary

  • Preserve Anthropic proxy auth headers when constructing the PI model.
  • Surface assistant messages carried by turn_end, dedupe repeated assistant events, and treat PI stopReason: "error" as an SDK-style failed result.
  • Stream tool-use and tool-result SDK messages from PI tool execution events so recorders/evals observe tool work as it happens instead of only at turn_end.
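To make the message-handling rules above concrete, here is a minimal TypeScript sketch of how a stream of PI events could be translated into SDK-shaped messages. Every type and function name in it (PiEvent, AssistantMessage, SdkMessage, toSdkMessages) is hypothetical, and it assumes each assistant message carries a stable id that can serve as a dedupe key; the real runtime types are not shown in this PR.

```typescript
// Hypothetical shapes: the real PI and SDK message types are not shown in this PR.
type AssistantMessage = {
  id: string; // assumed stable id, used here as the dedupe key
  role: "assistant";
  text: string;
  stopReason?: "stop" | "toolUse" | "error"; // only "error" is named in the PR
  errorMessage?: string;
};

type PiEvent =
  | { type: "message_end"; message: AssistantMessage }
  | { type: "turn_end"; message?: AssistantMessage };

type SdkMessage =
  | { type: "assistant"; id: string; text: string }
  | { type: "result"; isError: boolean; error?: string };

// Translate PI events into SDK-shaped messages, ending with a single result.
function toSdkMessages(events: PiEvent[]): SdkMessage[] {
  const out: SdkMessage[] = [];
  const seen = new Set<string>(); // ids of assistant messages already emitted
  let failure: string | undefined;

  for (const event of events) {
    // turn_end may carry the assistant message even when no message_end fired,
    // so both event types are inspected for a message.
    const message = event.message;
    if (!message) continue;

    // Dedupe: the same assistant message can arrive via message_end and turn_end.
    if (!seen.has(message.id)) {
      seen.add(message.id);
      out.push({ type: "assistant", id: message.id, text: message.text });
    }

    if (message.stopReason === "error") {
      failure = message.errorMessage ?? "unknown provider error";
    }
  }

  // stopReason "error" yields a failed result instead of an empty success.
  out.push(
    failure === undefined
      ? { type: "result", isError: false }
      : { type: "result", isError: true, error: failure },
  );
  return out;
}
```

The design choice worth noting is that the failure is recorded while scanning and only turned into a result at the end, so a run that streamed partial output before a provider error still reports as failed.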

Stack context

This is a draft stacked PR on top of Riad's PI runtime swap in #3246 (unify-ai-runtimes-on-pi). It is not intended to land independently of that PR.

It also builds on the eval-runner diagnostics that just landed in #3273. Those diagnostics made these runtime contract gaps visible in local SDK-vs-PI benchmarks: without these fixes, PI could look faster while failing to surface assistant output, provider errors, or tool execution in the SDK-shaped stream.

Why

Studio's UI, recorder, and eval tooling still consume an SDK-shaped message stream. This patch keeps the unified PI runtime compatible with that contract while preserving the PI branch's runtime direction.

The main compatibility fixes are:

  • The WP.com Anthropic proxy requires the bearer token in the model headers.
  • turn_end.message may carry assistant output even when no separate message_end event fired.
  • stopReason: "error" should produce an SDK-style failed result, not a successful empty run.
  • Tool-use and tool-result messages should stream at execution start/end so evals and recorders can observe real tool work.
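The tool-streaming and auth-header fixes in the list above can be sketched as follows. In this hedged sketch, the event names tool_execution_start and tool_execution_end, all field names, and the helpers onToolEvent and withProxyAuth are hypothetical stand-ins rather than the PR's actual code; the sketch only illustrates the contract: emit a tool_use message when execution starts, a tool_result message when it ends, and merge the bearer token into the existing model headers instead of replacing them.

```typescript
// Hypothetical PI tool execution events; the real event names and fields may differ.
type ToolEvent =
  | { type: "tool_execution_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_execution_end"; toolCallId: string; result: string; isError: boolean };

// Hypothetical SDK-shaped tool messages consumed by recorders and evals.
type SdkToolMessage =
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; content: string; isError: boolean };

// Emit SDK tool messages as execution starts and ends, not at turn_end.
function onToolEvent(event: ToolEvent, emit: (m: SdkToolMessage) => void): void {
  switch (event.type) {
    case "tool_execution_start":
      emit({ type: "tool_use", id: event.toolCallId, name: event.toolName, input: event.args });
      break;
    case "tool_execution_end":
      // toolUseId links the result back to the tool_use emitted at start.
      emit({ type: "tool_result", toolUseId: event.toolCallId, content: event.result, isError: event.isError });
      break;
  }
}

// Keep any headers the model config already has and add the proxy bearer token.
function withProxyAuth(headers: Record<string, string>, token: string): Record<string, string> {
  return { ...headers, Authorization: `Bearer ${token}` };
}
```

Emitting from execution events rather than waiting for turn_end is what lets a recorder timestamp each tool call when it actually runs.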

Testing

  • npm test -- apps/cli/ai/tests/pi-runtime.test.ts
  • npm -w wp-studio run typecheck
  • npm run cli:build --silent
  • Local SDK-vs-PI rig benches using the shared homeboy-rigs eval scenarios:
    • studio-agent-runtime
    • studio-agent-site-info

AI assistance

  • AI assistance: Yes
  • Tool(s): OpenCode (GPT-5.5)
  • Used for: drafted the PI runtime contract fixes and regression tests, then ran local validation and benchmark comparisons. Chris directed the benchmarking scope and reviewed the findings.
