Streaming token usage not captured for OpenAI Responses API #1651
Description
What happened?
When using the OpenAI Responses API with streaming (client.responses.stream()), logfire does not capture the token usage attributes (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) on the span. Non-streaming calls work correctly.
The issue is in OpenaiResponsesStreamState.get_attributes(), which does not extract the usage field from the response object before returning the span attributes.
Steps to Reproduce
# Requirements:
# pip install logfire openai
import os

import logfire
from openai import OpenAI

# Configure logfire first (before setting up our own tracer)
logfire.configure(send_to_logfire=False)

from opentelemetry import trace
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Add our exporter to capture spans
exporter = InMemorySpanExporter()
provider = trace.get_tracer_provider()
if hasattr(provider, 'add_span_processor'):
    provider.add_span_processor(SimpleSpanProcessor(exporter))

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Instrument OpenAI with logfire
logfire.instrument_openai(client)

def test_streaming():
    """Test OpenAI Responses API streaming and check for token usage."""
    exporter.clear()
    # Make a streaming request using the Responses API
    with client.responses.stream(
        model="gpt-4o-mini",
        input="Write a haiku about programming.",
    ) as stream:
        stream.until_done()
    # Check the captured spans for token usage
    spans = exporter.get_finished_spans()
    for span in spans:
        attrs = dict(span.attributes) if span.attributes else {}
        input_tokens = attrs.get("gen_ai.usage.input_tokens")
        output_tokens = attrs.get("gen_ai.usage.output_tokens")
        if "streaming" in span.name.lower():
            print(f"Span: {span.name}")
            print(f"  gen_ai.usage.input_tokens: {input_tokens}")
            print(f"  gen_ai.usage.output_tokens: {output_tokens}")
            return input_tokens, output_tokens
    return None, None

print("=" * 60)
print("Testing WITHOUT patch")
print("=" * 60)
input_tokens, output_tokens = test_streaming()
if input_tokens is None and output_tokens is None:
    print("\nBUG: Token usage not captured for streaming!")
else:
    print(f"\nToken usage captured: input={input_tokens}, output={output_tokens}")

# Now apply the patch
print("\n" + "=" * 60)
print("Applying patch: adding token usage extraction")
print("=" * 60)

from logfire._internal.integrations.llm_providers.openai import (
    OpenaiResponsesStreamState,
    responses_output_events,
)

def patched_get_attributes(self, span_data: dict) -> dict:
    response = self.get_response_data()
    span_data["events"] = span_data["events"] + responses_output_events(response)
    # FIX: Extract token usage from the response
    usage = getattr(response, "usage", None)
    input_tokens = getattr(usage, "input_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if isinstance(input_tokens, int):
        span_data["gen_ai.usage.input_tokens"] = input_tokens
    if isinstance(output_tokens, int):
        span_data["gen_ai.usage.output_tokens"] = output_tokens
    return span_data

OpenaiResponsesStreamState.get_attributes = patched_get_attributes

print("\n" + "=" * 60)
print("Testing WITH patch")
print("=" * 60)
input_tokens, output_tokens = test_streaming()
if input_tokens is None and output_tokens is None:
    print("\nToken usage still not captured!")
else:
    print(f"\nFIXED: Token usage captured: input={input_tokens}, output={output_tokens}")
Output:
============================================================
Testing WITHOUT patch
============================================================
15:31:46.405 Responses API with 'gpt-4o-mini' [LLM]
15:31:48.322 streaming response from 'gpt-4o-mini' took 1.08s [LLM]
Span: streaming response from {request_data[model]!r} took {duration:.2f}s
gen_ai.usage.input_tokens: None
gen_ai.usage.output_tokens: None
BUG: Token usage not captured for streaming!
============================================================
Applying patch: adding token usage extraction
============================================================
============================================================
Testing WITH patch
============================================================
15:31:48.323 Responses API with 'gpt-4o-mini' [LLM]
15:31:50.137 streaming response from 'gpt-4o-mini' took 1.07s [LLM]
Span: streaming response from {request_data[model]!r} took {duration:.2f}s
gen_ai.usage.input_tokens: 14
gen_ai.usage.output_tokens: 21
FIXED: Token usage captured: input=14, output=21
Expected Result
Token usage should be captured in span attributes:
gen_ai.usage.input_tokens: ~14 (for the test prompt)
gen_ai.usage.output_tokens: ~21-25 (varies by run)
Actual Result (Buggy Behavior)
Token usage attributes are None for streaming responses using the Responses API.
After applying the patch (extracting usage from response object), token counts are correctly captured.
Additional context
The bug is in logfire/_internal/integrations/llm_providers/openai.py:
class OpenaiResponsesStreamState:
    def get_attributes(self, span_data: dict[str, Any]) -> dict[str, Any]:
        response = self.get_response_data()
        span_data['events'] = span_data['events'] + responses_output_events(response)
        return span_data  # Missing: token usage extraction
The fix is to extract usage from the response:
    def get_attributes(self, span_data: dict[str, Any]) -> dict[str, Any]:
        response = self.get_response_data()
        span_data['events'] = span_data['events'] + responses_output_events(response)
        # Extract token usage from the response
        usage = getattr(response, "usage", None)
        input_tokens = getattr(usage, "input_tokens", None)
        output_tokens = getattr(usage, "output_tokens", None)
        if isinstance(input_tokens, int):
            span_data["gen_ai.usage.input_tokens"] = input_tokens
        if isinstance(output_tokens, int):
            span_data["gen_ai.usage.output_tokens"] = output_tokens
        return span_data
Note: The non-streaming OpenaiResponsesState class likely has similar code that works correctly; this fix just adds the same token extraction to the streaming variant.
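One nice property of the getattr-based extraction above is that it degrades gracefully when a response carries no usage data. A minimal standalone sketch (the extract_usage helper and stub objects here are illustrative, not logfire code) shows the behavior in both cases:

```python
from types import SimpleNamespace

def extract_usage(span_data: dict, response) -> dict:
    # Mirrors the proposed fix: pull token counts off response.usage,
    # tolerating responses with no usage attached at all.
    usage = getattr(response, "usage", None)
    input_tokens = getattr(usage, "input_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if isinstance(input_tokens, int):
        span_data["gen_ai.usage.input_tokens"] = input_tokens
    if isinstance(output_tokens, int):
        span_data["gen_ai.usage.output_tokens"] = output_tokens
    return span_data

# Response with usage: both attributes are set
with_usage = SimpleNamespace(usage=SimpleNamespace(input_tokens=14, output_tokens=21))
print(extract_usage({}, with_usage))
# {'gen_ai.usage.input_tokens': 14, 'gen_ai.usage.output_tokens': 21}

# Response without usage: span_data is left untouched, no AttributeError
print(extract_usage({}, SimpleNamespace()))
# {}
```

Because the isinstance checks reject None, partially populated usage objects also cannot write None into the span attributes.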
Possibly related issue
While testing this, we noticed the span name shows the literal template string {request_data[model]!r} instead of the interpolated value (e.g., 'gpt-4o-mini'). This appears to be a separate issue in llm_provider.py where the template string 'streaming response from {request_data[model]!r} took {duration:.2f}s' is not being properly formatted. We've seen similar formatting issues in our production logs.
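For what it's worth, the template itself looks well-formed: whether logfire uses plain str.format or its own interpolation internally is an assumption here, but formatting the string directly with str.format resolves the placeholders exactly as the span name should read:

```python
# Span name template as it appears verbatim in the captured output above.
template = "streaming response from {request_data[model]!r} took {duration:.2f}s"

# Plain str.format resolves the dict index and !r conversion as expected.
print(template.format(request_data={"model": "gpt-4o-mini"}, duration=1.08))
# streaming response from 'gpt-4o-mini' took 1.08s
```

This suggests the data (request_data, duration) is simply not being passed through whatever formatting step produces the final span name for this code path.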
Python, Logfire & OS Versions, related packages (not required)
logfire="4.19.0"
platform="Linux-6.8.0-90-generic-x86_64-with-glibc2.39"
python="3.11.13 (main, Jun 4 2025, 17:37:17) [Clang 20.1.4 ]"
[related_packages]
requests="2.32.5"
pydantic="2.11.1"
fastapi="0.128.0"
openai="2.15.0"
protobuf="5.29.5"
rich="14.2.0"
executing="2.2.1"
opentelemetry-api="1.39.1"
opentelemetry-exporter-otlp-proto-common="1.39.1"
opentelemetry-exporter-otlp-proto-http="1.39.1"
opentelemetry-instrumentation="0.60b1"
opentelemetry-instrumentation-asgi="0.60b1"
opentelemetry-instrumentation-dbapi="0.60b1"
opentelemetry-instrumentation-fastapi="0.60b1"
opentelemetry-instrumentation-google-genai="0.5b0"
opentelemetry-instrumentation-httpx="0.60b1"
opentelemetry-instrumentation-psycopg="0.60b1"
opentelemetry-instrumentation-psycopg2="0.60b1"
opentelemetry-instrumentation-requests="0.60b1"
opentelemetry-instrumentation-sqlalchemy="0.60b1"
opentelemetry-instrumentation-system-metrics="0.60b1"
opentelemetry-proto="1.39.1"
opentelemetry-sdk="1.39.1"
opentelemetry-semantic-conventions="0.60b1"
opentelemetry-util-genai="0.2b0"
opentelemetry-util-http="0.60b1"