Streaming token usage not captured for OpenAI Responses API #1651

@mtakikawa

Description

What happened?

When using the OpenAI Responses API with streaming (client.responses.stream()), logfire does not capture the token usage attributes (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) on the span. The same call without streaming records them correctly.

The issue is in OpenaiResponsesStreamState.get_attributes(), which never reads the usage field from the final response object.

Steps to Reproduce

# Requirements:
#   pip install logfire openai

import os
import logfire
from openai import OpenAI

# Configure logfire first (before setting up our own tracer)
logfire.configure(send_to_logfire=False)

from opentelemetry import trace
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Add our exporter to capture spans
exporter = InMemorySpanExporter()
provider = trace.get_tracer_provider()
if hasattr(provider, 'add_span_processor'):
    provider.add_span_processor(SimpleSpanProcessor(exporter))

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Instrument OpenAI with logfire
logfire.instrument_openai(client)

def test_streaming():
    """Test OpenAI Responses API streaming and check for token usage."""
    exporter.clear()

    # Make a streaming request using Responses API
    with client.responses.stream(
        model="gpt-4o-mini",
        input="Write a haiku about programming."
    ) as stream:
        stream.until_done()

    # Check the captured spans for token usage
    spans = exporter.get_finished_spans()
    for span in spans:
        attrs = dict(span.attributes) if span.attributes else {}
        input_tokens = attrs.get("gen_ai.usage.input_tokens")
        output_tokens = attrs.get("gen_ai.usage.output_tokens")
        if "streaming" in span.name.lower():
            print(f"Span: {span.name}")
            print(f"  gen_ai.usage.input_tokens: {input_tokens}")
            print(f"  gen_ai.usage.output_tokens: {output_tokens}")
            return input_tokens, output_tokens

    return None, None

print("="*60)
print("Testing WITHOUT patch")
print("="*60)

input_tokens, output_tokens = test_streaming()

if input_tokens is None and output_tokens is None:
    print("\nBUG: Token usage not captured for streaming!")
else:
    print(f"\nToken usage captured: input={input_tokens}, output={output_tokens}")

# Now apply the patch
print("\n" + "="*60)
print("Applying patch: adding token usage extraction")
print("="*60)

from logfire._internal.integrations.llm_providers.openai import (
    OpenaiResponsesStreamState,
    responses_output_events,
)

def patched_get_attributes(self, span_data: dict) -> dict:
    response = self.get_response_data()
    span_data["events"] = span_data["events"] + responses_output_events(response)

    # FIX: Extract token usage from the response
    usage = getattr(response, "usage", None)
    input_tokens = getattr(usage, "input_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if isinstance(input_tokens, int):
        span_data["gen_ai.usage.input_tokens"] = input_tokens
    if isinstance(output_tokens, int):
        span_data["gen_ai.usage.output_tokens"] = output_tokens

    return span_data

OpenaiResponsesStreamState.get_attributes = patched_get_attributes

print("\n" + "="*60)
print("Testing WITH patch")
print("="*60)

input_tokens, output_tokens = test_streaming()

if input_tokens is None and output_tokens is None:
    print("\nToken usage still not captured!")
else:
    print(f"\nFIXED: Token usage captured: input={input_tokens}, output={output_tokens}")

Output:

============================================================
Testing WITHOUT patch
============================================================
15:31:46.405 Responses API with 'gpt-4o-mini' [LLM]
15:31:48.322 streaming response from 'gpt-4o-mini' took 1.08s [LLM]
Span: streaming response from {request_data[model]!r} took {duration:.2f}s
  gen_ai.usage.input_tokens: None
  gen_ai.usage.output_tokens: None

BUG: Token usage not captured for streaming!

============================================================
Applying patch: adding token usage extraction
============================================================

============================================================
Testing WITH patch
============================================================
15:31:48.323 Responses API with 'gpt-4o-mini' [LLM]
15:31:50.137 streaming response from 'gpt-4o-mini' took 1.07s [LLM]
Span: streaming response from {request_data[model]!r} took {duration:.2f}s
  gen_ai.usage.input_tokens: 14
  gen_ai.usage.output_tokens: 21

FIXED: Token usage captured: input=14, output=21

Expected Result

Token usage should be captured in span attributes:

  • gen_ai.usage.input_tokens: ~14 (for the test prompt)
  • gen_ai.usage.output_tokens: ~21-25 (varies by run)

Actual Result (Buggy Behavior)

Token usage attributes are None for streaming responses using the Responses API.

After applying the patch (extracting usage from response object), token counts are correctly captured.

Additional context

The bug is in logfire/_internal/integrations/llm_providers/openai.py:

class OpenaiResponsesStreamState:
    def get_attributes(self, span_data: dict[str, Any]) -> dict[str, Any]:
        response = self.get_response_data()
        span_data['events'] = span_data['events'] + responses_output_events(response)
        return span_data  # Missing: token usage extraction

The fix is to extract usage from the response:

def get_attributes(self, span_data: dict[str, Any]) -> dict[str, Any]:
    response = self.get_response_data()
    span_data['events'] = span_data['events'] + responses_output_events(response)

    # Extract token usage from the response
    usage = getattr(response, "usage", None)
    input_tokens = getattr(usage, "input_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if isinstance(input_tokens, int):
        span_data["gen_ai.usage.input_tokens"] = input_tokens
    if isinstance(output_tokens, int):
        span_data["gen_ai.usage.output_tokens"] = output_tokens

    return span_data

Note: the non-streaming OpenaiResponsesState class likely contains similar code that already works correctly; this fix just applies the same token extraction to the streaming variant.
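To sanity-check the extraction logic in isolation, here is a minimal sketch using stand-in objects (SimpleNamespace doubles for the SDK's response/usage types; extract_usage is a hypothetical helper mirroring the patch, not a logfire API):

```python
from types import SimpleNamespace

def extract_usage(response, span_data):
    # Same logic as the patch: read token counts from response.usage,
    # writing attributes only when the values are integers.
    usage = getattr(response, "usage", None)
    input_tokens = getattr(usage, "input_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if isinstance(input_tokens, int):
        span_data["gen_ai.usage.input_tokens"] = input_tokens
    if isinstance(output_tokens, int):
        span_data["gen_ai.usage.output_tokens"] = output_tokens
    return span_data

# Response carrying usage, as at the end of a completed stream
done = SimpleNamespace(usage=SimpleNamespace(input_tokens=14, output_tokens=21))
print(extract_usage(done, {}))
# {'gen_ai.usage.input_tokens': 14, 'gen_ai.usage.output_tokens': 21}

# Response without usage (e.g. a stream that never finished): attributes stay unset
print(extract_usage(SimpleNamespace(), {}))
# {}
```

The isinstance guard means a missing or partial usage object degrades to "no attributes" rather than raising, which matches how the rest of get_attributes behaves.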

Possibly related issue

While testing this, we noticed that the span name contains the literal template string {request_data[model]!r} instead of the interpolated value (e.g. 'gpt-4o-mini'). This appears to be a separate issue in llm_provider.py, where the template 'streaming response from {request_data[model]!r} took {duration:.2f}s' is not formatted before being used as the span name. We've seen similar formatting issues in our production logs.
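For reference, the template itself interpolates cleanly with plain str.format, so the raw template surviving into span.name suggests the formatting step is skipped, not failing. A minimal sketch (the field names request_data and duration are taken from the template; this is not logfire's actual formatting path):

```python
# The span-name template exactly as it appears in the captured span
template = "streaming response from {request_data[model]!r} took {duration:.2f}s"

# str.format supports the {request_data[model]!r} item-access syntax directly
msg = template.format(request_data={"model": "gpt-4o-mini"}, duration=1.08)
print(msg)
# streaming response from 'gpt-4o-mini' took 1.08s
```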

Python, Logfire & OS Versions, related packages (not required)

logfire="4.19.0"
platform="Linux-6.8.0-90-generic-x86_64-with-glibc2.39"
python="3.11.13 (main, Jun  4 2025, 17:37:17) [Clang 20.1.4 ]"
[related_packages]
requests="2.32.5"
pydantic="2.11.1"
fastapi="0.128.0"
openai="2.15.0"
protobuf="5.29.5"
rich="14.2.0"
executing="2.2.1"
opentelemetry-api="1.39.1"
opentelemetry-exporter-otlp-proto-common="1.39.1"
opentelemetry-exporter-otlp-proto-http="1.39.1"
opentelemetry-instrumentation="0.60b1"
opentelemetry-instrumentation-asgi="0.60b1"
opentelemetry-instrumentation-dbapi="0.60b1"
opentelemetry-instrumentation-fastapi="0.60b1"
opentelemetry-instrumentation-google-genai="0.5b0"
opentelemetry-instrumentation-httpx="0.60b1"
opentelemetry-instrumentation-psycopg="0.60b1"
opentelemetry-instrumentation-psycopg2="0.60b1"
opentelemetry-instrumentation-requests="0.60b1"
opentelemetry-instrumentation-sqlalchemy="0.60b1"
opentelemetry-instrumentation-system-metrics="0.60b1"
opentelemetry-proto="1.39.1"
opentelemetry-sdk="1.39.1"
opentelemetry-semantic-conventions="0.60b1"
opentelemetry-util-genai="0.2b0"
opentelemetry-util-http="0.60b1"
