@mkbctrl
Created May 16, 2025 22:01

Streaming Agent Reasoning with OpenAI Agents SDK and Reasoning Models

The integration of OpenAI's reasoning models (o-series) with the Agents SDK presents intriguing possibilities for developers who want to observe an agent's thinking process in real-time. While there are limitations to accessing the complete "train of thought," there are several methods to stream insights into an agent's reasoning as it works.

Understanding Reasoning Models and Their Tokens

OpenAI's reasoning models (o1, o3, o4 series) utilize a special type of processing called "reasoning tokens" in addition to standard input and output tokens. These reasoning tokens represent the model's internal thinking process as it breaks down problems and considers multiple approaches.[^1]

A crucial point to understand is that these reasoning tokens are typically invisible to the end user:

"While reasoning tokens are not visible via the API, they still occupy space in the model's context window and are billed as output tokens."[^1]

This means that while the models are indeed performing deep reasoning, by default this process happens behind the scenes.

Streaming Capabilities in the Agents SDK

The OpenAI Agents SDK provides robust streaming functionality through the Runner.run_streamed() method, which returns a RunResultStreaming object. This allows developers to subscribe to updates as an agent run proceeds.[^2]
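A minimal sketch of this pattern is shown below, assuming the `openai-agents` package is installed and an `OPENAI_API_KEY` is set; the agent name and instructions are made up for illustration:

```python
import asyncio

async def stream_run():
    # Imported here so the sketch reads top to bottom; requires `openai-agents`.
    from agents import Agent, Runner

    agent = Agent(name="Assistant", instructions="Answer concisely.")

    # run_streamed returns a RunResultStreaming immediately; the run
    # advances as you consume stream_events() below.
    result = Runner.run_streamed(agent, input="What is streaming?")
    async for event in result.stream_events():
        print(event.type)  # e.g. raw_response_event, run_item_stream_event
    return result.final_output

# To actually execute the run:
# asyncio.run(stream_run())
```

Because the result object is returned before the run finishes, the event loop stays responsive while tokens arrive.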

The SDK supports two primary types of streaming events:

1. Raw Response Events

RawResponsesStreamEvent wraps events passed directly from the LLM in OpenAI Responses API format. These can be used to stream response messages token by token as they're generated[^2]:

```python
from openai.types.responses import ResponseTextDeltaEvent

async for event in result.stream_events():
    if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
        print(event.data.delta, end="", flush=True)
```

2. Run Item Events and Agent Events

Higher-level events like RunItemStreamEvent provide updates when an item has been fully generated, enabling progress updates at the level of "message generated" or "tool ran".[^2]

Accessing Agent Reasoning

While the raw reasoning tokens are not directly exposed through the API, OpenAI provides a mechanism to gain insights into the model's reasoning process through "reasoning summaries."

Reasoning Summaries

The reasoning summary feature lets you view a structured overview of the model's thinking process:

"While we don't expose the raw reasoning tokens emitted by the model, you can view a summary of the model's reasoning using the summary parameter."[^1]

Different models support different summarizer types:

  • o4-mini supports the "detailed" summarizer
  • The computer use model supports the "concise" summarizer

Importantly, this feature works with streaming and is supported across reasoning models including o4-mini, o3, o3-mini and o1.[^1]

Implementation Considerations

When implementing streaming reasoning with the Agents SDK, there are several factors to consider:

1. Context Window Management

Reasoning tokens consume significant space in the context window. The models may generate anywhere from a few hundred to tens of thousands of reasoning tokens depending on problem complexity.[^1]
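Since reasoning tokens are billed as output but never shown, it is worth logging how much of each response was hidden reasoning. The field names below (output_tokens, output_tokens_details.reasoning_tokens) follow the Responses API usage object; treat them as assumptions to verify:

```python
def visible_output_tokens(output_tokens: int, reasoning_tokens: int) -> int:
    """Output tokens the caller actually sees in the response text."""
    return output_tokens - reasoning_tokens

def reasoning_share(output_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of billed output that was hidden reasoning."""
    return reasoning_tokens / output_tokens if output_tokens else 0.0

# After a run you might log:
#
# usage = response.usage
# print(visible_output_tokens(usage.output_tokens,
#                             usage.output_tokens_details.reasoning_tokens))
```

Tracking this ratio over time helps decide whether a smaller reasoning effort setting would suffice for a given workload.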

2. Visibility in Tracing

The Agents SDK includes built-in tracing that lets you visualize and debug your agentic flows.[^3][^4] The tracing feature records all events during an agent run, which can provide additional insights into the agent's decision-making process.[^5]

3. Privacy Considerations

If working with sensitive data, be aware that both traces and logs can contain this information. The SDK provides environment variables to disable tracing and logging of sensitive data[^6]:

```shell
export OPENAI_AGENTS_DISABLE_TRACING=1
export OPENAI_AGENTS_DONT_LOG_MODEL_DATA=1
export OPENAI_AGENTS_DONT_LOG_TOOL_DATA=1
```

4. Model Access Requirements

Access to reasoning models depends on your usage tier with OpenAI. While o1 and o3-mini are available to all API users on tiers 1-5, access to o3 is limited to tiers 4 and 5 with some exceptions, and o4-mini requires organization verification.[^7]

Conclusion

While it's not possible to stream the complete, raw reasoning tokens that constitute an agent's full train of thought, the OpenAI Agents SDK does provide mechanisms to gain insights into the reasoning process. Through reasoning summaries, streaming capabilities, and tracing tools, developers can observe meaningful representations of how agents are approaching problems in real-time.

For applications where understanding the agent's reasoning process is critical, the combination of streaming functionality with reasoning summaries offers a practical solution that balances insight with efficiency.

Footnotes

[^1]: https://platform.openai.com/docs/guides/reasoning

[^2]: https://openai.github.io/openai-agents-python/streaming/

[^3]: https://openai.github.io/openai-agents-python/

[^4]: https://openai.com/index/new-tools-for-building-agents/

[^5]: https://adasci.org/building-agentic-ai-applications-using-openai-agents-sdk/

[^6]: https://community.openai.com/t/sensitive-data-in-logs-agents-adk-configuration/1259351

[^7]: https://help.openai.com/en/articles/10362446-api-access-to-o1-o3-and-o4-models
