Streaming

Stream agent responses in real time for a better user experience

Streaming allows you to display the agent's response token by token as it's being generated, rather than waiting for the entire response to complete. This creates a much more responsive and engaging user experience.

Quick Start

Use the stream() method to get an iterator that yields text chunks as they arrive.

from peargent import create_agent
from peargent.models import openai

agent = create_agent(
    name="StreamingAgent",
    description="An agent that streams responses",
    persona="You are helpful and concise.",
    model=openai("gpt-4o")
)

# Stream response token by token
print("Agent: ", end="", flush=True)

for chunk in agent.stream("What is Python in one sentence?"):
    print(chunk, end="", flush=True)

Output:

Agent: Python is a high-level, interpreted programming language known for its readability and versatility.
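
If you also need the complete text once streaming finishes, accumulate the chunks as you print them. A minimal sketch, reusing the agent created above:

# Collect chunks while streaming so the full response is available afterwards
chunks = []
print("Agent: ", end="", flush=True)

for chunk in agent.stream("What is Python in one sentence?"):
    print(chunk, end="", flush=True)
    chunks.append(chunk)

print()  # end the line once the stream completes
full_response = "".join(chunks)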

Why Use Streaming?

  • Lower Latency: Users see the first words immediately, instead of waiting seconds for the full answer.
  • Better UX: The application feels alive and responsive.
  • Engagement: Users can start reading while the rest of the answer is being generated.

When to Use stream()

Use agent.stream() when you just need the text content of the response.

  • ✅ Chatbots and conversational interfaces
  • ✅ CLI tools requiring real-time feedback
  • ✅ Simple text generation tasks

If you need metadata like token usage, costs, or execution time, use Stream Observe instead.
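
As a rough, hypothetical sketch of what that might look like (the method name stream_observe() and the event fields here are assumptions for illustration, not peargent's confirmed API; see the Rich Streaming guide below for the real interface):

# HYPOTHETICAL sketch: stream_observe() and the event fields below are
# assumed for illustration only; consult the Rich Streaming (Observe)
# docs for the actual interface.
for event in agent.stream_observe("What is Python in one sentence?"):
    if event.content:  # assumed: the text chunk carried by this event, if any
        print(event.content, end="", flush=True)

# assumed: metadata available once the stream completes
# print(event.usage, event.cost, event.duration)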

Streaming with Pools

You can also stream responses from a Pool of agents. The pool will stream the output of whichever agent is currently executing.

from peargent import create_pool

# Assumes the `researcher` and `writer` agents and `my_router` were created earlier
pool = create_pool(
    agents=[researcher, writer],
    router=my_router
)

# Stream the entire multi-agent interaction
for chunk in pool.stream("Research AI and write a summary"):
    print(chunk, end="", flush=True)

Best Practices

  1. Always Flush Output: When printing to a terminal, use flush=True (e.g., print(chunk, end="", flush=True)) to ensure tokens appear immediately.
  2. Handle Empty Chunks: Occasionally, a chunk might be empty. Your UI code should handle this gracefully, as in the sketch after this list.
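
A minimal sketch applying both practices, assuming the same agent as in the Quick Start:

print("Agent: ", end="", flush=True)

for chunk in agent.stream("Explain streaming in one sentence."):
    if not chunk:  # skip empty chunks defensively
        continue
    print(chunk, end="", flush=True)  # flush so each token appears immediately

print()  # terminate the line once the stream ends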

What's Next?

Rich Streaming (Observe): Learn how to get rich metadata like token counts, costs, and duration while streaming.

Async Streaming: Run multiple agents concurrently or build high-performance web servers using async streaming.