Advanced Guides

Streaming Responses

Deliver real-time, token-by-token AI responses using Server-Sent Events — implementation guides for JavaScript, Node.js, and Python.

Streaming lets your application display the agent's response as it is being generated, token by token. Instead of waiting for the full response (which can take several seconds for long answers), your UI shows text appearing in real time — the same "typing" experience users expect from modern chat interfaces.

Why Streaming Matters

Without streaming, users see nothing until the entire response is generated. For complex answers that take 3-5 seconds, this creates an uncomfortable silence. With streaming:

  • Perceived latency drops dramatically. The first token arrives in under 500ms, even if the full response takes several seconds.
  • Users can start reading immediately. They begin processing the answer while it is still being generated.
  • Your UI feels responsive. The typing indicator and flowing text signal that the system is working.
  • You can cancel early. If the response is going in the wrong direction, you can close the stream without waiting for completion.
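The early-cancel point above can be sketched with AbortController, which fetch accepts through its signal option. This is a hedged sketch: createCancellableStream and its return shape are illustrative names, not part of any Chatsby SDK.

```javascript
// Tie a streaming fetch to an AbortController so the UI can cancel it early.
// Illustrative helper — not part of the Chatsby API or SDK.
function createCancellableStream(url, body, apiKey) {
  const controller = new AbortController();
  const request = fetch(url, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
    signal: controller.signal, // abort() closes the connection mid-stream
  });
  return {
    request,
    controller, // exposed so callers can inspect signal.aborted
    cancel: () => controller.abort(),
  };
}
```

Calling cancel() rejects the pending fetch with an AbortError, which your stream-reading loop should catch and treat as a deliberate stop rather than a failure.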

Enabling Streaming

To enable streaming, set stream: true in your request to the conversations endpoint:

curl -X POST https://api.chatsby.co/v1/conversations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent_1a2b3c4d5e",
    "message": "Explain your pricing plans in detail.",
    "stream": true
  }'

When streaming is enabled, the response uses Content-Type: text/event-stream instead of application/json. The connection stays open and delivers data as Server-Sent Events (SSE).

SSE Format

The stream delivers three types of events:

token

Contains a piece of the agent's response. Concatenate all token contents to build the full reply.

event: token
data: {"content": "Our pricing"}

event: token
data: {"content": " plans include"}

event: token
data: {"content": " three tiers:"}

done

Signals the end of the response. Contains the conversation ID, sources used, and other metadata.

event: done
data: {"conversation_id": "conv_xyz789abcd", "sources_used": [{"id": "src_abc123defg", "title": "Pricing Page"}]}

error

Signals that an error occurred during generation. The stream closes after this event.

event: error
data: {"type": "server_error", "message": "An internal error occurred during response generation.", "code": "generation_error"}

Each event follows the SSE specification: the event: line specifies the type, and the data: line contains a JSON object. Events are separated by a blank line.
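The framing above can be handled by a small helper that tracks the current event: type and pairs it with the next data: line. A minimal sketch, assuming one data line per event, as in the examples in this guide:

```javascript
// Parse a chunk of SSE text into { event, data } objects.
// Assumes each event carries exactly one `data:` line.
function parseSSE(text) {
  const events = [];
  let eventType = 'message'; // SSE default when no event: line is present
  for (const line of text.split('\n')) {
    if (line.startsWith('event: ')) {
      eventType = line.slice(7).trim();
    } else if (line.startsWith('data: ')) {
      events.push({ event: eventType, data: JSON.parse(line.slice(6)) });
      eventType = 'message'; // reset for the next event
    }
    // Blank lines separate events; nothing else to do for single-data events
  }
  return events;
}
```

For example, parseSSE('event: token\ndata: {"content": "Hi"}\n\n') yields [{ event: 'token', data: { content: 'Hi' } }].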

Implementation Examples

JavaScript — fetch with ReadableStream

This is the most versatile approach for browser and Node.js environments:

async function streamMessage(agentId, message) {
  const response = await fetch('https://api.chatsby.co/v1/conversations', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CHATSBY_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      agent_id: agentId,
      message: message,
      stream: true,
    }),
  });
 
  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.error.message);
  }
 
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullResponse = '';
  let buffer = '';
 
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
 
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer
 
    for (const line of lines) {
      if (line.startsWith('event: ')) {
        // Event type is inferred from the data payload below, so skip this line
        continue;
      }
 
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
 
        if (data.content) {
          // Token event — append to response
          fullResponse += data.content;
          onToken(data.content); // Your callback to update the UI
        }
 
        if (data.conversation_id) {
          // Done event — stream complete
          onComplete({
            reply: fullResponse,
            conversationId: data.conversation_id,
            sourcesUsed: data.sources_used,
          });
        }
 
        if (data.type && data.code) {
          // Error event
          onError(new Error(data.message));
        }
      }
    }
  }
}
 
// Usage (assumes onToken, onComplete, and onError are defined in scope)
streamMessage('agent_1a2b3c4d5e', 'What are your pricing plans?');

JavaScript — EventSource (Browser)

For simpler browser integrations, you can use the EventSource API. Note that EventSource only supports GET requests, so you need a proxy endpoint:

// Your backend creates the stream and proxies it
// Frontend connects via EventSource
const source = new EventSource('/api/chat-stream?agent_id=agent_1a2b3c4d5e&message=Hello');
 
let fullResponse = '';
 
source.addEventListener('token', (event) => {
  const data = JSON.parse(event.data);
  fullResponse += data.content;
  document.getElementById('response').textContent = fullResponse;
});
 
source.addEventListener('done', (event) => {
  const data = JSON.parse(event.data);
  console.log('Conversation ID:', data.conversation_id);
  console.log('Sources used:', data.sources_used);
  source.close();
});
 
source.addEventListener('error', (event) => {
  // Fires for both the API's SSE error events and network-level failures;
  // event.data is only present for the former
  console.error('Stream error:', event);
  source.close();
});

Node.js — Server-side streaming

import { Chatsby } from '@chatsby/sdk';
 
const chatsby = new Chatsby({ apiKey: process.env.CHATSBY_API_KEY });
 
async function streamChat() {
  const stream = await chatsby.conversations.stream({
    agent_id: 'agent_1a2b3c4d5e',
    message: 'Explain your refund policy.',
  });
 
  let fullResponse = '';
 
  for await (const event of stream) {
    if (event.type === 'token') {
      process.stdout.write(event.content);
      fullResponse += event.content;
    }
 
    if (event.type === 'done') {
      console.log('\n\nConversation ID:', event.conversation_id);
      console.log('Sources used:', event.sources_used);
    }
 
    if (event.type === 'error') {
      console.error('Stream error:', event.message);
      break;
    }
  }
}
 
streamChat();

Python — Streaming with requests

import requests
import json
 
def stream_message(agent_id: str, message: str, api_key: str):
    response = requests.post(
        "https://api.chatsby.co/v1/conversations",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={
            "agent_id": agent_id,
            "message": message,
            "stream": True,
        },
        stream=True,
    )
 
    response.raise_for_status()
 
    full_response = ""
 
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
 
        if line.startswith("data: "):
            data = json.loads(line[6:])
 
            if "content" in data:
                full_response += data["content"]
                print(data["content"], end="", flush=True)
 
            if "conversation_id" in data:
                print(f"\n\nConversation ID: {data['conversation_id']}")
                print(f"Sources used: {data['sources_used']}")
 
            if "type" in data and "code" in data:
                print(f"\nError: {data['message']}")
                break
 
    return full_response
 
 
# Usage
stream_message("agent_1a2b3c4d5e", "What are your pricing plans?", "YOUR_API_KEY")

Handling Stream Events

Your application should handle all three event types:

Event    Action
token    Append the content to the response buffer. Update the UI to show the new text.
done     Stop listening. Save the conversation_id for follow-up messages. Display source citations from sources_used.
error    Stop listening. Show an error message to the user. Log the error for debugging.

Always handle the error event. If the AI model encounters an issue mid-generation (e.g., content filter triggered, context length exceeded), the stream sends an error event and closes. Your UI should handle this gracefully instead of hanging.
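One way to keep the UI from hanging is a watchdog timer that fires if no event arrives within a deadline. An illustrative sketch; createStreamWatchdog and the 15-second default are assumptions, not API features:

```javascript
// Reset a deadline on every event; call onTimeout if the stream goes silent.
// Illustrative helper — the timeout value and names are assumptions.
function createStreamWatchdog(onTimeout, timeoutMs = 15000) {
  let timer = setTimeout(onTimeout, timeoutMs);
  return {
    // Call from every token handler to push the deadline back
    ping() {
      clearTimeout(timer);
      timer = setTimeout(onTimeout, timeoutMs);
    },
    // Call from the done and error handlers so the watchdog never fires
    stop() {
      clearTimeout(timer);
    },
  };
}
```

Call ping() on every token event and stop() once a done or error event arrives; onTimeout can surface a "response interrupted" state to the user.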

Error Handling in Streams

Errors can occur at two stages:

Before the stream starts — The HTTP response returns a non-2xx status code with a standard JSON error body. Handle this the same way you handle errors for non-streaming requests.

const response = await fetch(url, options);
 
if (!response.ok) {
  // Standard error — not a stream
  const error = await response.json();
  throw new Error(error.error.message);
}
 
// Stream started successfully — parse SSE events

During the stream — An error event is delivered via SSE. The stream closes after the error event.

Reconnection Logic

Network interruptions can terminate a stream prematurely. If the stream closes without a done event, implement reconnection:

async function streamWithReconnect(agentId, message, conversationId = null) {
  const maxRetries = 3;
 
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      // Assumes a streamMessage variant that accepts an optional conversationId
      // and resolves with { complete: true } once a done event arrives
      const result = await streamMessage(agentId, message, conversationId);
 
      if (result.complete) {
        return result; // Stream finished successfully
      }
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
 
      const delay = Math.min(1000 * Math.pow(2, attempt), 5000);
      console.warn(`Stream interrupted. Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

If a stream is interrupted, the partial response is still valid. You do not need to discard it. On reconnection, send the same message with the conversation_id from the interrupted stream to continue the conversation.

Performance Considerations

  • Buffer management — In long responses, avoid rebuilding the entire DOM on every token. Append new text to the existing element instead.
  • Throttle UI updates — If tokens arrive faster than the browser can render, batch updates: flush the buffer every 50-100ms, or once per frame via requestAnimationFrame.
  • Connection limits — Browsers limit the number of concurrent HTTP/1.1 connections per domain (typically 6), and each active stream uses one; HTTP/2 multiplexes streams over a single connection. Close streams promptly when done.
  • Memory — For very long conversations, consider truncating the displayed history and keeping only recent messages in the DOM.
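The throttling point above can be sketched as a batcher that buffers tokens and performs one render per scheduled flush. The schedule parameter is injectable: pass requestAnimationFrame in the browser, or a setTimeout wrapper elsewhere. createTokenBatcher is an illustrative name, not an SDK export.

```javascript
// Buffer incoming tokens and flush them in one UI update per scheduled tick.
// `schedule` is requestAnimationFrame in the browser, or a setTimeout wrapper.
function createTokenBatcher(render, schedule = (fn) => setTimeout(fn, 50)) {
  let pending = '';
  let scheduled = false;
  return {
    push(token) {
      pending += token;
      if (!scheduled) {
        scheduled = true;
        schedule(() => {
          scheduled = false;
          const text = pending;
          pending = '';
          render(text); // one DOM append per flush instead of one per token
        });
      }
    },
  };
}
```

Wire push() into your token handler and have render() append to the existing element, so each flush costs one DOM operation no matter how many tokens arrived in the interval.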