Streaming Responses
Deliver real-time, token-by-token AI responses using Server-Sent Events — implementation guides for JavaScript, Node.js, and Python.
Streaming lets your application display the agent's response as it is being generated, token by token. Instead of waiting for the full response (which can take several seconds for long answers), your UI shows text appearing in real time — the same "typing" experience users expect from modern chat interfaces.
Why Streaming Matters
Without streaming, users see nothing until the entire response is generated. For complex answers that take 3-5 seconds, this creates an uncomfortable silence. With streaming:
- Perceived latency drops dramatically. The first token arrives in under 500ms, even if the full response takes several seconds.
- Users can start reading immediately. They begin processing the answer while it is still being generated.
- Your UI feels responsive. The typing indicator and flowing text signal that the system is working.
- You can cancel early. If the response is going in the wrong direction, you can close the stream without waiting for completion.
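Early cancellation in practice means aborting the underlying HTTP request. Here is one possible sketch that wires a streaming request to an AbortController (the `createCancellableStream` helper and the injectable `fetchImpl` parameter are illustrative, not part of any SDK):

```javascript
// Wires an AbortController into a streaming request so the UI can stop
// generation mid-stream. Aborting the signal closes the open connection.
// fetchImpl is injectable so the helper can be exercised without a network.
function createCancellableStream(body, fetchImpl = fetch) {
  const controller = new AbortController();
  const responsePromise = fetchImpl('https://api.chatsby.co/v1/conversations', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CHATSBY_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ ...body, stream: true }),
    signal: controller.signal, // cancellation hook
  });
  // Call cancel() when the user clicks "Stop generating"
  return { responsePromise, cancel: () => controller.abort() };
}
```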
Enabling Streaming
To enable streaming, set stream: true in your request to the conversations endpoint:
```bash
curl -X POST https://api.chatsby.co/v1/conversations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent_1a2b3c4d5e",
    "message": "Explain your pricing plans in detail.",
    "stream": true
  }'
```

When streaming is enabled, the response uses Content-Type: text/event-stream instead of application/json. The connection stays open and delivers data as Server-Sent Events (SSE).
SSE Format
The stream delivers three types of events:
token
Contains a piece of the agent's response. Concatenate all token contents to build the full reply.
event: token
data: {"content": "Our pricing"}
event: token
data: {"content": " plans include"}
event: token
data: {"content": " three tiers:"}
done
Signals the end of the response. Contains the conversation ID, sources used, and other metadata.
event: done
data: {"conversation_id": "conv_xyz789abcd", "sources_used": [{"id": "src_abc123defg", "title": "Pricing Page"}]}
error
Signals that an error occurred during generation. The stream closes after this event.
event: error
data: {"type": "server_error", "message": "An internal error occurred during response generation.", "code": "generation_error"}
Each event follows the SSE specification: the event: line specifies the type, and the data: line contains a JSON object. Events are separated by a blank line.
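To make this framing concrete, here is one way a minimal parser for the event/data/blank-line format could look. This is an illustrative sketch, not part of any official SDK; production SSE clients also handle `id:` and `retry:` fields and multi-line `data:` payloads.

```javascript
// Minimal SSE frame parser: feed it decoded text chunks and it emits
// { event, data } objects whenever a blank line completes a frame.
function createSSEParser(onEvent) {
  let buffer = '';
  let eventType = 'message'; // SSE default when no event: line is present
  let dataLines = [];

  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any incomplete line for the next chunk

    for (const line of lines) {
      if (line.startsWith('event: ')) {
        eventType = line.slice(7);
      } else if (line.startsWith('data: ')) {
        dataLines.push(line.slice(6));
      } else if (line === '') {
        // A blank line terminates the frame
        if (dataLines.length > 0) {
          onEvent({ event: eventType, data: dataLines.join('\n') });
        }
        eventType = 'message';
        dataLines = [];
      }
    }
  };
}
```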
Implementation Examples
JavaScript — fetch with ReadableStream
This is the most versatile approach for browser and Node.js environments:
```javascript
async function streamMessage(agentId, message, { onToken, onComplete, onError } = {}) {
  const response = await fetch('https://api.chatsby.co/v1/conversations', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CHATSBY_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      agent_id: agentId,
      message: message,
      stream: true,
    }),
  });

  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.error.message);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullResponse = '';
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('event: ')) {
        // The event type is also inferable from the shape of the data
        // payload below, so this example skips event: lines entirely.
        continue;
      }
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.content) {
          // Token event — append to response
          fullResponse += data.content;
          onToken?.(data.content); // Your callback to update the UI
        }
        if (data.conversation_id) {
          // Done event — stream complete
          onComplete?.({
            reply: fullResponse,
            conversationId: data.conversation_id,
            sourcesUsed: data.sources_used,
          });
        }
        if (data.type && data.code) {
          // Error event
          onError?.(new Error(data.message));
        }
      }
    }
  }

  return fullResponse;
}

// Usage
streamMessage('agent_1a2b3c4d5e', 'What are your pricing plans?', {
  onToken: (text) => console.log(text),
});
```

JavaScript — EventSource (Browser)
For simpler browser integrations, you can use the EventSource API. Note that EventSource only supports GET requests, so you need a proxy endpoint:
```javascript
// Your backend creates the stream and proxies it
// Frontend connects via EventSource
const source = new EventSource('/api/chat-stream?agent_id=agent_1a2b3c4d5e&message=Hello');
let fullResponse = '';

source.addEventListener('token', (event) => {
  const data = JSON.parse(event.data);
  fullResponse += data.content;
  document.getElementById('response').textContent = fullResponse;
});

source.addEventListener('done', (event) => {
  const data = JSON.parse(event.data);
  console.log('Conversation ID:', data.conversation_id);
  console.log('Sources used:', data.sources_used);
  source.close();
});

source.addEventListener('error', (event) => {
  console.error('Stream error:', event);
  source.close();
});
```

Node.js — Server-side streaming
```javascript
import { Chatsby } from '@chatsby/sdk';

const chatsby = new Chatsby({ apiKey: process.env.CHATSBY_API_KEY });

async function streamChat() {
  const stream = await chatsby.conversations.stream({
    agent_id: 'agent_1a2b3c4d5e',
    message: 'Explain your refund policy.',
  });

  let fullResponse = '';
  for await (const event of stream) {
    if (event.type === 'token') {
      process.stdout.write(event.content);
      fullResponse += event.content;
    }
    if (event.type === 'done') {
      console.log('\n\nConversation ID:', event.conversation_id);
      console.log('Sources used:', event.sources_used);
    }
    if (event.type === 'error') {
      console.error('Stream error:', event.message);
      break;
    }
  }
}

streamChat();
```

Python — Streaming with requests
```python
import requests
import json

def stream_message(agent_id: str, message: str, api_key: str):
    response = requests.post(
        "https://api.chatsby.co/v1/conversations",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={
            "agent_id": agent_id,
            "message": message,
            "stream": True,
        },
        stream=True,
    )
    response.raise_for_status()

    full_response = ""
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data: "):
            data = json.loads(line[6:])
            if "content" in data:
                full_response += data["content"]
                print(data["content"], end="", flush=True)
            if "conversation_id" in data:
                print(f"\n\nConversation ID: {data['conversation_id']}")
                print(f"Sources used: {data['sources_used']}")
            if "type" in data and "code" in data:
                print(f"\nError: {data['message']}")
                break
    return full_response

# Usage
stream_message("agent_1a2b3c4d5e", "What are your pricing plans?", "YOUR_API_KEY")
```

Handling Stream Events
Your application should handle all three event types:
| Event | Action |
|---|---|
| token | Append content to the response buffer. Update the UI to show new text. |
| done | Stop listening. Save the conversation_id for follow-up messages. Display source citations from sources_used. |
| error | Stop listening. Show an error message to the user. Log the error for debugging. |
Always handle the error event. If the AI model encounters an issue mid-generation (e.g., content filter triggered, context length exceeded), the stream sends an error event and closes. Your UI should handle this gracefully instead of hanging.
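One way to keep all three handlers in one place is a small dispatch table that also tracks stream state. The following is an illustrative sketch; the `createStreamState` helper and its field names are assumptions, not part of any SDK:

```javascript
// Routes each parsed SSE event to the right action and records the
// conversation state the UI needs afterwards.
function createStreamState() {
  const state = { reply: '', conversationId: null, sources: [], error: null, finished: false };
  const handlers = {
    token: (data) => { state.reply += data.content; },
    done: (data) => {
      state.conversationId = data.conversation_id; // save for follow-up messages
      state.sources = data.sources_used ?? [];
      state.finished = true;
    },
    error: (data) => {
      state.error = data.message;
      state.finished = true; // stop listening either way
    },
  };
  return {
    state,
    handle(eventType, data) {
      handlers[eventType]?.(data);
    },
  };
}
```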
Error Handling in Streams
Errors can occur at two stages:
Before the stream starts — The HTTP response returns a non-2xx status code with a standard JSON error body. Handle this the same way you handle errors for non-streaming requests.
```javascript
const response = await fetch(url, options);
if (!response.ok) {
  // Standard error — not a stream
  const error = await response.json();
  throw new Error(error.error.message);
}
// Stream started successfully — parse SSE events
```

During the stream — An error event is delivered via SSE. The stream closes after the error event.
Reconnection Logic
Network interruptions can terminate a stream prematurely. If the stream closes without a done event, implement reconnection:
```javascript
// Assumes streamMessage accepts an optional conversationId to resume an
// interrupted conversation, and resolves with an object whose `complete`
// flag is set once a done event has arrived.
async function streamWithReconnect(agentId, message, conversationId = null) {
  const maxRetries = 3;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const result = await streamMessage(agentId, message, conversationId);
      if (result.complete) {
        return result; // Stream finished successfully
      }
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(1000 * Math.pow(2, attempt), 5000);
      console.warn(`Stream interrupted. Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

If a stream is interrupted, the partial response is still valid. You do not need to discard it. On reconnection, send the same message with the conversation_id from the interrupted stream to continue the conversation.
Performance Considerations
- Buffer management — In long responses, avoid rebuilding the entire DOM on every token. Append new text to the existing element instead.
- Throttle UI updates — If tokens arrive faster than the browser can render, batch updates every 50-100ms or coalesce them with requestAnimationFrame.
- Connection limits — Browsers limit the number of concurrent HTTP connections per domain (typically 6). Each active stream uses one connection. Close streams promptly when done.
- Memory — For very long conversations, consider truncating the displayed history and keeping only recent messages in the DOM.
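The first two points above can be combined in a small renderer that coalesces tokens and flushes them at most once per animation frame. A sketch assuming a browser environment; the `createTokenRenderer` name is illustrative:

```javascript
// Buffers incoming tokens and appends them to the DOM at most once per
// animation frame, instead of re-rendering on every token.
function createTokenRenderer(element) {
  let pending = '';
  let scheduled = false;

  return function onToken(text) {
    pending += text;
    if (!scheduled) {
      scheduled = true;
      requestAnimationFrame(() => {
        // Append only the new text; never rebuild the whole element
        element.textContent += pending;
        pending = '';
        scheduled = false;
      });
    }
  };
}
```

Pass the returned function as the token callback, e.g. `onToken: createTokenRenderer(document.getElementById('response'))`.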