Streaming
Stream LLM responses in real time using Server-Sent Events (SSE). Full OpenAI-compatible streaming across all providers.
Stream responses token-by-token using Server-Sent Events (SSE). AICredits normalises every provider's streaming format into the standard OpenAI SSE protocol, so all compatible SDKs and frameworks work without modification.
Overview
Set stream: true in any chat completions request to enable streaming. The response is a stream of server-sent events, each carrying a delta: the next fragment of the response, usually one or a few tokens. The stream ends with a final data: [DONE] event.
Streaming works across all supported providers — OpenAI, Anthropic, Google Gemini, DeepSeek, Mistral, and xAI. Each provider uses a different native streaming format, but AICredits translates all of them to the OpenAI SSE format your client expects.
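To make the event flow concrete, here is a minimal sketch of what a client does with the stream when no SDK is involved: read each data: line, stop at [DONE], and join the content deltas. The helper name collect_text is hypothetical, and the sample events reuse the shape shown in the SSE Format section. Real clients also need to handle multi-line events, comments, and reconnection, which this sketch ignores.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style SSE lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Sample events (illustrative values only); blank strings are event separators.
sample = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"The"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" monsoon"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

print(collect_text(sample))
```

In practice the SDKs shown in the rest of this page do this parsing for you; the sketch is only meant to show what "a stream of deltas" means on the wire.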
Basic Streaming
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about the Indian monsoon."},
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)
print()  # Newline after stream completes

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: "sk-your-key-here",
});

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a poem about the Indian monsoon." },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log(); // Newline after stream completes

# -N disables buffering so chunks are printed as they arrive
curl -N https://api.aicredits.in/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Write a poem about the Indian monsoon."}
    ],
    "stream": true
  }'

Streaming with Tool Calls
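Tool calls are fully supported in streaming mode. Tool call deltas arrive as incremental JSON fragments. With the plain stream=True iterator used here, you collect the fragments as they arrive and merge them by index once the stream ends:
stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Mumbai?"}],
    tools=tools,  # your list of tool definitions, defined elsewhere
    stream=True,
)

tool_call_chunks = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    if delta.tool_calls:
        tool_call_chunks.extend(delta.tool_calls)

# Merge fragments by index: id and name arrive once, arguments arrive in pieces.
tool_calls = {}
for tc in tool_call_chunks:
    call = tool_calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
    if tc.id:
        call["id"] = tc.id
    if tc.function and tc.function.name:
        call["name"] = tc.function.name
    if tc.function and tc.function.arguments:
        call["arguments"] += tc.function.arguments

SSE Format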
Each streamed event follows the OpenAI SSE format:
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" monsoon"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Billing & Token Counting
| Aspect | Behavior |
|---|---|
| Token counting | Prompt + completion tokens counted from the full response |
| Cost calculation | Same formula as non-streaming: USD cost → INR via live forex rate |
| Deduction timing | Balance is deducted after the stream completes (not per-chunk) |
| Partial responses | If the client disconnects mid-stream, tokens already generated are still billed |
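
The cost rule in the table can be sketched as plain arithmetic. The per-million-token prices and the forex rate below are made-up placeholders, not AICredits or provider rates, and estimate_cost_inr is a hypothetical helper. The only thing taken from the table is the sequence: count prompt and completion tokens, price them in USD, then convert to INR.

```python
def estimate_cost_inr(
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_million_prompt: float,
    usd_per_million_completion: float,
    usd_to_inr: float,
) -> float:
    """Estimate a request's cost in INR from its token counts (illustrative)."""
    usd = (
        prompt_tokens * usd_per_million_prompt
        + completion_tokens * usd_per_million_completion
    ) / 1_000_000
    return usd * usd_to_inr


# Placeholder numbers chosen only to make the arithmetic easy to follow.
cost = estimate_cost_inr(
    prompt_tokens=1_000,
    completion_tokens=2_000,
    usd_per_million_prompt=1.0,
    usd_per_million_completion=2.0,
    usd_to_inr=80.0,
)
print(f"{cost:.4f} INR")
```

Because the deduction happens once the stream completes, an estimate like this is only useful after the final chunk, or after a disconnect, using the tokens generated up to that point.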
Framework Integrations
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const aicredits = createOpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: process.env.AICREDITS_API_KEY!,
});

// In a Next.js Route Handler (app/api/chat/route.ts):
export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: aicredits("openai/gpt-4o-mini"),
    messages,
  });
  return result.toDataStreamResponse();
}

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="openai/gpt-4o-mini",
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
    streaming=True,
)

for chunk in llm.stream([HumanMessage(content="Tell me about the Taj Mahal")]):
    print(chunk.content, end="", flush=True)

Error Handling
Errors during streaming are delivered as a final SSE event before the stream closes:
from openai import OpenAI, APIStatusError, APIConnectionError

try:
    stream = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
except APIStatusError as e:
    if e.status_code == 402:
        print("\nInsufficient credits — top up your wallet.")
    elif e.status_code == 429:
        print("\nRate limit exceeded — slow down requests.")
    else:
        print(f"\nAPI error {e.status_code}: {e.message}")
except APIConnectionError:
    print("\nConnection error — check your network.")

See the Error Handling guide for the full list of error codes and retry strategies.
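
Because a mid-stream failure arrives as an SSE event and not as an HTTP status, a client that reads the raw stream has to look for it in the payload. The sketch below assumes the error event is a JSON object with a top-level error field containing a message. That mirrors the OpenAI error shape but is not confirmed by this page, and StreamError and read_stream are hypothetical names.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an error event (hypothetical type)."""


def read_stream(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas; raise StreamError on an assumed error payload."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # normal end of stream
        event = json.loads(payload)
        if "error" in event:  # assumed shape: {"error": {"message": "..."}}
            raise StreamError(event["error"].get("message", "unknown error"))
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hypothetical stream that fails after one chunk.
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"error":{"message":"upstream provider timed out"}}',
]

received = []
try:
    for piece in read_stream(sample):
        received.append(piece)
except StreamError as exc:
    print(f"\nStream failed after {len(received)} chunk(s): {exc}")
```

Keeping the text received before the failure lets the caller decide whether to show the partial answer or discard it and retry.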