Streaming

Stream LLM responses in real time using Server-Sent Events (SSE). Full OpenAI-compatible streaming across all providers.

Stream responses token by token using Server-Sent Events (SSE). AICredits normalises every provider's streaming format into the standard OpenAI SSE protocol, so any OpenAI-compatible SDK or framework works without modification.

Overview

Set stream: true in any chat completions request to enable streaming. The response is then a stream of server-sent events, each carrying a delta: an incremental fragment of the response, typically one or a few tokens. The stream ends with a final data: [DONE] event.

Streaming works across all supported providers: OpenAI, Anthropic, Google Gemini, DeepSeek, Mistral, and xAI. Each provider uses a different native streaming format, but AICredits translates all of them to the OpenAI SSE format your client expects.

Basic Streaming

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about the Indian monsoon."},
    ],
    stream=True,
)

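for chunk in stream:
    # Skip chunks that carry no choices (for example a usage-only chunk)
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)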

print()  # Newline after stream completes

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: "sk-your-key-here",
});

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a poem about the Indian monsoon." },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

console.log(); // Newline after stream completes

cURL

# -N disables buffering so chunks are printed as they arrive
curl -N https://api.aicredits.in/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Write a poem about the Indian monsoon."}
    ],
    "stream": true
  }'
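
Many applications need the complete reply as well as the live output, for example to store it once the stream ends. The sketch below shows one way to do that. `collect_text` and `fake_chunk` are hypothetical helper names, not part of any SDK, and the code only assumes that each chunk exposes `choices[0].delta.content` as in the examples above.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def collect_text(chunks: Iterable, on_token: Optional[Callable[[str], None]] = None) -> str:
    """Join the content deltas of a chat completion stream into one string.

    `on_token` is called with each non-empty fragment as it arrives,
    so the caller can still render output incrementally.
    """
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices (for example a usage-only chunk); skip them.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
            if on_token is not None:
                on_token(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chunk, for offline testing."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    demo = [fake_chunk(""), fake_chunk("The"), fake_chunk(" monsoon"), fake_chunk(None)]
    full = collect_text(demo, on_token=lambda t: print(t, end="", flush=True))
    print()
    print(repr(full))
```

With a real client, the same helper should accept the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of the fake chunks.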

Streaming with Tool Calls

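Tool calls are fully supported in streaming mode. Tool call deltas arrive as incremental fragments: the function name and the JSON arguments string are split across chunks, and each fragment carries an index identifying the call it belongs to. Concatenate the fragments per index to rebuild each call:

streaming_tools.py

# `tools` is your list of tool definitions, defined elsewhere.
stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Mumbai?"}],
    tools=tools,
    stream=True,
)

# Accumulate fragments by index; arguments arrive as partial JSON strings.
tool_calls = {}
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    for tc in delta.tool_calls or []:
        call = tool_calls.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
        if tc.id:
            call["id"] = tc.id
        if tc.function and tc.function.name:
            call["name"] += tc.function.name
        if tc.function and tc.function.arguments:
            call["arguments"] += tc.function.arguments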
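
To make the reassembly step concrete, here is a small offline sketch that merges tool call fragments and parses the finished arguments. `merge_tool_call_deltas` is a hypothetical helper, the `get_weather` tool and its fragments are invented for illustration, and the input is assumed to be plain dicts shaped like the entries of `delta.tool_calls` in the OpenAI streaming format.

```python
import json


def merge_tool_call_deltas(deltas):
    """Merge streamed tool call fragments into complete calls.

    Each item in `deltas` is a dict with an `index`, an optional `id`,
    and an optional `function` holding partial `name` and `arguments` strings.
    """
    calls = {}
    for d in deltas:
        call = calls.setdefault(d["index"], {"id": None, "name": "", "arguments": ""})
        if d.get("id"):
            call["id"] = d["id"]
        fn = d.get("function") or {}
        call["name"] += fn.get("name") or ""
        call["arguments"] += fn.get("arguments") or ""
    # The arguments string is only valid JSON once every fragment has been appended.
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
        for _, c in sorted(calls.items())
    ]


# Fragments as they might arrive for a hypothetical `get_weather` tool.
fragments = [
    {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": "{\"city\": "}},
    {"index": 0, "function": {"arguments": "\"Mumbai\"}"}},
]
print(merge_tool_call_deltas(fragments))
```

Parsing is deferred until the stream has finished because any intermediate arguments string is usually incomplete JSON.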

SSE Format

Each streamed event follows the OpenAI SSE format:

Raw SSE Events

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" monsoon"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
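
If you are not using an SDK, the events above can be decoded by hand. The following is a minimal sketch, not a full SSE implementation: `iter_sse_payloads` and `text_from_events` are hypothetical names, and the parser assumes each event's JSON fits on a single `data:` line, as in the sample above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line until `[DONE]`."""
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; lines starting with ":" are SSE comments.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def text_from_events(lines: Iterable[str]) -> str:
    """Concatenate every `delta.content` fragment found in the stream."""
    parts = []
    for payload in iter_sse_payloads(lines):
        for choice in payload.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"The"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" monsoon"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(text_from_events(sample))
```

In practice the `lines` argument would be the decoded lines of the HTTP response body, read incrementally by whatever HTTP client you use.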

Billing & Token Counting

Aspect	Behavior
Token counting	Prompt + completion tokens counted from the full response
Cost calculation	Same formula as non-streaming: USD cost → INR via live forex rate
Deduction timing	Balance is deducted after the stream completes (not per-chunk)
Partial responses	If the client disconnects mid-stream, tokens already generated are still billed
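
As a rough illustration of the cost calculation row, the sketch below converts token counts to a USD cost and then to INR. The per-million-token pricing model, every price, and the exchange rate are assumptions made for the example; the function name `estimate_cost_inr` is hypothetical, and actual charges depend on the model's real pricing and the live forex rate at the time of the request.

```python
def estimate_cost_inr(prompt_tokens, completion_tokens,
                      usd_per_1m_prompt, usd_per_1m_completion, usd_to_inr):
    """Estimate the INR charge for one request.

    All prices and the exchange rate are caller-supplied placeholders,
    not real AICredits or provider figures.
    """
    usd = (prompt_tokens * usd_per_1m_prompt
           + completion_tokens * usd_per_1m_completion) / 1_000_000
    return usd * usd_to_inr


# Hypothetical numbers, for illustration only.
cost = estimate_cost_inr(
    prompt_tokens=1_000,
    completion_tokens=500,
    usd_per_1m_prompt=1.0,
    usd_per_1m_completion=2.0,
    usd_to_inr=80.0,
)
print(f"Estimated charge: INR {cost:.4f}")
```

For a stream that is cut short, `completion_tokens` would be the number of tokens generated before the disconnect, which is why partial responses still incur a charge.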

Framework Integrations

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const aicredits = createOpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: process.env.AICREDITS_API_KEY!,
});

// In a Next.js Route Handler (app/api/chat/route.ts):
export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: aicredits("openai/gpt-4o-mini"),
    messages,
  });

  return result.toDataStreamResponse();
}

LangChain

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="openai/gpt-4o-mini",
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
    streaming=True,
)

for chunk in llm.stream([HumanMessage(content="Tell me about the Taj Mahal")]):
    print(chunk.content, end="", flush=True)

Error Handling

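Errors that occur mid-stream are delivered as a final SSE event before the stream closes. Requests rejected before streaming begins, such as those failing with 402 or 429, come back as ordinary HTTP error responses, which the OpenAI SDK raises as APIStatusError: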

stream_error_handling.py

from openai import OpenAI, APIStatusError, APIConnectionError

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

try:
    stream = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)

except APIStatusError as e:
    if e.status_code == 402:
        print("\nInsufficient credits — top up your wallet.")
    elif e.status_code == 429:
        print("\nRate limit exceeded — slow down requests.")
    else:
        print(f"\nAPI error {e.status_code}: {e.message}")

except APIConnectionError:
    print("\nConnection error — check your network.")
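
When reading the raw event stream without an SDK, an error delivered as an SSE event has to be detected by the client. The sketch below shows one way to do that. It assumes the error payload is a JSON object with a top-level "error" key containing a "message", which is a guess at the shape and not confirmed here; `StreamError` and `parse_event` are hypothetical names.

```python
import json


class StreamError(Exception):
    """Raised when an error payload appears inside the event stream."""


def parse_event(data: str):
    """Decode one SSE `data:` value.

    Returns None for the `[DONE]` sentinel, raises StreamError if the
    payload has a top-level "error" key (an assumed shape), and otherwise
    returns the decoded chunk.
    """
    if data.strip() == "[DONE]":
        return None
    payload = json.loads(data)
    if isinstance(payload, dict) and "error" in payload:
        err = payload["error"]
        message = err.get("message", "unknown error") if isinstance(err, dict) else str(err)
        raise StreamError(message)
    return payload


events = [
    '{"choices":[{"delta":{"content":"Hel"},"finish_reason":null}]}',
    '{"error":{"message":"upstream provider timed out"}}',  # hypothetical error event
]
try:
    for data in events:
        chunk = parse_event(data)
        print(chunk["choices"][0]["delta"].get("content", ""), end="")
except StreamError as exc:
    print(f"\nStream failed: {exc}")
```

Any text printed before the error arrived is a partial response, so callers may want to discard or flag it.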

See the Error Handling guide for the full list of error codes and retry strategies.