Uptime & Reliability

AICredits SLA, uptime targets, incident history, and status page. Built on multi-key provider health tracking with automatic failover.

AICredits is designed for production workloads. This page covers our uptime commitments, health tracking, and what happens when an upstream provider has an outage.

Service Architecture

AICredits sits in front of multiple LLM providers. When you make a request, the proxy:

1. Validates your API key (Redis cache, 5ms typical)
2. Applies rate limiting and guardrails
3. Routes to the primary provider using a healthy API key
4. Falls back automatically if the primary is unavailable
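The routing and fallback steps can be sketched as a priority walk over providers and their keys. This is a minimal illustration, not AICredits' actual router: the `Provider` class, the `pick_route` function, and the provider and key names are hypothetical, and key health is reduced to a set of unhealthy keys.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Provider:
    # Hypothetical model of an upstream provider and its pool of API keys.
    name: str
    keys: list[str]
    unhealthy: set[str] = field(default_factory=set)


def pick_route(providers: list[Provider]) -> Optional[tuple[str, str]]:
    """Return (provider, key) for the first healthy key, in priority order.

    Returns None when every key on every provider is unhealthy; the proxy
    would surface that case to the client as 502 Bad Gateway.
    """
    for provider in providers:  # primary first, then fallbacks
        for key in provider.keys:
            if key not in provider.unhealthy:
                return provider.name, key
    return None
```

For example, if both keys on the primary provider are unhealthy, `pick_route([primary, fallback])` returns the first healthy key on the fallback provider; if nothing is healthy it returns `None`, which corresponds to the 502 case described under Incident Response.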

This multi-layer architecture means a single provider outage rarely causes an AICredits outage.

Circuit Breaker

Each provider API key has its own health state. When a key returns repeated errors (5xx responses, 429s, or timeouts), it is marked unhealthy and skipped for 30 seconds, and requests are routed to the remaining healthy keys.

State	Trigger	Duration
Healthy	Successful request	Default
Unhealthy	Repeated 5xx / 429 / timeout	30 seconds
Recovered	First successful request	Immediate
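The state table above can be modeled as a small per-key circuit breaker. This sketch assumes a hypothetical `failure_threshold` of 3 consecutive failures, since the exact meaning of "repeated errors" is not specified here; `KeyHealth` and its methods are illustrative names, not part of any AICredits API. The clock is injectable so the 30-second window can be tested without sleeping.

```python
import time
from typing import Callable


class KeyHealth:
    """Per-key health state: healthy -> unhealthy (30 s cooldown) -> recovered."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # failure_threshold is a hypothetical stand-in for "repeated errors".
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.unhealthy_until = 0.0

    def is_available(self) -> bool:
        # An unhealthy key is skipped until its cooldown window has elapsed.
        return self.clock() >= self.unhealthy_until

    def record_success(self) -> None:
        # The first successful request returns the key to the healthy state.
        self.consecutive_failures = 0
        self.unhealthy_until = 0.0

    def record_failure(self) -> None:
        # Count a 5xx, 429, or timeout; trip the breaker after repeated failures.
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.unhealthy_until = self.clock() + self.cooldown_s
```

Because the failure count is only reset by a success, a key whose cooldown has expired is tripped again by a single further failure, which behaves like a half-open probe.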

SLA Targets

Metric	Target
API availability	99.9% monthly uptime
P50 latency (chat completions)	< 500ms to first token
P99 latency	< 5s to first token
Error budget	< 0.1% of requests

Uptime is measured on the AICredits proxy layer. Provider-side latency and errors (counted in your usage logs) are separate from AICredits infrastructure availability.
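To put the targets in concrete terms, a 99.9% monthly uptime target allows about 43.2 minutes of downtime in a 30-day month, and a 0.1% error budget allows 1,000 failed requests out of 1,000,000. The helper names below are illustrative, and the 30-day month is a simplifying assumption (a 31-day month allows slightly more).

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(uptime_target: float,
                             minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of downtime permitted by a monthly uptime target such as 0.999."""
    return minutes * (1 - uptime_target)


def error_budget_requests(total_requests: int, budget_fraction: float = 0.001) -> int:
    """Maximum failed requests allowed under an error budget (0.1% by default)."""
    return int(total_requests * budget_fraction)
```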

Health Check Endpoints

Use these endpoints for your own uptime monitoring:

Endpoint	Description
`GET /health`	Returns 200 if the server is running
`GET /health/ready`	Returns 200 if database + Redis connections are healthy
`GET /health/live`	Kubernetes liveness probe endpoint

Check health

```shell
curl https://api.aicredits.in/health
# {"status": "ok"}

curl https://api.aicredits.in/health/ready
# {"status": "ready", "db": "ok", "redis": "ok"}
```
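A minimal readiness check for your own monitor might look like the sketch below. It uses only the Python standard library; the 5-second timeout and the extra check that `db` and `redis` both report `ok` are assumptions based on the sample response above, not documented requirements. `is_ready` and `check_ready` are hypothetical helper names.

```python
import json
import urllib.request

READY_URL = "https://api.aicredits.in/health/ready"


def is_ready(status_code: int, body: str) -> bool:
    """Interpret a /health/ready response: HTTP 200 plus db and redis both 'ok'."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("db") == "ok" and payload.get("redis") == "ok"


def check_ready(url: str = READY_URL, timeout_s: float = 5.0) -> bool:
    """Fetch the readiness endpoint once; any network or HTTP error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return is_ready(resp.status, resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError (non-2xx), and socket timeouts all subclass OSError.
        return False
```

Run `check_ready()` on a schedule from your monitoring system and alert after several consecutive `False` results, so a single dropped connection does not trigger a page.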

Incident Response

When an upstream provider has a major incident:

- The circuit breaker automatically marks affected keys as unhealthy.
- Traffic shifts to healthy keys or other providers.
- The proxy retries failed upstream calls with exponential backoff, waiting 500ms, 1s, then 2s (up to 3 retry attempts).
- If all routes for a model are still unavailable, the request returns 502 Bad Gateway.
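The proxy-side backoff can be sketched as follows. This is an illustration rather than the proxy's implementation: `UpstreamError`, `backoff_schedule`, and `call_upstream` are hypothetical, retrying on 429 and 5xx is an assumption carried over from the circuit breaker's triggers, and "max 3 attempts" is read here as three retries after the initial call, one per listed delay.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class UpstreamError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned {status}")
        self.status = status


def backoff_schedule(base_s: float = 0.5, retries: int = 3) -> list[float]:
    # Doubling delays: 0.5 s, 1 s, 2 s with the defaults.
    return [base_s * (2 ** i) for i in range(retries)]


def call_upstream(send: Callable[[], T],
                  sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send`, retrying 429/5xx failures on the backoff schedule."""
    delays = backoff_schedule()
    for attempt in range(len(delays) + 1):
        try:
            return send()
        except UpstreamError as err:
            retryable = err.status == 429 or err.status >= 500
            if not retryable or attempt == len(delays):
                raise  # not retryable, or retries exhausted (proxy answers 502)
            sleep(delays[attempt])
    raise AssertionError("unreachable: the loop always returns or raises")
```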

For extended provider outages, check the status page of the affected provider directly.

Retry Guidance

For production workloads, implement client-side retries for 429, 500, 502, and 504 responses:

Resilient client

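```python
import random
import time

from openai import APIStatusError, OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
    max_retries=0,  # Disable the SDK's built-in retries so this loop has full control
)

RETRYABLE_STATUSES = {429, 500, 502, 504}


def call_with_retry(messages, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="openai/gpt-4o-mini",
                messages=messages,
            )
        except APIStatusError as e:
            # Catch APIStatusError rather than the broader APIError: only status
            # errors carry .status_code; connection errors propagate unchanged.
            if e.status_code in RETRYABLE_STATUSES and attempt < max_attempts - 1:
                # Exponential backoff with jitter: roughly 1s, 2s, 4s between attempts
                wait = (2 ** attempt) + random.uniform(0, 0.5)
                time.sleep(wait)
            else:
                raise
```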

See the Error Handling guide for the full retry matrix.
