Uptime & Reliability

AICredits SLA, uptime targets, incident history, and status page. Built on multi-key provider health tracking with automatic failover.

AICredits is designed for production workloads. This page covers our uptime commitments, health tracking, and what happens when an upstream provider has an outage.

Service Architecture

AICredits sits in front of multiple LLM providers. When you make a request, the proxy:

  1. Validates your API key (Redis cache, 5ms typical)
  2. Applies rate limiting and guardrails
  3. Routes to the primary provider using a healthy API key
  4. Falls back automatically if the primary is unavailable

This multi-layer architecture means a single provider outage rarely causes an AICredits outage.
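The routing and fallback steps can be sketched as a loop over providers in priority order. This is a minimal illustration, not the AICredits implementation: `route_with_fallback`, `AllRoutesUnavailable`, and the `send` callback are hypothetical names, and `ConnectionError` stands in for any upstream failure.

```python
from typing import Callable, List, Sequence


class AllRoutesUnavailable(Exception):
    """Raised when no provider could serve the request (hypothetical name)."""


def route_with_fallback(providers: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each provider in priority order and return the first successful response."""
    errors: List[str] = []
    for provider in providers:
        try:
            return send(provider)
        except ConnectionError as exc:  # stand-in for any upstream failure
            errors.append(f"{provider}: {exc}")
    # Every route failed; a proxy would surface this as an error response.
    raise AllRoutesUnavailable("; ".join(errors))
```

A caller only sees a failure when every provider in the list has failed, which is why a single provider outage is usually invisible to clients.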

Circuit Breaker

Each provider API key has its own health state. When a key returns repeated errors (5xx responses, 429s, or timeouts), it is marked unhealthy and skipped for 30 seconds. Requests are routed automatically to the remaining healthy keys.

| State | Trigger | Duration |
| --- | --- | --- |
| Healthy | Successful request | Default |
| Unhealthy | Repeated 5xx / timeout | 30 seconds |
| Recovered | First successful request | Immediate |
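The state transitions in the table can be modeled with a small per-key tracker. The sketch below is illustrative only: the class name `KeyHealth`, the `failure_threshold` of 3, and the injectable `clock` are assumptions (the page says only that errors must be "repeated"); the 30-second cooldown comes from the table.

```python
import time
from typing import Callable, Optional


class KeyHealth:
    """Tracks the health of one provider key (hypothetical sketch)."""

    def __init__(
        self,
        failure_threshold: int = 3,  # assumed; the docs only say "repeated"
        cooldown_s: float = 30.0,  # unhealthy keys are skipped for 30 seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.unhealthy_since: Optional[float] = None

    def is_available(self) -> bool:
        """A key is usable if it is healthy or its cooldown has elapsed."""
        if self.unhealthy_since is None:
            return True
        return self.clock() - self.unhealthy_since >= self.cooldown_s

    def record_failure(self) -> None:
        """Count a failure and (re)start the cooldown once the threshold is hit."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.unhealthy_since = self.clock()

    def record_success(self) -> None:
        """The first successful request restores the key immediately."""
        self.failures = 0
        self.unhealthy_since = None
```

Passing the clock in as a parameter keeps the cooldown logic testable without sleeping for 30 seconds.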

SLA Targets

| Metric | Target |
| --- | --- |
| API availability | 99.9% monthly uptime |
| P50 latency (chat completions) | < 500ms to first token |
| P99 latency | < 5s to first token |
| Error budget | < 0.1% of requests |

Uptime is measured on the AICredits proxy layer. Provider-side latency and errors (counted in your usage logs) are separate from AICredits infrastructure availability.
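To put the availability target in concrete terms, the percentage can be converted into a downtime allowance. The calculation below assumes a 30-day month; this page does not state how the measurement window is defined, so treat the result as an approximation.

```python
def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a window at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability)


# 99.9% over an assumed 30-day month
print(round(allowed_downtime_minutes(0.999), 1))  # 43.2
```

Under that assumption, 99.9% monthly uptime leaves roughly 43 minutes of proxy-layer downtime per month.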

Health Check Endpoints

Use these endpoints for your own uptime monitoring:

| Endpoint | Description |
| --- | --- |
| GET /health | Returns 200 if the server is running |
| GET /health/ready | Returns 200 if database + Redis connections are healthy |
| GET /health/live | Kubernetes liveness probe endpoint |
Check health
curl https://api.aicredits.in/health
# {"status": "ok"}

curl https://api.aicredits.in/health/ready
# {"status": "ready", "db": "ok", "redis": "ok"}
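For scripted monitoring, the same readiness check can be done from Python using only the standard library. This is a sketch: `is_ready` and `check_ready` are hypothetical helper names, and the field names (`status`, `db`, `redis`) are taken from the sample response above, so adjust them if the actual payload differs.

```python
import json
import urllib.request

BASE_URL = "https://api.aicredits.in"


def is_ready(payload: object) -> bool:
    """True when the readiness payload reports every dependency as ok."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "ready"
        and payload.get("db") == "ok"
        and payload.get("redis") == "ok"
    )


def check_ready(timeout_s: float = 5.0) -> bool:
    """Fetch /health/ready; any network, HTTP, or parse error counts as not ready."""
    try:
        url = f"{BASE_URL}/health/ready"
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200 and is_ready(json.load(resp))
    except (OSError, ValueError):
        # URLError/HTTPError subclass OSError; JSON decode errors subclass ValueError.
        return False
```

Treating every failure mode as "not ready" keeps the monitor conservative: a timeout or malformed body raises an alert rather than passing silently.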

Incident Response

When an upstream provider has a major incident:

  1. The circuit breaker automatically marks affected keys as unhealthy
  2. Traffic shifts to healthy keys or other providers
  3. If all routes for a model are unavailable, requests return 502 Bad Gateway
  4. The proxy retries with exponential backoff (500ms → 1s → 2s, max 3 attempts)
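The backoff schedule in step 4 follows a simple doubling rule. The sketch below reproduces the stated delays; the function name and parameters are illustrative, and it assumes one delay per attempt, which the page does not spell out.

```python
from typing import List


def backoff_delays(base_s: float = 0.5, factor: float = 2.0, max_attempts: int = 3) -> List[float]:
    """Delay in seconds for each attempt under exponential backoff."""
    return [base_s * factor ** i for i in range(max_attempts)]


print(backoff_delays())  # [0.5, 1.0, 2.0]
```

Under that assumption, proxy-side retries add at most 3.5 seconds of waiting before a request fails, which is worth factoring into client timeouts.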

For extended provider outages, check the status page of the affected provider directly.

Retry Guidance

For production workloads, implement client-side retries for 429, 500, 502, and 504 responses:

Resilient client
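import random
import time

from openai import OpenAI, APIStatusError

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
    max_retries=0,  # Disable the SDK's built-in retries; they are handled manually below
)

RETRYABLE_STATUS = (429, 500, 502, 504)

def call_with_retry(messages, max_attempts=4):
    """Call chat completions, retrying retryable HTTP errors with backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="openai/gpt-4o-mini",
                messages=messages,
            )
        except APIStatusError as e:
            # APIStatusError always carries status_code; the base APIError
            # (e.g. connection errors) does not.
            if e.status_code in RETRYABLE_STATUS and attempt < max_attempts - 1:
                # Exponential backoff (1s, 2s, 4s) plus up to 0.5s of jitter
                wait = (2 ** attempt) + random.uniform(0, 0.5)
                time.sleep(wait)
            else:
                raise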

See the Error Handling guide for the full retry matrix.
