Building Reliable Model Fallbacks Without Code Sprawl

A practical routing pattern for multi-provider resiliency and graceful degradation when a primary model slows down or fails.

Author: Platform Engineering
Published: 8 Feb 2026
Reading time: 7 min read

The problem with hardcoded provider logic

Many teams add fallback logic the wrong way: a try/except block around the OpenAI call, a manual retry to Anthropic, and a different request format for each provider. Once this is repeated in every service that calls a model, provider-specific payload branches end up scattered across the codebase, and every provider change has to be made in all of them.

The reliable pattern is to centralise routing policy in one place and keep your request contract stable everywhere else.

Fallback strategy

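Start with a default model per use case, define one or two fallback candidates, and fall back only on retryable failures: rate limits (429), server errors (5xx), and timeouts. Client errors such as a malformed request (400) or an invalid API key (401) are unlikely to succeed on the next model, so surface them immediately.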
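One way to keep that policy in one place is a small lookup table keyed by use case, plus a single function that decides which failures justify a fallback. This is a sketch under assumptions: the use-case names (`summarise`, `extract`), their model orderings, and the helpers `ROUTING_POLICY`, `model_chain` and `is_retryable` are hypothetical. Only the model identifiers come from the example in this post.

```python
from typing import Dict, List, Optional

# Hypothetical routing policy: one ordered model chain per use case.
# The first entry is the default; later entries are fallbacks, tried in order.
ROUTING_POLICY: Dict[str, List[str]] = {
    "summarise": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"],
    "extract": ["anthropic/claude-3-5-sonnet-20241022", "openai/gpt-4o"],
}

# Status codes treated as transient: rate limiting and server-side failures.
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


def model_chain(use_case: str) -> List[str]:
    """Return a copy of the ordered model chain for a use case."""
    try:
        return list(ROUTING_POLICY[use_case])
    except KeyError:
        raise ValueError(f"no routing policy defined for {use_case!r}") from None


def is_retryable(status_code: Optional[int]) -> bool:
    """Decide whether a failure should trigger a fallback.

    None stands for a transport-level failure (timeout, dropped connection),
    which is treated as retryable. Client errors such as 400 or 401 are not.
    """
    if status_code is None:
        return True
    return status_code in RETRYABLE_STATUS
```

Because services only ask for `model_chain("summarise")`, reordering or swapping models becomes a one-line change in this table rather than an edit in every caller.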

The example below keeps one request contract for every provider: the client talks to the AICredits OpenAI-compatible endpoint, so falling back from OpenAI to Anthropic means changing only the model string.

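```python
from openai import OpenAI, APIConnectionError, APIStatusError

# One OpenAI-compatible client for every provider: the gateway routes by model name.
client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-aicredits-key",
    timeout=20.0,   # illustrative value: give up on a slow model and fall back
    max_retries=1,  # the SDK retries transient errors itself; keep this low so fallback is quick
)

PRIMARY = "openai/gpt-4o"
FALLBACK = "anthropic/claude-3-5-sonnet-20241022"

# Rate limits and server-side errors are worth retrying on another model.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def call_with_fallback(messages: list[dict]) -> str:
    for model in [PRIMARY, FALLBACK]:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code in RETRYABLE_STATUS and model != FALLBACK:
                continue  # retryable failure: try the next model
            raise  # client error, or the fallback failed too
        except APIConnectionError:
            # Also catches APITimeoutError, which subclasses APIConnectionError.
            if model != FALLBACK:
                continue
            raise

result = call_with_fallback([{"role": "user", "content": "Summarise the key risks of using a single LLM provider."}])
print(result)
```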
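The two-model loop above generalises to any ordered chain. The sketch below separates the routing loop from the transport so it can be exercised without network access. `send` stands in for whatever performs the request (for example, a thin wrapper around `client.chat.completions.create`). `ModelCallError`, `ChainResult`, `call_chain` and `send` are hypothetical names introduced here, not part of the OpenAI SDK. The loop also records which model answered, which is what you need to track fallback rate.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


class ModelCallError(Exception):
    """Hypothetical transport error; status_code is None for timeouts and connection failures."""

    def __init__(self, model: str, status_code: Optional[int]):
        super().__init__(f"{model} failed with status {status_code}")
        self.model = model
        self.status_code = status_code


@dataclass
class ChainResult:
    text: str
    model: str       # the model that actually answered
    fell_back: bool  # True if an earlier model in the chain failed


def call_chain(
    chain: Sequence[str],
    messages: list,
    send: Callable[[str, list], str],
) -> ChainResult:
    """Try each model in order, falling back only on retryable failures."""
    if not chain:
        raise ValueError("chain must contain at least one model")
    for index, model in enumerate(chain):
        is_last = index == len(chain) - 1
        try:
            text = send(model, messages)
            return ChainResult(text=text, model=model, fell_back=index > 0)
        except ModelCallError as err:
            retryable = err.status_code is None or err.status_code in RETRYABLE_STATUS
            if retryable and not is_last:
                continue  # transient failure: move to the next model
            raise  # non-retryable, or nothing left to fall back to
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, a fake `send` that raises `ModelCallError(model, 503)` for `openai/gpt-4o` makes `call_chain` return the Anthropic answer with `fell_back=True`. A 400 from the primary is re-raised immediately.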

Operational checks

Track three metrics: fallback rate (the share of requests served by a fallback model), and the latency delta and cost delta between primary and fallback responses. Together they show whether your routing policy is healthy.
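A minimal sketch of computing these three metrics from request logs follows. The `RequestRecord` schema is a hypothetical log format. The sign convention (fallback minus primary, averaged per request) is an assumption about how to define the deltas.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass
class RequestRecord:
    """One logged LLM request (hypothetical log schema)."""
    fell_back: bool    # True if a fallback model served the request
    latency_ms: float  # end-to-end latency
    cost_usd: float    # cost attributed to this request


def routing_metrics(records: Sequence[RequestRecord]) -> Dict[str, float]:
    """Fallback rate plus mean latency and cost deltas (fallback minus primary)."""
    if not records:
        raise ValueError("no records to summarise")
    primary = [r for r in records if not r.fell_back]
    fallback = [r for r in records if r.fell_back]
    metrics = {"fallback_rate": len(fallback) / len(records)}
    # Deltas are only defined when both paths served at least one request.
    if primary and fallback:
        metrics["latency_delta_ms"] = (
            mean(r.latency_ms for r in fallback) - mean(r.latency_ms for r in primary)
        )
        metrics["cost_delta_usd"] = (
            mean(r.cost_usd for r in fallback) - mean(r.cost_usd for r in primary)
        )
    return metrics
```

With three primary requests at 100, 200 and 300 ms and one fallback request at 500 ms, the fallback rate is 0.25 and the latency delta is 300 ms.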

A fallback rate that climbs above 5% signals that your primary provider is degrading. A rising cost delta means you are spending more per request than expected; the fallback may sit on a more expensive model tier.
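The 5% rule can be turned into a simple health check over per-window fallback rates. This is a sketch under assumptions. The window length is up to you, and treating "rising" as "higher than the previous window" is an illustrative choice. The function name and status labels are hypothetical.

```python
from typing import Sequence

FALLBACK_RATE_ALERT = 0.05  # the 5% threshold suggested in this post


def fallback_status(window_rates: Sequence[float], threshold: float = FALLBACK_RATE_ALERT) -> str:
    """Classify routing health from fallback rates per time window, oldest first."""
    if not window_rates:
        return "no-data"
    latest = window_rates[-1]
    if latest <= threshold:
        return "healthy"
    # Above the threshold: distinguish a worsening trend from a stable high level.
    rising = len(window_rates) >= 2 and latest > window_rates[-2]
    return "degrading" if rising else "elevated"
```

Separating "degrading" from "elevated" lets you page on a worsening primary while only logging a high but stable fallback rate.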

When to add a gateway

At small scale, the two-model try/except pattern above is sufficient. At production scale (more than one service making LLM calls, more than two providers, or a team of more than five people), centralise routing in a gateway. AICredits does this transparently: it handles the provider chain, circuit breaking, and retries, so your application code stays clean.
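Circuit breaking stops requests from going to a primary that is already failing. Below is a minimal, generic version of the pattern. It is not AICredits' implementation, and the class name, failure threshold and cooldown are illustrative assumptions. After `failure_threshold` consecutive failures the breaker opens and callers go straight to the fallback. Once `cooldown_s` has passed, traffic is let through again as a trial (half-open). A success closes the breaker, and another failure reopens it for a fresh cooldown.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal consecutive-failure circuit breaker for one model or provider."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return False while the breaker is open and the cooldown has not elapsed."""
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: once the cooldown has elapsed, allow trial requests (half-open).
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None  # close the breaker

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open, or reopen after a failed trial, and restart the cooldown.
            self._opened_at = self._clock()
```

In a fallback loop you would check `allow_request()` before calling the primary, skip to the fallback when it returns False, and call `record_success()` or `record_failure()` after each attempt.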
