Building Reliable Model Fallbacks Without Code Sprawl

A practical routing pattern for multi-provider resiliency and graceful degradation when a primary model slows down or fails.

Author: Platform Engineering
Published: 8 Feb 2026
Reading time: 7 min

The problem with hardcoded provider logic

Most teams add fallback logic the wrong way: a try/except block around the OpenAI call, a manual retry against Anthropic, and a different request format for each provider. The result is provider-specific payload branches scattered across services, which are hard to maintain and easy to get wrong.

The reliable pattern is to centralise routing policy in one place and keep your request contract stable everywhere else.

Fallback strategy

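Start with a default model per use case and define one or two fallback candidates. Move to the next candidate only on retryable failures: rate limits (429), server-side errors (5xx), and timeouts. A malformed request or an invalid key will fail on the fallback too, so surface those errors immediately.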

The example below applies this through a single OpenAI-compatible endpoint. The request shape is identical for both models, and only the model name changes between attempts.

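from openai import APIConnectionError, APIStatusError, OpenAI

# One OpenAI-compatible client; the model name selects the provider.
client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-aicredits-key",  # placeholder; load the real key from a secret store
)

PRIMARY = "openai/gpt-4o"
FALLBACK = "anthropic/claude-3-5-sonnet-20241022"

# Rate limits and server-side errors are worth retrying on another model.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_fallback(messages: list[dict]) -> str:
    models = [PRIMARY, FALLBACK]
    for i, model in enumerate(models):
        is_last = i == len(models) - 1
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            # Non-retryable errors (e.g. 400, 401) would fail on the fallback too.
            if e.status_code not in RETRYABLE_STATUS or is_last:
                raise
        except APIConnectionError:
            # Network failures and timeouts (APITimeoutError is a subclass).
            if is_last:
                raise
    raise RuntimeError("no models configured")


result = call_with_fallback([{"role": "user", "content": "Summarise the key risks of using a single LLM provider."}])
print(result)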
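
The "default model per use case" idea can be kept in one place as a small policy table. The following is a minimal sketch rather than a prescribed design: `RoutePolicy`, `POLICIES`, and the use-case names `summarise` and `classify` are hypothetical, and the model names are simply the two from the example above.

```python
from dataclasses import dataclass
from typing import Optional

# Status codes treated as retryable; anything else should be raised immediately.
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


@dataclass(frozen=True)
class RoutePolicy:
    """Ordered model chain for one use case: the default first, then fallbacks."""

    models: tuple[str, ...]
    retryable_status: frozenset[int] = RETRYABLE_STATUS

    def next_model(self, failed_model: str, status_code: int) -> Optional[str]:
        """Return the next model to try, or None if the caller should give up."""
        if status_code not in self.retryable_status:
            return None
        idx = self.models.index(failed_model)
        return self.models[idx + 1] if idx + 1 < len(self.models) else None


# Hypothetical use cases, for illustration only.
POLICIES = {
    "summarise": RoutePolicy(("openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022")),
    "classify": RoutePolicy(("anthropic/claude-3-5-sonnet-20241022", "openai/gpt-4o")),
}
```

Services then look up a policy by use case instead of naming models directly, so changing a fallback chain is a one-line edit to the table.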

Operational checks

Track fallback rate, response latency delta, and cost delta between primary and fallback models. These metrics determine whether your routing policy is healthy.

A fallback rate that climbs above 5% signals that your primary provider is degrading. A growing cost delta means you are spending more per request than expected; the fallback may be a more expensive model tier.
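
As a rough illustration of how these three metrics could be computed in process, here is a minimal sketch. `RoutingStats` and its method names are hypothetical, the cost figure is whatever unit your billing data uses, and a real deployment would more likely export these values to an existing metrics system.

```python
from dataclasses import dataclass, field
from statistics import mean

FALLBACK_RATE_ALERT = 0.05  # the 5% threshold mentioned above


def _empty_series() -> dict[str, list[float]]:
    return {"primary": [], "fallback": []}


@dataclass
class RoutingStats:
    """Accumulates per-request latency and cost for primary and fallback calls."""

    latencies: dict[str, list[float]] = field(default_factory=_empty_series)
    costs: dict[str, list[float]] = field(default_factory=_empty_series)

    def record(self, used_fallback: bool, latency_s: float, cost: float) -> None:
        key = "fallback" if used_fallback else "primary"
        self.latencies[key].append(latency_s)
        self.costs[key].append(cost)

    def fallback_rate(self) -> float:
        total = len(self.latencies["primary"]) + len(self.latencies["fallback"])
        return len(self.latencies["fallback"]) / total if total else 0.0

    def latency_delta(self) -> float:
        """Mean fallback latency minus mean primary latency, in seconds."""
        return self._delta(self.latencies)

    def cost_delta(self) -> float:
        """Mean fallback cost minus mean primary cost per request."""
        return self._delta(self.costs)

    @staticmethod
    def _delta(series: dict[str, list[float]]) -> float:
        # No delta can be computed until both paths have at least one sample.
        if not series["primary"] or not series["fallback"]:
            return 0.0
        return mean(series["fallback"]) - mean(series["primary"])
```

Calling `record(...)` once per completed request and comparing `fallback_rate()` against `FALLBACK_RATE_ALERT` is enough to drive a basic alert.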

When to add a gateway

At small scale, the two-model try/except pattern above is sufficient. At production scale (more than one service making LLM calls, more than two providers, or a team larger than five), centralise routing in a gateway. AICredits does this transparently: it handles the provider chain, circuit breaking, and retries so your application code stays clean.
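
For readers unfamiliar with circuit breaking, the idea is to stop sending traffic to a model after repeated failures and to probe it again after a cool-down. The sketch below is a generic, minimal version of that technique and is not a description of how AICredits implements it; `CircuitBreaker`, `max_failures`, and `reset_after` are hypothetical names and defaults.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; permits calls again
    once `reset_after` seconds have passed since it opened."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock  # injectable so the breaker can be tested without sleeping
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        """Return True if a call to the guarded model should be attempted."""
        if self._opened_at is None:
            return True
        # Half-open: let calls through once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()
```

A router would keep one breaker per model and skip straight to the next candidate whenever `allow()` returns False.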
