
How to Build a Retry Strategy for LLM API Calls
Rate limit errors, provider timeouts, and transient failures are inevitable. Here is a production-grade retry strategy with exponential backoff, jitter, and fallback routing.
Author
AICredits Team
Published
3 Apr 2026
Reading time
6 min read
What goes wrong and why
LLM API calls fail for three distinct reasons, each requiring a different response:
- 429 Rate limit — you sent too many requests per minute. Wait and retry.
- Timeout — the provider took too long. Retry only if idempotent.
- 500/502/503 Provider error — transient infrastructure issue. Retry with backoff.
The worst mistake is treating all failures the same way. Retrying immediately on a 429 makes your rate limit problem worse.
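As a rough illustration of the three cases above, the mapping from failure to response can be written as a small lookup. The Action enum and the classify function are hypothetical names invented for this sketch; they are not part of any SDK.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    WAIT_AND_RETRY = "wait, then retry"
    RETRY_IF_IDEMPOTENT = "retry only if the request is idempotent"
    RETRY_WITH_BACKOFF = "retry with backoff"
    FAIL = "do not retry"


def classify(status_code: Optional[int], timed_out: bool = False) -> Action:
    """Map a failure to one of the responses listed above (illustrative only)."""
    if timed_out:
        return Action.RETRY_IF_IDEMPOTENT
    if status_code == 429:
        return Action.WAIT_AND_RETRY
    if status_code in (500, 502, 503):
        return Action.RETRY_WITH_BACKOFF
    # Anything else (for example 400 or 401) will not succeed on a retry
    return Action.FAIL


print(classify(429).value)
```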
Exponential backoff with jitter using tenacity
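
from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
)
from openai import (
    OpenAI,
    RateLimitError,
    APIConnectionError,
    InternalServerError,
)

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-aicredits-key",
    max_retries=0,  # disable the SDK's built-in retries; tenacity is the only retry layer
)

# APIConnectionError also covers timeouts (APITimeoutError subclasses it);
# InternalServerError covers 5xx responses.
RETRYABLE = (RateLimitError, APIConnectionError, InternalServerError)

@retry(
    retry=retry_if_exception_type(RETRYABLE),
    # Exponential backoff clamped to 2-30s, plus up to 1s of random jitter
    wait=wait_exponential(multiplier=1, min=2, max=30) + wait_random(0, 1),
    stop=stop_after_attempt(4) | stop_after_delay(60),
)
def call_llm(prompt: str, model: str = "openai/gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

The delay between attempts grows exponentially, clamped to between 2s and 30s, and wait_random adds up to one second of random jitter so that many clients do not retry in lockstep (the thundering herd problem). Retrying stops after four attempts or 60 seconds, whichever comes first. Setting max_retries=0 on the client keeps the SDK's built-in retries from stacking on top of tenacity's.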
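To make the backoff arithmetic concrete, here is a minimal, library-free sketch of a capped exponential delay with additive jitter. The helper name backoff_delay and its parameters are assumptions made for illustration; tenacity's internal formula may differ in detail.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0,
                  jitter: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    Hypothetical helper: base * 2**(attempt - 1), capped at `cap`,
    plus a random amount drawn uniformly from [0, jitter].
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    source = rng if rng is not None else random
    exponential = min(cap, base * 2 ** (attempt - 1))
    return exponential + source.uniform(0.0, jitter)


# With jitter disabled the schedule is deterministic and capped at 30s
schedule = [backoff_delay(n, jitter=0.0) for n in range(1, 6)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

The jitter term matters most when many workers hit the same rate limit at once: without it they would all wake up and retry at the same instant.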
Which errors to retry

import time

from openai import APIConnectionError, APIStatusError

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def should_retry(error: Exception) -> bool:
    if isinstance(error, APIConnectionError):
        return True  # network issue, always retry
    if isinstance(error, APIStatusError):
        if error.status_code == 429:
            # Honour the Retry-After header when it is a whole number of seconds
            # (it may also be an HTTP date, which is ignored here)
            retry_after = error.response.headers.get("Retry-After")
            if retry_after and retry_after.isdigit():
                time.sleep(int(retry_after))
            return True
        return error.status_code in RETRYABLE_STATUS_CODES
    return False

# Never retry: 400 (bad request), 401 (auth), 403 (forbidden), 404 (not found)

Full retry + fallback pattern

from tenacity import RetryError

PRIMARY = "openai/gpt-4o"
FALLBACK = "anthropic/claude-3-5-haiku-20241022"

# call_llm already retries with backoff. Wrapping it in a second @retry would
# add nothing, because the RetryError it raises is not in RETRYABLE. Instead,
# reuse its retry policy with a lower attempt limit for the primary model.
call_primary = call_llm.retry_with(stop=stop_after_attempt(3))

def safe_call(prompt: str) -> str:
    try:
        return call_primary(prompt, model=PRIMARY)
    except RetryError:
        # Primary exhausted: fall back to a different provider.
        # Non-retryable errors (400, 401, ...) propagate to the caller.
        return call_llm(prompt, model=FALLBACK)

result = safe_call("Summarise the benefits of using an LLM gateway.")
print(result)

Circuit breaker via AICredits
AICredits implements a circuit breaker at the gateway level — unhealthy provider keys are automatically skipped for 30 seconds. This means you do not need to implement circuit breaking in your application code. The gateway handles it transparently.
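For readers unfamiliar with the pattern, a cooldown-based circuit breaker can be sketched as follows. This is not AICredits' actual implementation, which the text above does not describe; the class name, the failure threshold, and the injectable clock are assumptions, and only the 30-second skip window is taken from the description.

```python
import time
from typing import Callable, Dict


class CooldownBreaker:
    """Skips a key for `cooldown` seconds after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self._failures: Dict[str, int] = {}
        self._open_until: Dict[str, float] = {}

    def is_available(self, key: str) -> bool:
        # A key is usable unless it is still inside its cooldown window
        return self.clock() >= self._open_until.get(key, float("-inf"))

    def record_failure(self, key: str) -> None:
        self._failures[key] = self._failures.get(key, 0) + 1
        if self._failures[key] >= self.threshold:
            # Trip the breaker: skip this key until the cooldown expires
            self._open_until[key] = self.clock() + self.cooldown
            self._failures[key] = 0

    def record_success(self, key: str) -> None:
        # Any success resets the consecutive-failure count
        self._failures[key] = 0
```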
What you do need to handle in your code:
- Retry on transient errors (429, 5xx) with exponential backoff
- Application-level fallback to a different model after retries are exhausted
- Logging: record which model served each request and whether a fallback was triggered
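The logging item in the list above can be sketched with the standard library alone. The function call_with_logging and the logger name are hypothetical, and the call argument is a stand-in for a retrying client call that raises once its retries are exhausted.

```python
import logging
from typing import Callable

logger = logging.getLogger("llm_calls")


def call_with_logging(prompt: str, call: Callable[[str, str], str],
                      primary: str, fallback: str) -> str:
    """Try `primary`, fall back to `fallback`, and log which model answered.

    `call(prompt, model)` is an assumed interface, not a real SDK function.
    """
    try:
        result = call(prompt, primary)
    except Exception as exc:  # in real code, catch only the retries-exhausted error
        logger.warning("primary failed, falling back: model=%s error=%r",
                       primary, exc)
        result = call(prompt, fallback)
        logger.info("request served: model=%s fallback=%s", fallback, True)
        return result
    logger.info("request served: model=%s fallback=%s", primary, False)
    return result
```

Recording the fallback flag alongside the model name makes it possible to measure how often the primary is failing, which is usually the first thing to check when costs or latency shift.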