Prompt Caching
Cache large system prompts and repeated context for Claude models. Save up to 90% on repeated context costs with cache reads at 0.1x the standard rate.
Prompt caching lets you cache large, repeated context (system prompts, documents, codebase snippets) so it is only processed once. Subsequent requests that hit the cache are charged at a deeply discounted rate.
How It Works
When you enable caching on a request, AICredits injects the appropriate cache control headers for the provider before forwarding the request. This means you use a single API field ("cache": true) regardless of the provider — no provider-specific SDK changes needed.
First request → Cache WRITE → billed at 1.25× input rate
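Second request → Cache HIT → billed at 0.10× input rate (90% discount)
You come out ahead from the second request onward: one write plus one hit costs 1.35× the standard input rate, compared with 2.0× without caching. For applications that reuse the same context many times, the savings compound quickly.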
Pricing
| Operation | Rate |
|---|---|
| Cache write | 1.25× standard input token rate |
| Cache read (hit) | 0.10× standard input token rate |
| No cache | 1.0× standard input token rate |
Cache writes cost more
Cache writes are billed at 1.25× the standard rate. Only enable caching when the same context will be reused across at least 2 requests. For one-off requests, caching increases cost.
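The arithmetic behind these rates can be sketched in a few lines. This is an illustrative calculation, not part of the AICredits API: the function names are hypothetical, it counts only the cached portion of the input, and it assumes every request after the first arrives within the TTL and hits the cache.

```python
# Relative cost of the cached context, in multiples of the standard input rate.
CACHE_WRITE_RATE = 1.25  # first request writes the cache
CACHE_READ_RATE = 0.10   # later requests read from it
STANDARD_RATE = 1.0      # no caching


def cost_with_cache(num_requests: int) -> float:
    """One cache write followed by (num_requests - 1) cache hits."""
    if num_requests <= 0:
        return 0.0
    return CACHE_WRITE_RATE + CACHE_READ_RATE * (num_requests - 1)


def cost_without_cache(num_requests: int) -> float:
    """Every request pays the standard input rate."""
    return STANDARD_RATE * max(num_requests, 0)


# Compare the two strategies for a few request counts.
for n in (1, 2, 10):
    print(n, round(cost_with_cache(n), 2), cost_without_cache(n))
```

Under these assumptions a single request costs 1.25× instead of 1.0×, while two requests cost 1.35× instead of 2.0×, which is why caching only pays off when the context is reused.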
Enabling Cache
Add "cache": true to your request body:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

LARGE_SYSTEM_PROMPT = """You are an expert software engineer...[long codebase context]..."""

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    extra_body={"cache": True},
    messages=[
        {"role": "system", "content": LARGE_SYSTEM_PROMPT},
        {"role": "user", "content": "Add a unit test for the billing module."},
    ],
)
print(response.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: "sk-your-key-here",
});

// Static context to cache; keep it identical across requests.
const LARGE_SYSTEM_PROMPT = "You are an expert software engineer...[long codebase context]...";

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  // @ts-ignore (custom extension field, not in the SDK types)
  cache: true,
  messages: [
    { role: "system", content: LARGE_SYSTEM_PROMPT },
    { role: "user", content: "Add a unit test for the billing module." },
  ],
});
console.log(response.choices[0].message.content);

curl https://api.aicredits.in/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.6",
    "cache": true,
    "messages": [
      {
        "role": "system",
        "content": "You are an expert engineer. [large context here]"
      },
      {
        "role": "user",
        "content": "Add a unit test for the billing module."
      }
    ]
  }'

Provider Support
| Provider | Models | Caching Method |
|---|---|---|
| Anthropic | Claude Sonnet, Haiku, Opus | Explicit cache control headers |
| OpenAI | GPT-4o, GPT-4o-mini | Automatic (no opt-in needed) |
| Other providers | — | Not supported |
For OpenAI models, prompt caching is automatic — OpenAI applies it on their end for prompts over a minimum length. You do not need to set "cache": true for OpenAI models. The cache flag only activates explicit caching for Claude models.
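If one code path serves several providers, the flag can be set conditionally. The helper below is a minimal sketch with a hypothetical name. It assumes Claude models are addressed with an "anthropic/" prefix, as in the model ID used in the examples on this page. The "openai/gpt-4o" ID is only an illustrative guess at how an OpenAI model might be named.

```python
def cache_extra_body(model: str) -> dict:
    """Return the extra request fields to send for a given model ID."""
    # Assumption: Claude models use an "anthropic/" prefix,
    # as in "anthropic/claude-sonnet-4.6".
    if model.startswith("anthropic/"):
        return {"cache": True}
    # OpenAI models cache automatically; other providers do not support caching.
    return {}


print(cache_extra_body("anthropic/claude-sonnet-4.6"))
print(cache_extra_body("openai/gpt-4o"))  # hypothetical model ID
```

The result can be passed straight through, for example `extra_body=cache_extra_body(model)` with the OpenAI Python client.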
Cache TTL & Invalidation
| Aspect | Details |
|---|---|
| TTL | 5 minutes per cache entry |
| Refresh | Each cache hit resets the 5-minute TTL |
| Minimum size | ~1,000 tokens (~4,000 characters) |
| Invalidation | Modifying the cached portion invalidates the cache |
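The size and TTL figures in the table can be turned into rough client-side checks. This sketch is only a local estimate: the helper names are hypothetical, the size check uses the approximate 4,000-character figure rather than a real tokenizer, and the gateway does not report whether an entry still exists, so the warmth check is a guess based on your own request timestamps.

```python
import time
from typing import Optional

CACHE_TTL_SECONDS = 5 * 60   # 5-minute TTL, reset on every cache hit
MIN_CACHEABLE_CHARS = 4_000  # ~1,000 tokens at roughly 4 characters per token


def is_large_enough(context: str) -> bool:
    """Rough size check against the approximate minimum cacheable size."""
    return len(context) >= MIN_CACHEABLE_CHARS


def is_probably_warm(last_request_at: float, now: Optional[float] = None) -> bool:
    """True if the previous request was recent enough that the entry should still exist."""
    if now is None:
        now = time.time()
    return (now - last_request_at) < CACHE_TTL_SECONDS


# Example: a short prompt is below the minimum, and a request made
# two minutes ago should still find the cache entry.
print(is_large_enough("short prompt"))
print(is_probably_warm(last_request_at=time.time() - 120))
```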
Best Practices
Put the cacheable content at the top of your system prompt. The cache key is derived from the first N tokens of the system message. Keep the static context (instructions, documents, codebase) at the top and the per-request variable content further down.
Reuse the exact same text. Even a single-character change in the cached portion invalidates the cache. Keep per-request values out of the static prefix and place them later in the prompt rather than interpolating them into the cached text.
Use for large, repeated context. The breakeven is at the second request. Ideal use cases: full codebase in system prompt, legal documents for review, product catalogs, long conversation history.
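One way to follow these practices is to build every request from a single static constant and vary only the user turn. The sketch below uses hypothetical helper names. The fingerprint is a local debugging aid for spotting accidental changes to the static text, not the cache key the provider actually computes.

```python
import hashlib

# Static context: defined once and never modified per request.
STATIC_CONTEXT = "You are an expert software engineer...[long codebase context]..."


def build_messages(question: str) -> list[dict]:
    """Keep the system message byte-for-byte identical; vary only the user turn."""
    return [
        {"role": "system", "content": STATIC_CONTEXT},
        {"role": "user", "content": question},
    ]


def prefix_fingerprint(messages: list[dict]) -> str:
    """Hash the system message so accidental changes are easy to spot in logs."""
    return hashlib.sha256(messages[0]["content"].encode("utf-8")).hexdigest()


first = build_messages("Add a unit test for the billing module.")
second = build_messages("Refactor the invoice exporter.")

# The two requests differ only in the user turn, so the fingerprints match.
print(prefix_fingerprint(first) == prefix_fingerprint(second))
```

If the fingerprints of two consecutive requests differ, something is altering the static text, and the second request would be billed as a new cache write instead of a hit.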