AICredits logo
Features

Platform Features

AICredits features — semantic caching, automatic failover, streaming, JSON response healing, and team credit delegation.

Use this page with an AI assistant

Opens a new chat with this docs URL and the correct AICredits base URLs.

Advanced features that make AICredits more than just a proxy — caching, guardrails, automatic failover, and more.

Semantic Caching

Not currently enabled

Semantic caching is not enabled on the hosted platform at this time. The infrastructure (pgvector embeddings, per-user cache) is built and will be rolled out as a configurable feature. Requests pass through to providers with no overhead in the meantime.

When enabled, AICredits caches responses based on the semantic meaning of your queries using pgvector embeddings. If a new query is semantically similar to a cached one (95% similarity threshold), the cached response is returned instantly — saving both cost and latency.

AspectDetails
Technologypgvector (PostgreSQL vector similarity)
Similarity Threshold95% cosine similarity
ScopePer-user (your cache is private)
Default cache TTL24 hours (entries expire automatically)
Cache hit charge0.05× normal model request cost

Cost Savings — when available

Cache hits are billed at a discounted read rate. No provider completion call is made, and the default charge is 0.05× the normal model request cost, lower than provider prompt-cache read pricing. This is particularly effective for applications with repetitive queries.

Prompt Caching

For Claude models, you can cache large, repeated context (system prompts, document chunks, codebase snippets) so it is only processed once. Subsequent requests that hit the cache are charged at a deeply discounted rate.

Opt-in only — caching costs more on the first call

Cache writes are billed at 1.25× the standard input token rate. Cache reads are billed at 0.1× (a 90% discount). You break even after the second request. Only enable caching when you will reuse the same context across multiple calls.

To request caching, add "cache": true to your request body. The proxy injects the appropriate cache headers for the provider — no provider-specific SDK changes needed.

curl https://api.aicredits.in/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.6",
    "cache": true,
    "messages": [
      {
        "role": "system",
        "content": "You are an expert software engineer. [... large codebase context ...]"
      },
      { "role": "user", "content": "Add a unit test for the billing module." }
    ]
  }'
AspectDetails
Supported providersClaude (Anthropic) only — OpenAI caching is automatic
Minimum context size~1,000 tokens (≈ 4,000 characters). Smaller prompts are not cached.
Cache TTL5 minutes (resets each time the cache is hit)
Write cost1.25× standard input rate
Read cost0.1× standard input rate (90% discount)

Best use cases

Large system prompts with your full codebase, long documents, or fixed instructions that stay the same across many calls. Keep the cached portion at the top of your system prompt so the cache key is stable.

Provider Fallback

Requests are routed directly to the upstream provider. On failure, retries happen automatically across healthy keys with exponential backoff.

1. Direct Provider — Calls the provider directly using round-robin across configured API keys. Each key is health-checked — unhealthy keys are skipped automatically.

2. Retry with Backoff — On 429 or 5xx responses, retries across remaining healthy keys with exponential backoff (500ms → 1s → 2s, capped at 5s).

The circuit breaker tracks provider health per API key. Unhealthy keys are skipped for 30 seconds to avoid adding latency from repeated failures.

Streaming

Full streaming support via Server-Sent Events (SSE). Set stream: true in your request. The response format is identical to OpenAI's streaming format, so all compatible SDKs work out of the box.

Streaming works across all providers. AICredits translates provider-specific streaming formats (e.g., Anthropic's Messages API) to the standard OpenAI SSE format.

Response Healing

When you request "response_format": {"type": "json_object"} and the provider returns malformed JSON, AICredits can automatically attempt to repair the response. This includes fixing:

  • Truncated JSON (missing closing brackets/braces)
  • Extra text before/after the JSON object
  • Common formatting issues

Response healing is enabled per deployment. When active, it processes responses transparently — you receive valid JSON even if the provider's raw output was slightly malformed.

Data Retention

Request metadata (model, tokens, cost, timestamp) is retained for 30 days to power your usage history and billing views. A background worker automatically purges records older than 30 days.

Teams & Credit Delegation

Teams let you allocate a portion of your wallet to teammates, students, or collaborators. Each member gets their own credit cap — their API requests are charged from your wallet only when they actually make calls. Unused allocation costs you nothing.

ActionWho
Create a teamWallet owner
Invite member by email + set ₹ capWallet owner
Make API calls (charged to owner)Invited member
View per-member usage + spendWallet owner
Adjust or reclaim allocationWallet owner

A member's spending is capped at their allocated_inr. When they reach the cap, their API key returns a budget-exceeded error — your wallet is never overdrawn beyond what you committed.

Manage your teams at Dashboard → Teams.

On this page