Reasoning Tokens

Use o1, o3, and DeepSeek R1 reasoning models through AICredits. Understand how thinking tokens are counted and billed.

Reasoning models (OpenAI o1/o3, DeepSeek R1, Claude with extended thinking) generate internal "thinking" tokens before producing their final answer. These hidden reasoning steps improve accuracy on complex tasks — maths, coding, logical deduction — but they add to your token bill.

How Reasoning Tokens Work

When you send a request to a reasoning model:

  1. The model generates a hidden chain of thought (reasoning tokens)
  2. The model uses that reasoning to produce the visible answer (output tokens)
  3. Both reasoning tokens and output tokens are billed as completion tokens

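For the OpenAI o-series models, the response contains only the final answer; the intermediate reasoning is not returned. DeepSeek R1 is an exception: it returns its reasoning in a separate field (see DeepSeek R1 below).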

Billing

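Reasoning tokens are billed at the same rate as output (completion) tokens. They are already included in completion_tokens, and the usage field reports how many of those were reasoning under completion_tokens_details.reasoning_tokens: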

Usage with reasoning tokens
{
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 2340,
    "total_tokens": 2490,
    "completion_tokens_details": {
      "reasoning_tokens": 2048,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}

In this example, 2,048 of the 2,340 completion tokens are reasoning tokens. All 2,340 are billed at the output token rate.
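The arithmetic in this example can be checked with a small helper. This is a sketch, not part of any SDK: split_completion_tokens is a hypothetical name, and it assumes the usage object has been parsed as a plain dict (for example from the raw JSON of the cURL call, or via response.usage.model_dump() in the Python SDK).

```python
def split_completion_tokens(usage: dict) -> dict:
    """Break a chat-completion usage object into reasoning vs. visible output."""
    completion = usage["completion_tokens"]
    # completion_tokens_details may be missing or null for non-reasoning models.
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    return {
        "reasoning_tokens": reasoning,
        "visible_output_tokens": completion - reasoning,
        # Every completion token, reasoning or visible, is billed at the output rate.
        "billed_at_output_rate": completion,
    }


# The usage object from the example above, parsed as a dict.
usage = {
    "prompt_tokens": 150,
    "completion_tokens": 2340,
    "total_tokens": 2490,
    "completion_tokens_details": {
        "reasoning_tokens": 2048,
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0,
    },
}

print(split_completion_tokens(usage))
# {'reasoning_tokens': 2048, 'visible_output_tokens': 292, 'billed_at_output_rate': 2340}
```

Only 292 of the billed completion tokens correspond to text you actually see.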

Reasoning models can generate thousands of reasoning tokens per request. On o1, a response with 4,000 reasoning tokens can cost 10–20× as much as a comparable GPT-4o response. Monitor your usage closely when using reasoning models.
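Because reasoning tokens are already counted in completion_tokens, a cost estimate only needs the two top-level counts. The sketch below uses made-up prices (not AICredits rates) so that only token volume varies: with identical per-token prices, adding 4,000 reasoning tokens to a 300-token answer makes the request roughly 13× as expensive. The function name and prices are illustrative assumptions.

```python
# Hypothetical prices in USD per million tokens, for illustration only.
# Replace them with the actual rates for the model you call.
INPUT_PRICE_PER_M = 2.0
OUTPUT_PRICE_PER_M = 8.0


def estimate_cost(usage: dict, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request from its usage object."""
    prompt_cost = usage["prompt_tokens"] * input_price_per_m / 1_000_000
    # completion_tokens already includes reasoning tokens; adding
    # reasoning_tokens on top would double-count them.
    completion_cost = usage["completion_tokens"] * output_price_per_m / 1_000_000
    return prompt_cost + completion_cost


# Same prompt and same 300-token visible answer, with and without 4,000 reasoning tokens.
with_reasoning = {"prompt_tokens": 150, "completion_tokens": 4_300}
without_reasoning = {"prompt_tokens": 150, "completion_tokens": 300}

cost_r = estimate_cost(with_reasoning, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
cost_p = estimate_cost(without_reasoning, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
print(f"${cost_r:.4f} vs ${cost_p:.4f} ({cost_r / cost_p:.1f}x)")
# $0.0347 vs $0.0027 (12.9x)
```

Real per-model prices differ, so plug in the published rates for the models you compare.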

Using Reasoning Models

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

# OpenAI o1 — reasoning model
response = client.chat.completions.create(
    model="openai/o1",
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers.",
        }
    ],
    # o-series models take max_completion_tokens, not max_tokens;
    # the limit covers reasoning tokens and visible output together
    max_completion_tokens=8000,
)

print(response.choices[0].message.content)
print(f"Reasoning tokens used: {response.usage.completion_tokens_details.reasoning_tokens}")

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: "sk-your-key-here",
});

const response = await client.chat.completions.create({
  model: "openai/o1",
  messages: [
    {
      role: "user",
      content: "Prove that there are infinitely many prime numbers.",
    },
  ],
  max_completion_tokens: 8000,
});

console.log(response.choices[0].message.content);
console.log(
  "Reasoning tokens:",
  response.usage?.completion_tokens_details?.reasoning_tokens,
);

cURL

curl https://api.aicredits.in/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1",
    "messages": [
      {
        "role": "user",
        "content": "Prove that there are infinitely many prime numbers."
      }
    ],
    "max_completion_tokens": 8000
  }'
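Because max_completion_tokens caps reasoning and visible output together, a hard problem can spend most or all of the budget on reasoning and return a truncated or empty answer; in the OpenAI API a response that hit the limit reports finish_reason "length". The sketch below checks for that on a response parsed as a dict (the raw JSON from the cURL call, or response.model_dump() in the Python SDK). The function name completion_warnings and the 90% threshold are arbitrary choices for illustration.

```python
def completion_warnings(response: dict, max_completion_tokens: int) -> list:
    """Flag responses where reasoning may have consumed the token budget."""
    choice = response["choices"][0]
    content = (choice.get("message") or {}).get("content") or ""
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0

    warnings = []
    if choice.get("finish_reason") == "length":
        warnings.append("stopped at max_completion_tokens; the answer may be cut off")
    if not content.strip():
        warnings.append("no visible answer; reasoning may have used the whole budget")
    # 90% is an arbitrary illustrative threshold.
    if reasoning > 0.9 * max_completion_tokens:
        warnings.append(f"reasoning used {reasoning} of {max_completion_tokens} tokens")
    return warnings


truncated = {
    "choices": [{"finish_reason": "length", "message": {"role": "assistant", "content": ""}}],
    "usage": {"completion_tokens": 8000, "completion_tokens_details": {"reasoning_tokens": 8000}},
}
healthy = {
    "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": "Proof: ..."}}],
    "usage": {"completion_tokens": 2340, "completion_tokens_details": {"reasoning_tokens": 2048}},
}

print(completion_warnings(truncated, max_completion_tokens=8000))
print(completion_warnings(healthy, max_completion_tokens=8000))  # []
```

If warnings appear, raising max_completion_tokens (or lowering reasoning effort where supported) usually leaves more room for the visible answer.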

Controlling Reasoning Effort

For o3 models, you can set the reasoning_effort parameter to balance cost vs. accuracy:

| Value | Typical reasoning tokens | Best for |
|---|---|---|
| low | ~1,000 | Simple tasks, cost-sensitive applications |
| medium | ~5,000 | Balanced (default) |
| high | ~20,000+ | Hard problems where accuracy matters most |

response = client.chat.completions.create(
    model="openai/o3",
    messages=[{"role": "user", "content": "Solve this differential equation..."}],
    extra_body={"reasoning_effort": "high"},
)
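The approximate token counts in the reasoning_effort table can be turned into a rough per-request budget rule. This is a sketch under stated assumptions: the counts are the table's approximations (with "~20,000+" treated as 20,000), the $8-per-million output price is hypothetical, and pick_effort is an illustrative helper, not an AICredits feature.

```python
# Approximate reasoning-token counts from the reasoning_effort table.
# Actual usage varies a lot from prompt to prompt; these are rough guides only.
TYPICAL_REASONING_TOKENS = {"low": 1_000, "medium": 5_000, "high": 20_000}


def rough_reasoning_cost(effort: str, output_price_per_m: float) -> float:
    """Rough reasoning cost in USD for one request at the given effort level."""
    return TYPICAL_REASONING_TOKENS[effort] * output_price_per_m / 1_000_000


def pick_effort(budget_usd: float, output_price_per_m: float) -> str:
    """Highest effort whose typical reasoning cost fits the per-request budget."""
    for effort in ("high", "medium", "low"):
        if rough_reasoning_cost(effort, output_price_per_m) <= budget_usd:
            return effort
    # Nothing fits: fall back to the cheapest setting.
    return "low"


# Hypothetical output price of $8 per million tokens, budget of $0.05 per request.
print(pick_effort(0.05, 8.0))  # medium
```

The result can be passed as extra_body={"reasoning_effort": pick_effort(budget, price)} in the request above. Since the token counts are only typical values, treat the budget as a soft target; max_completion_tokens remains the hard cap.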

DeepSeek R1

Unlike the OpenAI o-series, DeepSeek R1 returns its chain of thought. The reasoning appears in choices[0].message.reasoning_content, separate from the final answer in choices[0].message.content:

response = client.chat.completions.create(
    model="deepseek/deepseek-reasoner",
    messages=[{"role": "user", "content": "What is 47 × 83?"}],
)

# Reasoning content (the thinking process)
reasoning = response.choices[0].message.reasoning_content
# Final answer
answer = response.choices[0].message.content
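For multi-turn conversations with deepseek/deepseek-reasoner, DeepSeek's own API documentation advises sending back only the final answer, not reasoning_content, in later input messages; it is an assumption here that AICredits forwards messages unchanged. The sketch below works on a message parsed as a dict; split_reasoning and next_turn are hypothetical helper names, and the sample message is made up for illustration, not real model output.

```python
from typing import List, Optional, Tuple


def split_reasoning(message: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a DeepSeek R1 message parsed as a dict."""
    # reasoning_content may be absent, e.g. if it is not passed through.
    return message.get("reasoning_content"), message.get("content") or ""


def next_turn(history: List[dict], message: dict, follow_up: str) -> List[dict]:
    """Append the assistant's answer and a new user turn, dropping reasoning_content."""
    return history + [
        {"role": "assistant", "content": message.get("content") or ""},
        {"role": "user", "content": follow_up},
    ]


history = [{"role": "user", "content": "What is 47 × 83?"}]
# Illustrative message shape only.
message = {
    "role": "assistant",
    "reasoning_content": "47 × 80 = 3760 and 47 × 3 = 141, so 3760 + 141 = 3901.",
    "content": "47 × 83 = 3901",
}

reasoning, answer = split_reasoning(message)
messages = next_turn(history, message, "Now divide that by 47.")
```

The reasoning can still be logged or shown to users; it just stays out of the next request.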

Supported Reasoning Models

| Model | Provider | Notes |
|---|---|---|
| openai/o1 | OpenAI | Use max_completion_tokens (not max_tokens) |
| openai/o1-mini | OpenAI | Faster, lower cost than o1 |
| openai/o3 | OpenAI | Supports reasoning_effort parameter |
| openai/o3-mini | OpenAI | Cost-efficient reasoning |
| deepseek/deepseek-reasoner | DeepSeek | Returns reasoning_content field |
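When switching between these models, the token-limit parameter name and reasoning_effort support differ. The table lists max_completion_tokens against openai/o1 only, but OpenAI's other o-series models (o1-mini, o3, o3-mini) also take it instead of max_tokens. The sketch below builds request arguments accordingly; build_request and the model sets are hypothetical helpers, with reasoning_effort allowed only where the table lists it.

```python
from typing import Optional

# Model sets based on the supported-models table; extend them as needed.
MAX_COMPLETION_TOKENS_MODELS = {"openai/o1", "openai/o1-mini", "openai/o3", "openai/o3-mini"}
REASONING_EFFORT_MODELS = {"openai/o3"}


def build_request(model: str, prompt: str, limit: int,
                  effort: Optional[str] = None) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    kwargs = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    # OpenAI o-series models take max_completion_tokens; the others here use max_tokens.
    if model in MAX_COMPLETION_TOKENS_MODELS:
        kwargs["max_completion_tokens"] = limit
    else:
        kwargs["max_tokens"] = limit
    if effort is not None:
        if model not in REASONING_EFFORT_MODELS:
            raise ValueError(f"{model} is not listed as supporting reasoning_effort")
        # Passed via extra_body, matching the reasoning_effort example.
        kwargs["extra_body"] = {"reasoning_effort": effort}
    return kwargs


kwargs = build_request("openai/o3", "Solve this differential equation...", 8000, effort="high")
# response = client.chat.completions.create(**kwargs)  # with a client configured as shown earlier
```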
