Reasoning Tokens

Use o1, o3, and DeepSeek R1 reasoning models through AICredits. Understand how thinking tokens are counted and billed.

Reasoning models (OpenAI o1/o3, DeepSeek R1, Claude with extended thinking) generate internal "thinking" tokens before producing their final answer. These hidden reasoning steps improve accuracy on complex tasks — maths, coding, logical deduction — but they add to your token bill.

How Reasoning Tokens Work

When you send a request to a reasoning model:

1. The model generates a hidden chain of thought (reasoning tokens)
2. The model uses that reasoning to produce the visible answer (output tokens)
3. Both reasoning tokens and output tokens are billed as completion tokens

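With the OpenAI o-series models, the response contains only the final answer; the intermediate reasoning is not returned. DeepSeek R1 is the exception: it returns its reasoning in a separate reasoning_content field (see the DeepSeek R1 section).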

Billing

Reasoning tokens are billed at the same rate as output (completion) tokens. The usage field in the response breaks them out:

Usage with reasoning tokens

{
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 2340,
    "total_tokens": 2490,
    "completion_tokens_details": {
      "reasoning_tokens": 2048,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}

In this example, 2,048 of the 2,340 completion tokens are reasoning tokens. All 2,340 are billed at the output token rate.
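
The same breakdown can be computed from the usage object. The following is a minimal sketch, assuming a usage payload shaped like the example above; `summarize_usage` is a hypothetical helper written for illustration, not part of any SDK.

```python
def summarize_usage(usage: dict) -> dict:
    """Split completion tokens into reasoning and visible output tokens.

    Assumes `usage` has the shape shown in the example above.
    """
    completion = usage["completion_tokens"]
    # The details object may be missing for non-reasoning models.
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return {
        "prompt_tokens": usage["prompt_tokens"],
        "reasoning_tokens": reasoning,
        "visible_output_tokens": completion - reasoning,
        # Reasoning and visible tokens are both billed at the output rate.
        "billed_output_tokens": completion,
    }


usage = {
    "prompt_tokens": 150,
    "completion_tokens": 2340,
    "total_tokens": 2490,
    "completion_tokens_details": {"reasoning_tokens": 2048},
}
summary = summarize_usage(usage)
print(summary["visible_output_tokens"])  # 292
```

Here only 292 tokens belong to the visible answer, yet all 2,340 completion tokens are billed.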

Reasoning models can generate thousands of reasoning tokens per request. A response on o1 that uses 4,000 reasoning tokens can cost 10–20× as much as a comparable GPT-4o response. Check the reasoning_tokens count in each response's usage field when using reasoning models.
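
One way to keep an eye on this is to accumulate token counts across requests and flag when reasoning tokens pass a limit you set yourself. The sketch below assumes usage dicts shaped like the example in the Billing section; `ReasoningBudget` and its limit are hypothetical names chosen for illustration, not an AICredits feature.

```python
from dataclasses import dataclass


@dataclass
class ReasoningBudget:
    """Running token totals with a soft cap on reasoning tokens."""

    max_reasoning_tokens: int  # hypothetical application-level limit
    reasoning_tokens: int = 0
    completion_tokens: int = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage to the running totals."""
        details = usage.get("completion_tokens_details") or {}
        self.reasoning_tokens += details.get("reasoning_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def exceeded(self) -> bool:
        return self.reasoning_tokens > self.max_reasoning_tokens

    @property
    def reasoning_share(self) -> float:
        """Fraction of billed completion tokens spent on reasoning."""
        if self.completion_tokens == 0:
            return 0.0
        return self.reasoning_tokens / self.completion_tokens


budget = ReasoningBudget(max_reasoning_tokens=5000)
for usage in (
    {"completion_tokens": 2340, "completion_tokens_details": {"reasoning_tokens": 2048}},
    {"completion_tokens": 4300, "completion_tokens_details": {"reasoning_tokens": 4000}},
):
    budget.record(usage)

print(budget.exceeded)  # True: 6,048 reasoning tokens exceeds the 5,000 limit
print(f"{budget.reasoning_share:.0%} of completion tokens were reasoning")
```

With the Python SDK, `response.usage.model_dump()` should produce a dict of this shape, assuming the SDK's response objects are Pydantic models as in current releases.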

Using Reasoning Models

The examples below send the same request to openai/o1 with the Python SDK, the JavaScript SDK, and curl.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

# OpenAI o1 — reasoning model
response = client.chat.completions.create(
    model="openai/o1",
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers.",
        }
    ],
    # o1 uses max_completion_tokens, not max_tokens
    max_completion_tokens=8000,
)

print(response.choices[0].message.content)
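# usage and its details can be absent, so guard before reading them
usage = response.usage
details = usage.completion_tokens_details if usage else None
print(f"Reasoning tokens used: {details.reasoning_tokens if details else None}")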

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: "sk-your-key-here",
});

const response = await client.chat.completions.create({
  model: "openai/o1",
  messages: [
    {
      role: "user",
      content: "Prove that there are infinitely many prime numbers.",
    },
  ],
  max_completion_tokens: 8000,
});

console.log(response.choices[0].message.content);
console.log(
  "Reasoning tokens:",
  response.usage?.completion_tokens_details?.reasoning_tokens,
);

curl https://api.aicredits.in/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1",
    "messages": [
      {
        "role": "user",
        "content": "Prove that there are infinitely many prime numbers."
      }
    ],
    "max_completion_tokens": 8000
  }'

Controlling Reasoning Effort

For o3 models, you can set the reasoning_effort parameter to balance cost vs. accuracy:

Value	Reasoning tokens	Best for
`low`	~1,000	Simple tasks, cost-sensitive applications
`medium`	~5,000	Balanced (default)
`high`	~20,000+	Hard problems where accuracy matters most

response = client.chat.completions.create(
    model="openai/o3",
    messages=[{"role": "user", "content": "Solve this differential equation..."}],
    # extra_body adds the field to the JSON request body as-is
    extra_body={"reasoning_effort": "high"},
)
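
The table above can be turned into a simple selection rule. This is a rough sketch that treats the approximate token counts in the table as planning figures rather than guarantees; `pick_reasoning_effort` is a hypothetical helper, not an API feature.

```python
# Approximate reasoning-token counts per effort level, taken from the table above.
APPROX_REASONING_TOKENS = {"low": 1_000, "medium": 5_000, "high": 20_000}


def pick_reasoning_effort(reasoning_token_budget: int) -> str:
    """Return the highest effort level whose approximate usage fits the budget.

    Falls back to "low" when even the lowest level exceeds the budget.
    """
    chosen = "low"
    # The dict is ordered from lowest to highest effort.
    for effort, approx_tokens in APPROX_REASONING_TOKENS.items():
        if approx_tokens <= reasoning_token_budget:
            chosen = effort
    return chosen


print(pick_reasoning_effort(6_000))   # medium
print(pick_reasoning_effort(500))     # low
print(pick_reasoning_effort(50_000))  # high
```

The result can then be passed as `extra_body={"reasoning_effort": pick_reasoning_effort(6_000)}` in the request shown above.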

DeepSeek R1

DeepSeek R1 also uses chain-of-thought reasoning. Unlike the OpenAI models above, it returns the reasoning in the response, in the message's reasoning_content field alongside content:

response = client.chat.completions.create(
    model="deepseek/deepseek-reasoner",
    messages=[{"role": "user", "content": "What is 47 × 83?"}],
)

# Reasoning content (the thinking process)
reasoning = response.choices[0].message.reasoning_content
# Final answer
answer = response.choices[0].message.content
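
Because other models do not return this field, code that handles several models may want to read it defensively. The following is a minimal sketch; `split_reasoning` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for `response.choices[0].message`, used so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def split_reasoning(message) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a chat completion message.

    `reasoning_content` is read with getattr because models that do not
    return their reasoning will not have the field.
    """
    reasoning = getattr(message, "reasoning_content", None)
    answer = getattr(message, "content", None)
    return reasoning, answer


# Stand-in messages for illustration only; the text is not real model output.
r1_message = SimpleNamespace(
    reasoning_content="47 × 83 = 47 × 80 + 47 × 3 = 3760 + 141",
    content="3901",
)
other_message = SimpleNamespace(content="3901")

reasoning, answer = split_reasoning(r1_message)
print(reasoning)
print(answer)
print(split_reasoning(other_message))  # (None, '3901')
```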

Supported Reasoning Models

Model	Provider	Notes
`openai/o1`	OpenAI	Use `max_completion_tokens` (not `max_tokens`)
`openai/o1-mini`	OpenAI	Faster, lower cost than o1
`openai/o3`	OpenAI	Supports `reasoning_effort` parameter
`openai/o3-mini`	OpenAI	Cost-efficient reasoning
`deepseek/deepseek-reasoner`	DeepSeek	Returns `reasoning_content` field