Reasoning Tokens
Use o1, o3, and DeepSeek R1 reasoning models through AICredits. Understand how thinking tokens are counted and billed.
Reasoning models (OpenAI o1/o3, DeepSeek R1, Claude with extended thinking) generate internal "thinking" tokens before producing their final answer. These hidden reasoning steps improve accuracy on complex tasks — maths, coding, logical deduction — but they add to your token bill.
How Reasoning Tokens Work
When you send a request to a reasoning model:
- The model generates a hidden chain of thought (reasoning tokens)
- The model uses that reasoning to produce the visible answer (output tokens)
- Both reasoning tokens and output tokens are billed as completion tokens
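For the OpenAI models, the response you receive contains only the final answer; the intermediate reasoning is not returned. DeepSeek R1 differs: it also returns its reasoning in a reasoning_content field (see DeepSeek R1 below).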
Billing
Reasoning tokens are billed at the same rate as output (completion) tokens. The usage field in the response breaks them out:
{
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 2340,
    "total_tokens": 2490,
    "completion_tokens_details": {
      "reasoning_tokens": 2048,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}
In this example, 2,048 of the 2,340 completion tokens are reasoning tokens. All 2,340 are billed at the output token rate.
Reasoning models can generate thousands of reasoning tokens per request. For a 4,000 reasoning-token response on o1, this can be 10–20× the cost of a comparable GPT-4o response. Monitor your usage closely when using reasoning models.
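To make the arithmetic concrete, here is a minimal sketch that splits a usage payload into reasoning and visible tokens and estimates the charge. The helper name `summarize_usage` is hypothetical, and the per-million-token prices passed in are placeholders rather than AICredits rates; substitute the real prices for the model you use.

```python
def summarize_usage(usage: dict, input_price_per_m: float, output_price_per_m: float) -> dict:
    """Split completion tokens into reasoning vs. visible and estimate cost.

    Prices are per one million tokens and are supplied by the caller.
    The values used in the example below are placeholders, not real rates.
    """
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)

    # Reasoning tokens are part of completion_tokens, so the visible answer
    # is whatever remains after subtracting them.
    visible = completion - reasoning

    # All completion tokens (reasoning + visible) are billed at the output rate.
    cost = (
        usage["prompt_tokens"] * input_price_per_m
        + completion * output_price_per_m
    ) / 1_000_000

    return {
        "reasoning_tokens": reasoning,
        "visible_tokens": visible,
        "reasoning_share": reasoning / completion if completion else 0.0,
        "estimated_cost": cost,
    }


# The usage payload from the example response above.
usage = {
    "prompt_tokens": 150,
    "completion_tokens": 2340,
    "total_tokens": 2490,
    "completion_tokens_details": {"reasoning_tokens": 2048},
}

# Placeholder prices (assumption): 15.0 per 1M input tokens, 60.0 per 1M output tokens.
summary = summarize_usage(usage, input_price_per_m=15.0, output_price_per_m=60.0)
print(summary)
```

With these inputs only 292 of the 2,340 completion tokens are the visible answer, which is why the output rate dominates the bill for reasoning models.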
Using Reasoning Models
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

# OpenAI o1: reasoning model
response = client.chat.completions.create(
    model="openai/o1",
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers.",
        }
    ],
    # o1 uses max_completion_tokens, not max_tokens
    max_completion_tokens=8000,
)

print(response.choices[0].message.content)
print(f"Reasoning tokens used: {response.usage.completion_tokens_details.reasoning_tokens}")

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aicredits.in/v1",
  apiKey: "sk-your-key-here",
});

const response = await client.chat.completions.create({
  model: "openai/o1",
  messages: [
    {
      role: "user",
      content: "Prove that there are infinitely many prime numbers.",
    },
  ],
  max_completion_tokens: 8000,
});

console.log(response.choices[0].message.content);
console.log(
  "Reasoning tokens:",
  response.usage?.completion_tokens_details?.reasoning_tokens,
);

curl https://api.aicredits.in/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1",
    "messages": [
      {
        "role": "user",
        "content": "Prove that there are infinitely many prime numbers."
      }
    ],
    "max_completion_tokens": 8000
  }'
Controlling Reasoning Effort
For o3 models, you can set the reasoning_effort parameter to balance cost vs. accuracy:
| Value | Reasoning tokens | Best for |
|---|---|---|
| low | ~1,000 | Simple tasks, cost-sensitive applications |
| medium | ~5,000 | Balanced (default) |
| high | ~20,000+ | Hard problems where accuracy matters most |
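One way to apply the table is to pick the effort level from a reasoning-token budget. The sketch below assumes the approximate figures in the table; `APPROX_REASONING_TOKENS`, `pick_reasoning_effort`, and `build_request` are hypothetical names, and real token usage varies from request to request.

```python
# Approximate reasoning-token usage per effort level, copied from the table above.
# These are rough guides, not guarantees.
APPROX_REASONING_TOKENS = {"low": 1_000, "medium": 5_000, "high": 20_000}


def pick_reasoning_effort(reasoning_budget: int) -> str:
    """Return the highest effort whose approximate usage fits the budget.

    Falls back to "low" when even the lowest estimate exceeds the budget.
    """
    chosen = "low"
    for effort in ("low", "medium", "high"):
        if APPROX_REASONING_TOKENS[effort] <= reasoning_budget:
            chosen = effort
    return chosen


def build_request(prompt: str, reasoning_budget: int) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "openai/o3",
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"reasoning_effort": pick_reasoning_effort(reasoning_budget)},
    }


request = build_request("Solve this differential equation...", reasoning_budget=6_000)
print(request["extra_body"])
```

The resulting dictionary can be passed as `client.chat.completions.create(**request)`, using the same `extra_body` mechanism as the example that follows.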
response = client.chat.completions.create(
    model="openai/o3",
    messages=[{"role": "user", "content": "Solve this differential equation..."}],
    extra_body={"reasoning_effort": "high"},
)
DeepSeek R1
DeepSeek R1 also uses chain-of-thought reasoning. The reasoning content is returned in the response under reasoning_content:
response = client.chat.completions.create(
    model="deepseek/deepseek-reasoner",
    messages=[{"role": "user", "content": "What is 47 × 83?"}],
)

# Reasoning content (the thinking process)
reasoning = response.choices[0].message.reasoning_content

# Final answer
answer = response.choices[0].message.content
Supported Reasoning Models
| Model | Provider | Notes |
|---|---|---|
| openai/o1 | OpenAI | Use max_completion_tokens (not max_tokens) |
| openai/o1-mini | OpenAI | Faster, lower cost than o1 |
| openai/o3 | OpenAI | Supports reasoning_effort parameter |
| openai/o3-mini | OpenAI | Cost-efficient reasoning |
| deepseek/deepseek-reasoner | DeepSeek | Returns reasoning_content field |
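Since the table lists only deepseek/deepseek-reasoner as returning a reasoning_content field, code that switches between models may want to read that field defensively. A minimal sketch, assuming the response shape used in the examples above; `extract_answer` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real API responses, not actual model output.

```python
from types import SimpleNamespace


def extract_answer(response):
    """Return (reasoning, answer); reasoning is None when the model omits it."""
    message = response.choices[0].message
    # getattr with a default avoids an AttributeError on models that
    # do not include reasoning_content in the message.
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, message.content


# Stand-in responses for illustration only (assumed shapes, not real API output).
r1_like = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                content="3901",
                reasoning_content="47 * 83 = 47 * 80 + 47 * 3 = 3760 + 141",
            )
        )
    ]
)
o1_like = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Final answer only"))]
)

for label, resp in (("R1-style", r1_like), ("o1-style", o1_like)):
    reasoning, answer = extract_answer(resp)
    print(label, "| reasoning returned:", reasoning is not None, "| answer:", answer)
```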