Unified Reasoning Configuration

A new unified reasoning object gives you precise control over reasoning models. Specify an exact token budget with reasoning.max_tokens or pick an effort level with reasoning.effort, all in one consistent API.

January 30, 2026

We've added a new reasoning configuration object that gives you flexible control over reasoning-capable models. You can now specify reasoning behavior through a single reasoning field, using either an effort level or an exact token budget.

Option 1: Reasoning Effort

Use reasoning.effort to control reasoning intensity:

curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
    "reasoning": {
      "effort": "high"
    }
  }'

Supported effort levels: none, minimal, low, medium, high, xhigh
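
If you build requests from user input or environment variables, it can help to check the effort value before sending it. The following is a minimal client-side sketch; is_valid_effort is a hypothetical helper name, not part of the gateway, and the list of levels is copied from the line above.

```shell
# Hypothetical helper (not part of the gateway): succeeds (exit 0) if the
# argument is one of the effort levels listed in this post, fails otherwise.
is_valid_effort() {
  case "$1" in
    none|minimal|low|medium|high|xhigh) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: only continue when the level is recognized.
if is_valid_effort "xhigh"; then
  echo "effort level accepted"
fi
```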

Option 2: Exact Token Budget

Use reasoning.max_tokens for precise control over reasoning token allocation:

curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
    "reasoning": {
      "max_tokens": 8000
    }
  }'

When max_tokens is specified, it takes precedence over effort.
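
To keep requests unambiguous, a client can apply the same precedence rule before sending and include only the field that would take effect. This is a rough sketch under that assumption; reasoning_fragment is a hypothetical helper, and how the gateway resolves the two fields internally is not shown here.

```shell
# Hypothetical helper: prints the JSON "reasoning" object to send.
#   $1 = effort level (may be empty)
#   $2 = max_tokens   (may be empty)
reasoning_fragment() {
  if [ -n "$2" ]; then
    # A token budget is set, so it takes precedence; effort is dropped.
    printf '{"max_tokens": %s}\n' "$2"
  elif [ -n "$1" ]; then
    printf '{"effort": "%s"}\n' "$1"
  else
    # Neither is set: send an empty object.
    printf '{}\n'
  fi
}

reasoning_fragment "high" "8000"   # prints {"max_tokens": 8000}
reasoning_fragment "high" ""       # prints {"effort": "high"}
```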

Supported Models

The reasoning.max_tokens parameter works with:

Anthropic Claude — Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5
Google Gemini — Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Pro Preview

Auto-Routing Support

When using auto-routing (e.g., claude-sonnet-4 without a provider prefix) or root models with reasoning.max_tokens, the gateway automatically routes only to providers that support explicit reasoning token budgets.
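
As an illustration, the request body for an auto-routed call differs from the earlier examples only in the model name. The sketch below uses a hypothetical build_request helper to print such a body; the model name claude-sonnet-4 and the 8000-token budget are taken from the examples in this post.

```shell
# Hypothetical helper: prints a chat-completions request body.
#   $1 = model name, $2 = reasoning token budget
build_request() {
  cat <<EOF
{
  "model": "$1",
  "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
  "reasoning": {"max_tokens": $2}
}
EOF
}

# No provider prefix, so provider selection is left to the gateway.
build_request "claude-sonnet-4" 8000

# To send it, pipe the body into curl (-d @- reads the data from stdin):
# build_request "claude-sonnet-4" 8000 | curl -X POST https://api.llmgateway.io/v1/chat/completions \
#   -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d @-
```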

Provider Constraints

Anthropic: Budget must be between 1,024 and 128,000 tokens (values outside this range are automatically clamped)
Google: No specific constraints
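
The gateway applies the Anthropic clamp for you, but mirroring it on the client makes it easy to log the budget that will actually be used. A minimal sketch, assuming the bounds stated above are inclusive; clamp_anthropic_budget is a hypothetical helper name.

```shell
# Hypothetical helper mirroring the Anthropic bounds stated in this post.
#   $1 = requested reasoning budget in tokens (integer)
clamp_anthropic_budget() {
  if [ "$1" -lt 1024 ]; then
    echo 1024        # below the minimum: raise to 1,024
  elif [ "$1" -gt 128000 ]; then
    echo 128000      # above the maximum: lower to 128,000
  else
    echo "$1"        # already within range
  fi
}

clamp_anthropic_budget 500      # prints 1024
clamp_anthropic_budget 8000     # prints 8000
clamp_anthropic_budget 200000   # prints 128000
```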

Read the docs for more details.