Models Reference
Complete reference for all AI models available through Cencori, including capabilities, pricing, and context windows.
OpenAI Models
gpt-4o
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 128,000 tokens | 4,096 tokens | $5.00 / 1M tokens | $15.00 / 1M tokens |
OpenAI's flagship model with best-in-class performance across reasoning, coding, and creative tasks. Optimized for speed while maintaining quality.
gpt-4-turbo
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 128,000 tokens | 4,096 tokens | $10.00 / 1M tokens | $30.00 / 1M tokens |
Previous generation GPT-4 with large context window. Good for document analysis and long conversations.
gpt-3.5-turbo
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 16,385 tokens | 4,096 tokens | $0.50 / 1M tokens | $1.50 / 1M tokens |
Fast and cost-effective for simple tasks, chat applications, and high-volume use cases.
Anthropic Models
claude-3-opus
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 200,000 tokens | 4,096 tokens | $15.00 / 1M tokens | $75.00 / 1M tokens |
Anthropic's most capable model with exceptional reasoning and analysis. Best for complex tasks requiring nuanced understanding.
claude-3-sonnet
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 200,000 tokens | 4,096 tokens | $3.00 / 1M tokens | $15.00 / 1M tokens |
Balanced model offering good performance at moderate cost. Ideal for most production use cases.
claude-3-haiku
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 200,000 tokens | 4,096 tokens | $0.25 / 1M tokens | $1.25 / 1M tokens |
Fastest Claude model with competitive pricing. Great for high-volume tasks and real-time applications.
Google Gemini Models
gemini-2.5-flash
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 1,000,000 tokens | 8,192 tokens | $0.15 / 1M tokens | $0.60 / 1M tokens |
Latest Gemini model with a massive 1M-token context window. Extremely fast and cost-effective for most use cases.
gemini-2.0-flash
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 1,000,000 tokens | 8,192 tokens | $0.10 / 1M tokens | $0.40 / 1M tokens |
Previous-generation model with a 1M-token context. The most cost-effective option for large document processing.
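Per-request cost follows directly from the per-million-token rates above. A minimal, self-contained sketch (the price table below copies the rates listed on this page; always verify against current pricing before relying on it):

```python
# Per-1M-token prices in USD, copied from the model listings above.
# Prices change; treat these as illustrative, not authoritative.
PRICES = {
    "gpt-4o":           {"input": 5.00, "output": 15.00},
    "gpt-3.5-turbo":    {"input": 0.50, "output": 1.50},
    "claude-3-haiku":   {"input": 0.25, "output": 1.25},
    "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 10,000 input + 1,000 output tokens on gpt-4o:
# 10_000 * 5.00 / 1e6 + 1_000 * 15.00 / 1e6 = 0.05 + 0.015 = 0.065 USD
cost = estimate_cost("gpt-4o", 10_000, 1_000)
```

Running the same token counts through `claude-3-haiku` or `gemini-2.5-flash` makes the price spread between tiers concrete, which is often the deciding factor for high-volume workloads.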
Model Selection Guide
| Use Case | Recommended Model | Why |
|---|---|---|
| Chat applications | gemini-2.5-flash | Fast, cheap, good quality |
| Code generation | gpt-4o | Best coding capabilities |
| Document analysis | claude-3-opus | 200K context, strong reasoning |
| Content generation | claude-3-sonnet | Balanced quality and cost |
| High-volume APIs | gpt-3.5-turbo | Fast, proven, low cost |
| Complex reasoning | claude-3-opus | Highest quality output |
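In practice, choosing a model is just a string in the request body, so switching between the recommendations above is a one-line change. A minimal sketch, assuming Cencori follows the common OpenAI-style chat-completions payload (the field names here are that convention, not a confirmed Cencori schema):

```python
def build_chat_request(model: str, user_message: str) -> dict:
    """Build a chat request body for the given model.

    Payload shape follows the widely used OpenAI chat-completions
    convention; confirm field names against the Cencori API reference.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Swap the model string to move between tiers without touching other code:
req = build_chat_request("gemini-2.5-flash", "Summarize this ticket.")
```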
Streaming Support
All models support real-time streaming via Server-Sent Events (SSE). This lets you display responses as they are generated, providing a better user experience for chat interfaces.
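The SSE wire format is simple to consume: each event is a `data:` line followed by a blank line. A self-contained parsing sketch (the JSON payload shape and the `[DONE]` sentinel follow the common OpenAI streaming convention and are an assumption here, not a confirmed Cencori format):

```python
import json

def parse_sse_stream(raw: str) -> list:
    """Extract JSON event payloads from a block of Server-Sent Events text.

    SSE frames each event as a `data:` line; a literal `[DONE]` sentinel
    (OpenAI convention, assumed here) marks the end of the stream.
    """
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break
            events.append(json.loads(payload))
    return events

# Hypothetical stream fragment for illustration:
sample = 'data: {"delta": "Hel"}\n\ndata: {"delta": "lo"}\n\ndata: [DONE]\n'
chunks = [e["delta"] for e in parse_sse_stream(sample)]
# Concatenating the deltas as they arrive yields the full response text.
```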
Learn more in the Streaming documentation.
Custom Providers
You can add custom AI providers that are compatible with the OpenAI or Anthropic APIs. Examples include:
- Self-hosted models (Llama, Mistral, etc.)
- Other cloud providers (Together.ai, Groq, etc.)
- Internal company endpoints
Learn more in the Custom Providers guide.
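Because custom providers speak the same OpenAI- or Anthropic-compatible protocol, pointing at one is typically just a base-URL change. A hedged sketch (the base URLs below are placeholders for illustration, and the `/v1/chat/completions` path is the OpenAI-compatible convention, not a guaranteed Cencori requirement):

```python
def chat_completions_url(base_url: str) -> str:
    """Resolve the OpenAI-compatible chat endpoint for a provider base URL.

    Works for self-hosted servers, third-party clouds, or internal
    endpoints, as long as they expose the OpenAI-compatible path.
    """
    return base_url.rstrip("/") + "/v1/chat/completions"

# Placeholder endpoints, not real services:
url = chat_completions_url("https://my-llama-server.internal")
```

The same request bodies you send to hosted models can then be posted to this URL; only authentication and the base URL differ per provider.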

