Models Reference
Complete reference for all AI models available through Cencori, including capabilities, pricing, and context windows.
OpenAI Models
gpt-4o
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 128,000 tokens | 4,096 tokens | $5.00 / 1M tokens | $15.00 / 1M tokens |
OpenAI's flagship model with best-in-class performance across reasoning, coding, and creative tasks. Optimized for speed while maintaining quality.
gpt-4-turbo
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 128,000 tokens | 4,096 tokens | $10.00 / 1M tokens | $30.00 / 1M tokens |
Previous generation GPT-4 with large context window. Good for document analysis and long conversations.
gpt-3.5-turbo
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 16,385 tokens | 4,096 tokens | $0.50 / 1M tokens | $1.50 / 1M tokens |
Fast and cost-effective for simple tasks, chat applications, and high-volume use cases.
Anthropic Models
claude-3-opus
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 200,000 tokens | 4,096 tokens | $15.00 / 1M tokens | $75.00 / 1M tokens |
Anthropic's most capable model with exceptional reasoning and analysis. Best for complex tasks requiring nuanced understanding.
claude-3-sonnet
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 200,000 tokens | 4,096 tokens | $3.00 / 1M tokens | $15.00 / 1M tokens |
Balanced model offering good performance at moderate cost. Ideal for most production use cases.
claude-3-haiku
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 200,000 tokens | 4,096 tokens | $0.25 / 1M tokens | $1.25 / 1M tokens |
Fastest Claude model with competitive pricing. Great for high-volume tasks and real-time applications.
Google Gemini Models
gemini-2.5-flash
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 1,000,000 tokens | 8,192 tokens | $0.15 / 1M tokens | $0.60 / 1M tokens |
Latest Gemini model with a massive 1M-token context window. Extremely fast and cost-effective for most use cases.
gemini-2.0-flash
| Context Window | Max Output | Input Cost | Output Cost |
|---|---|---|---|
| 1,000,000 tokens | 8,192 tokens | $0.10 / 1M tokens | $0.40 / 1M tokens |
Previous-generation model with a 1M-token context. The most cost-effective option for large document processing.
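Per-request cost follows directly from the per-million-token rates above. A minimal, self-contained sketch (the price table below copies the rates listed on this page; always verify against current pricing before relying on it):

```python
# Per-1M-token prices in USD, copied from the model listings above.
# Prices change; treat these as illustrative, not authoritative.
PRICES = {
    "gpt-4o":           {"input": 5.00, "output": 15.00},
    "gpt-3.5-turbo":    {"input": 0.50, "output": 1.50},
    "claude-3-haiku":   {"input": 0.25, "output": 1.25},
    "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 10,000 input + 1,000 output tokens on gpt-4o:
# 10_000 * 5.00 / 1e6 + 1_000 * 15.00 / 1e6 = 0.05 + 0.015 = 0.065 USD
cost = estimate_cost("gpt-4o", 10_000, 1_000)
```

Running the same token counts through `claude-3-haiku` or `gemini-2.5-flash` makes the price spread between tiers concrete, which is often the deciding factor for high-volume workloads.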
Model Selection Guide
| Use Case | Recommended Model | Why |
|---|---|---|
| Chat applications | gemini-2.5-flash | Fast, cheap, good quality |
| Code generation | gpt-4o | Best coding capabilities |
| Document analysis | claude-3-opus | 200K context, strong reasoning |
| Content generation | claude-3-sonnet | Balanced quality and cost |
| High-volume APIs | gpt-3.5-turbo | Fast, proven, low cost |
| Complex reasoning | claude-3-opus | Highest quality output |
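In practice, choosing a model is just a string in the request body, so switching between the recommendations above is a one-line change. A minimal sketch, assuming Cencori follows the common OpenAI-style chat-completions payload (the field names here are that convention, not a confirmed Cencori schema):

```python
def build_chat_request(model: str, user_message: str) -> dict:
    """Build a chat request body for the given model.

    Payload shape follows the widely used OpenAI chat-completions
    convention; confirm field names against the Cencori API reference.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Swap the model string to move between tiers without touching other code:
req = build_chat_request("gemini-2.5-flash", "Summarize this ticket.")
```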
Streaming Support
All models support real-time streaming via Server-Sent Events (SSE). This lets you display responses as they are generated, providing a better user experience for chat interfaces.
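The SSE wire format is simple to consume: each event is a `data:` line followed by a blank line. A self-contained parsing sketch (the JSON payload shape and the `[DONE]` sentinel follow the common OpenAI streaming convention and are an assumption here, not a confirmed Cencori format):

```python
import json

def parse_sse_stream(raw: str) -> list:
    """Extract JSON event payloads from a block of Server-Sent Events text.

    SSE frames each event as a `data:` line; a literal `[DONE]` sentinel
    (OpenAI convention, assumed here) marks the end of the stream.
    """
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break
            events.append(json.loads(payload))
    return events

# Hypothetical stream fragment for illustration:
sample = 'data: {"delta": "Hel"}\n\ndata: {"delta": "lo"}\n\ndata: [DONE]\n'
chunks = [e["delta"] for e in parse_sse_stream(sample)]
# Concatenating the deltas as they arrive yields the full response text.
```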
Learn more in the Streaming documentation.
Custom Providers
You can add custom AI providers that are compatible with the OpenAI or Anthropic APIs. Examples include:
- Self-hosted models (Llama, Mistral, etc.)
- Other cloud providers (Together.ai, Groq, etc.)
- Internal company endpoints
Learn more in the Custom Providers guide.
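Because custom providers speak the same OpenAI- or Anthropic-compatible protocol, pointing at one is typically just a base-URL change. A hedged sketch (the base URLs below are placeholders for illustration, and the `/v1/chat/completions` path is the OpenAI-compatible convention, not a guaranteed Cencori requirement):

```python
def chat_completions_url(base_url: str) -> str:
    """Resolve the OpenAI-compatible chat endpoint for a provider base URL.

    Works for self-hosted servers, third-party clouds, or internal
    endpoints, as long as they expose the OpenAI-compatible path.
    """
    return base_url.rstrip("/") + "/v1/chat/completions"

# Placeholder endpoints, not real services:
url = chat_completions_url("https://my-llama-server.internal")
```

The same request bodies you send to hosted models can then be posted to this URL; only authentication and the base URL differ per provider.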

