API Reference
Chat API
Last updated March 3, 2026
Complete reference for the Cencori Chat Completions API. Create AI chat interactions with built-in security, logging, and multi-provider support.
Overview
The Chat API provides a unified interface to interact with multiple AI providers (OpenAI, Anthropic, Google) through a single endpoint. Every request is automatically secured, logged, and monitored.
- Unified Interface: Same API for all providers (OpenAI, Anthropic, Google)
- Automatic Security: Built-in threat detection and PII filtering
- Complete Logging: Every request and response is logged
- Cost Tracking: Token usage and costs calculated automatically
Basic Usage
Create a chat completion with the Cencori SDK:
```typescript
import { cencori } from "@/lib/cencori";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await cencori.chat.completions.create({
    model: "gpt-4o",
    messages: messages,
  });

  return Response.json(response);
}
```
Request Parameters
The chat.completions.create() method accepts the following parameters:
model (required)
The AI model to use. Supported models include:
- OpenAI: gpt-5.2-pro, gpt-5.2, gpt-5.1, gpt-5-pro, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, gpt-4-turbo, o3-pro, o3, o3-mini, o4-mini, o1
- Anthropic: claude-opus-4.6, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4, claude-sonnet-4, claude-haiku-4.5, claude-3-7-sonnet
- Google: gemini-3-pro, gemini-3-flash, gemini-3-deep-think, gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
- xAI: grok-4.1, grok-4.1-fast, grok-4, grok-4-heavy, grok-3, grok-3-mini
- DeepSeek: deepseek-v3.2, deepseek-v3.2-speciale, deepseek-v3.1, deepseek-chat, deepseek-reasoner
- Mistral: mistral-large-latest, mistral-medium-latest, mistral-small-latest, codestral-latest, magistral-medium
- Groq: llama-4-maverick, llama-4-scout, llama-3.3-70b-versatile, llama-3.1-8b-instant
- Cohere: command-a-03-2025, command-r-plus-08-2024, command-r
- Together: meta-llama/Llama-4-Maverick, meta-llama/Llama-3.3-70B-Instruct-Turbo, Qwen/Qwen2.5-72B-Instruct-Turbo
- Perplexity: sonar-pro, sonar, sonar-reasoning-pro
- OpenRouter: openai/gpt-5, anthropic/claude-opus-4.5, google/gemini-3-pro
- Qwen: qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-coder-32b
messages (required)
An array of message objects representing the conversation history.
```typescript
messages: [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" }
]
```
Valid roles: system, user, assistant.
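Because each request carries the full conversation history, earlier assistant replies are included so the model has context for follow-up questions. A sketch of a multi-turn history (the ChatMessage type is an assumed local typing, not an SDK export):

```typescript
// Assumed local typing for a message; the SDK may export its own type.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Multi-turn history: the assistant's earlier reply is resent so the
// model can resolve "its" in the follow-up question.
const messages: ChatMessage[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
  { role: "assistant", content: "The capital of France is Paris." },
  { role: "user", content: "What is its population?" },
];
```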
temperature (optional)
Controls randomness in responses. Range: 0 to 2. Default: 1.
- Lower values (0.0-0.3): More focused and deterministic
- Higher values (1.5-2.0): More creative and random
maxTokens (optional)
Maximum number of tokens to generate in the response.
stream (optional)
If true, responses will be streamed back as they're generated. Default: false.
user (optional)
A unique identifier for the end-user. Useful for monitoring, rate limiting, and abuse detection.
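The parameters above can be combined in a single request. A sketch, where buildChatParams is a hypothetical helper (not part of the SDK) that fills in every optional field:

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper that assembles all request parameters described above.
function buildChatParams(messages: ChatMessage[], userId: string) {
  return {
    model: "gpt-4o",
    messages,
    temperature: 0.2, // low value: focused, near-deterministic answers
    maxTokens: 256,   // cap the reply length to control cost
    stream: false,
    user: userId,     // enables per-user rate limiting and abuse detection
  };
}

const params = buildChatParams(
  [{ role: "user", content: "Summarize this support ticket." }],
  "user_42",
);
// Then: await cencori.chat.completions.create(params);
```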
Response Format
The API returns a structured response object:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20
  }
}
```
Response Fields
- id: Unique identifier for this completion
- choices: Array of completion choices
- choices[].message: The generated message with role and content
- choices[].finish_reason: Why generation stopped (stop, length, content_filter)
- usage: Token counts for the request and response
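As a sketch of consuming these fields, the interface below is an assumed typing of the response shape shown above (the SDK may export its own types):

```typescript
// Assumed typing of the response object above; not an SDK export.
interface ChatCompletion {
  id: string;
  model: string;
  choices: {
    index: number;
    message: { role: string; content: string };
    finish_reason: "stop" | "length" | "content_filter";
  }[];
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

// Pull out the reply text and flag replies cut off by maxTokens.
function extractReply(response: ChatCompletion) {
  const choice = response.choices[0];
  return {
    text: choice.message.content,
    truncated: choice.finish_reason === "length",
    totalTokens: response.usage.total_tokens,
  };
}

const sample: ChatCompletion = {
  id: "chatcmpl-abc123",
  model: "gpt-4o",
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "The capital of France is Paris." },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 13, completion_tokens: 7, total_tokens: 20 },
};

const reply = extractReply(sample);
```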
Streaming Responses
Stream responses in real-time for better user experience:
```typescript
const stream = await cencori.chat.completions.create({
  model: "gpt-4o",
  messages: messages,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
Multi-Provider Support
Switch between AI providers by simply changing the model name:
```typescript
// OpenAI GPT-4o
const openaiResponse = await cencori.chat.completions.create({
  model: "gpt-4o",
  messages: messages,
});

// Anthropic Claude
const claudeResponse = await cencori.chat.completions.create({
  model: "claude-opus-4.5",
  messages: messages,
});
```
> [!NOTE]
> All responses have the same format regardless of provider!
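One practical consequence of the shared response format: falling back to a second provider needs no translation layer. A sketch, with `client` standing in for the cencori instance and both model names taken from the supported list above:

```typescript
// Try the primary provider first; on any failure (outage, rate limit),
// retry once against Anthropic. The caller reads the result the same
// way either way, since both providers return the same shape.
async function completeWithFallback(
  client: { chat: { completions: { create: (p: any) => Promise<any> } } },
  messages: { role: string; content: string }[],
) {
  try {
    return await client.chat.completions.create({ model: "gpt-4o", messages });
  } catch {
    return await client.chat.completions.create({
      model: "claude-sonnet-4.5",
      messages,
    });
  }
}
```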
Error Handling
Handle various error scenarios gracefully:
```typescript
try {
  const response = await cencori.chat.completions.create({
    model: "gpt-4o",
    messages: messages,
  });
} catch (error: any) {
  if (error.status === 403 && error.code === "SECURITY_VIOLATION") {
    // Request blocked by security (PII, injection, etc.)
  }
  if (error.status === 429) {
    // Rate limit exceeded
  }
}
```
Best Practices
- Set maxTokens: Prevent unexpectedly long responses and control costs.
- Include user IDs: Enable per-user rate limiting and better analytics.
- Handle errors gracefully: Implement retry logic for transient failures.
- Use streaming for chat UIs: Provide better user experience.
- Cache responses: Reduce costs for repeated queries when appropriate.
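A minimal sketch of the retry advice above, assuming errors carry the status field shown in the Error Handling section; the helper and its defaults are illustrative, not part of the SDK:

```typescript
// Retry transient failures (429s and 5xx errors) with exponential
// backoff: the delay doubles on each attempt (500ms, 1s, 2s, ...).
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: any;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;
      const transient = error.status === 429 || error.status >= 500;
      if (!transient || i === attempts - 1) throw error;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Usage: `const response = await withRetry(() => cencori.chat.completions.create({ model: "gpt-4o", messages }));`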