API Reference

Chat API

Last updated March 3, 2026

Complete reference for the Cencori Chat Completions API. Create AI chat interactions with built-in security, logging, and multi-provider support.

Overview

The Chat API provides a unified interface to interact with multiple AI providers (OpenAI, Anthropic, Google) through a single endpoint. Every request is automatically secured, logged, and monitored.

  • Unified Interface: Same API for all providers (OpenAI, Anthropic, Google)
  • Automatic Security: Built-in threat detection and PII filtering
  • Complete Logging: Every request and response is logged
  • Cost Tracking: Token usage and costs calculated automatically

Basic Usage

Create a chat completion with the Cencori SDK:

import { cencori } from "@/lib/cencori";
 
export async function POST(req: Request) {
  const { messages } = await req.json();
 
  const response = await cencori.chat.completions.create({
    model: "gpt-4o",
    messages: messages,
  });
 
  return Response.json(response);
}

Request Parameters

The chat.completions.create() method accepts the following parameters:

model (required)

The AI model to use. Supported models include:

  • OpenAI: gpt-5.2-pro, gpt-5.2, gpt-5.1, gpt-5-pro, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, gpt-4-turbo, o3-pro, o3, o3-mini, o4-mini, o1
  • Anthropic: claude-opus-4.6, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4, claude-sonnet-4, claude-haiku-4.5, claude-3-7-sonnet
  • Google: gemini-3-pro, gemini-3-flash, gemini-3-deep-think, gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
  • xAI: grok-4.1, grok-4.1-fast, grok-4, grok-4-heavy, grok-3, grok-3-mini
  • DeepSeek: deepseek-v3.2, deepseek-v3.2-speciale, deepseek-v3.1, deepseek-chat, deepseek-reasoner
  • Mistral: mistral-large-latest, mistral-medium-latest, mistral-small-latest, codestral-latest, magistral-medium
  • Groq: llama-4-maverick, llama-4-scout, llama-3.3-70b-versatile, llama-3.1-8b-instant
  • Cohere: command-a-03-2025, command-r-plus-08-2024, command-r
  • Together: meta-llama/Llama-4-Maverick, meta-llama/Llama-3.3-70B-Instruct-Turbo, Qwen/Qwen2.5-72B-Instruct-Turbo
  • Perplexity: sonar-pro, sonar, sonar-reasoning-pro
  • OpenRouter: openai/gpt-5, anthropic/claude-opus-4.5, google/gemini-3-pro
  • Qwen: qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-coder-32b

messages (required)

An array of message objects representing the conversation history.

messages: [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" }
]

Valid roles: system, user, assistant
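
To carry a multi-turn conversation, append each assistant reply to the array before the next request. A minimal sketch (the appendTurn helper is illustrative, not part of the SDK):

```typescript
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string }

// Return a new history with one more turn appended (immutable update).
function appendTurn(history: Message[], role: Role, content: string): Message[] {
  return [...history, { role, content }];
}

let history: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
];
history = appendTurn(history, "user", "What is the capital of France?");
// ...after the API call, feed the reply back in for the next turn:
history = appendTurn(history, "assistant", "The capital of France is Paris.");
```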

temperature (optional)

Controls randomness in responses. Range: 0 to 2. Default: 1.

  • Lower values (0.0-0.3): More focused and deterministic
  • Higher values (1.5-2.0): More creative and random

maxTokens (optional)

Maximum number of tokens to generate in the response.

stream (optional)

If true, responses will be streamed back as they're generated. Default: false.

user (optional)

A unique identifier for the end-user. Useful for monitoring, rate limiting, and abuse detection.
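
Putting the optional parameters together, a request might look like this (the specific values are illustrative):

```typescript
// Options assembled as a plain object so they can be inspected or reused.
const params = {
  model: "gpt-4o",
  messages: [{ role: "user" as const, content: "Summarize this document." }],
  temperature: 0.2,   // low: focused, near-deterministic output
  maxTokens: 500,     // cap response length and cost
  stream: false,      // set true to stream chunks instead
  user: "user_12345", // enables per-user rate limiting and analytics
};

// const response = await cencori.chat.completions.create(params);
```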

Response Format

The API returns a structured response object:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20
  }
}

Response Fields

  • id: Unique identifier for this completion
  • choices: Array of completion choices
  • choices[].message: The generated message with role and content
  • choices[].finish_reason: Why generation stopped (stop, length, content_filter)
  • usage: Token counts for the request and response
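
Given that shape, extracting the assistant text and token usage is straightforward; a sketch using the example response above:

```typescript
interface Usage { prompt_tokens: number; completion_tokens: number; total_tokens: number }
interface Choice { message: { role: string; content: string }; finish_reason: string }
interface ChatCompletion { id: string; model: string; choices: Choice[]; usage: Usage }

// Pull the first choice's text and the total token count from a completion.
function extractReply(c: ChatCompletion): { text: string; totalTokens: number } {
  return {
    text: c.choices[0]?.message.content ?? "",
    totalTokens: c.usage.total_tokens,
  };
}

const sample: ChatCompletion = {
  id: "chatcmpl-abc123",
  model: "gpt-4o",
  choices: [
    {
      message: { role: "assistant", content: "The capital of France is Paris." },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 13, completion_tokens: 7, total_tokens: 20 },
};
// extractReply(sample) → { text: "The capital of France is Paris.", totalTokens: 20 }
```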

Streaming Responses

Stream responses in real time for a better user experience:

const stream = await cencori.chat.completions.create({
  model: "gpt-4o",
  messages: messages,
  stream: true,
});
 
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
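
For non-UI uses (logging, tests), the same loop can accumulate the chunks into a single string; a sketch against the chunk shape shown above:

```typescript
interface StreamChunk { choices: { delta?: { content?: string } }[] }

// Concatenate streamed delta fragments into the complete reply text.
async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```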

Multi-Provider Support

Switch between AI providers by simply changing the model name:

// OpenAI GPT-4o
const openaiResponse = await cencori.chat.completions.create({
  model: "gpt-4o",
  messages: messages,
});
 
// Anthropic Claude
const claudeResponse = await cencori.chat.completions.create({
  model: "claude-opus-4.5",
  messages: messages,
});

[!NOTE] All responses have the same format regardless of provider!

Error Handling

Handle various error scenarios gracefully:

try {
  const response = await cencori.chat.completions.create({
    model: "gpt-4o",
    messages: messages,
  });
} catch (error: any) {
  if (error.status === 403 && error.code === "SECURITY_VIOLATION") {
    // Request blocked by security (PII, injection, etc.)
  }
  
  if (error.status === 429) {
    // Rate limit exceeded
  }
}

Best Practices

  • Set maxTokens: Prevent unexpectedly long responses and control costs.
  • Include user IDs: Enable per-user rate limiting and better analytics.
  • Handle errors gracefully: Implement retry logic for transient failures.
  • Use streaming for chat UIs: Provide a better user experience with incremental output.
  • Cache responses: Reduce costs for repeated queries when appropriate.
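
For the retry advice, a common pattern is exponential backoff capped at a maximum delay, retrying only transient failures. A sketch (not part of the SDK; the status codes match the error-handling section above):

```typescript
// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry a request on rate limits (429) and server errors (5xx) only.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const transient = error.status === 429 || error.status >= 500;
      if (!transient || attempt + 1 >= maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

Security violations (403) are deliberately not retried: resending a blocked request will fail again.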