API Reference

Chat API

Last updated March 3, 2026

Complete reference for the Cencori Chat Completions API. Create AI chat interactions with built-in security, logging, and multi-provider support.

Overview

The Chat API provides a unified interface to interact with multiple AI providers (OpenAI, Anthropic, Google) through a single endpoint. Every request is automatically secured, logged, and monitored.

  • Unified Interface: Same API for all providers (OpenAI, Anthropic, Google)
  • Automatic Security: Built-in threat detection and PII filtering
  • Complete Logging: Every request and response is logged
  • Cost Tracking: Token usage and costs calculated automatically

Basic Usage

Create a chat completion with the Cencori SDK:

import { cencori } from "@/lib/cencori";
 
export async function POST(req: Request) {
  const { messages } = await req.json();
 
  const response = await cencori.chat.completions.create({
    model: "gpt-4o",
    messages: messages,
  });
 
  return Response.json(response);
}

Request Parameters

The chat.completions.create() method accepts the following parameters:

model (required)

The AI model to use. Supported models include:

  • OpenAI: gpt-5.2-pro, gpt-5.2, gpt-5.1, gpt-5-pro, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, gpt-4-turbo, o3-pro, o3, o3-mini, o4-mini, o1
  • Anthropic: claude-opus-4.6, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4, claude-sonnet-4, claude-haiku-4.5, claude-3-7-sonnet
  • Google: gemini-3-pro, gemini-3-flash, gemini-3-deep-think, gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
  • xAI: grok-4.1, grok-4.1-fast, grok-4, grok-4-heavy, grok-3, grok-3-mini
  • DeepSeek: deepseek-v3.2, deepseek-v3.2-speciale, deepseek-v3.1, deepseek-chat, deepseek-reasoner
  • Mistral: mistral-large-latest, mistral-medium-latest, mistral-small-latest, codestral-latest, magistral-medium
  • Groq: llama-4-maverick, llama-4-scout, llama-3.3-70b-versatile, llama-3.1-8b-instant
  • Cohere: command-a-03-2025, command-r-plus-08-2024, command-r
  • Together: meta-llama/Llama-4-Maverick, meta-llama/Llama-3.3-70B-Instruct-Turbo, Qwen/Qwen2.5-72B-Instruct-Turbo
  • Perplexity: sonar-pro, sonar, sonar-reasoning-pro
  • OpenRouter: openai/gpt-5, anthropic/claude-opus-4.5, google/gemini-3-pro
  • Qwen: qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-coder-32b

messages (required)

An array of message objects representing the conversation history.

messages: [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" }
]

Valid roles: system, user, assistant
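
To carry a multi-turn conversation, append each assistant reply to the array before the next request. A minimal sketch (the appendTurn helper is illustrative, not part of the SDK):

```typescript
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string }

// Return a new history with one more turn appended (immutable update).
function appendTurn(history: Message[], role: Role, content: string): Message[] {
  return [...history, { role, content }];
}

let history: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
];
history = appendTurn(history, "user", "What is the capital of France?");
// ...after the API call, feed the reply back in for the next turn:
history = appendTurn(history, "assistant", "The capital of France is Paris.");
```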

temperature (optional)

Controls randomness in responses. Range: 0 to 2. Default: 1.

  • Lower values (0.0-0.3): More focused and deterministic
  • Higher values (1.5-2.0): More creative and random

maxTokens (optional)

Maximum number of tokens to generate in the response.

stream (optional)

If true, responses will be streamed back as they're generated. Default: false.

user (optional)

A unique identifier for the end-user. Useful for monitoring, rate limiting, and abuse detection.
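
Putting the optional parameters together, a request might look like this (the specific values are illustrative):

```typescript
// Options assembled as a plain object so they can be inspected or reused.
const params = {
  model: "gpt-4o",
  messages: [{ role: "user" as const, content: "Summarize this document." }],
  temperature: 0.2,   // low: focused, near-deterministic output
  maxTokens: 500,     // cap response length and cost
  stream: false,      // set true to stream chunks instead
  user: "user_12345", // enables per-user rate limiting and analytics
};

// const response = await cencori.chat.completions.create(params);
```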

Response Format

The API returns a structured response object:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20
  }
}

Response Fields

  • id: Unique identifier for this completion
  • choices: Array of completion choices
  • choices[].message: The generated message with role and content
  • choices[].finish_reason: Why generation stopped (stop, length, content_filter)
  • usage: Token counts for the request and response
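
Given that shape, extracting the assistant text and token usage is straightforward; a sketch using the example response above:

```typescript
interface Usage { prompt_tokens: number; completion_tokens: number; total_tokens: number }
interface Choice { message: { role: string; content: string }; finish_reason: string }
interface ChatCompletion { id: string; model: string; choices: Choice[]; usage: Usage }

// Pull the first choice's text and the total token count from a completion.
function extractReply(c: ChatCompletion): { text: string; totalTokens: number } {
  return {
    text: c.choices[0]?.message.content ?? "",
    totalTokens: c.usage.total_tokens,
  };
}

const sample: ChatCompletion = {
  id: "chatcmpl-abc123",
  model: "gpt-4o",
  choices: [
    {
      message: { role: "assistant", content: "The capital of France is Paris." },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 13, completion_tokens: 7, total_tokens: 20 },
};
// extractReply(sample) → { text: "The capital of France is Paris.", totalTokens: 20 }
```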

Streaming Responses

Stream responses in real time for a better user experience:

const stream = await cencori.chat.completions.create({
  model: "gpt-4o",
  messages: messages,
  stream: true,
});
 
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
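
For non-UI uses (logging, tests), the same loop can accumulate the chunks into a single string; a sketch against the chunk shape shown above:

```typescript
interface StreamChunk { choices: { delta?: { content?: string } }[] }

// Concatenate streamed delta fragments into the complete reply text.
async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```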

Multi-Provider Support

Switch between AI providers by simply changing the model name:

// OpenAI GPT-4o
const openaiResponse = await cencori.chat.completions.create({
  model: "gpt-4o",
  messages: messages,
});
 
// Anthropic Claude
const claudeResponse = await cencori.chat.completions.create({
  model: "claude-opus-4.5",
  messages: messages,
});

[!NOTE] All responses have the same format regardless of provider!

Error Handling

Handle various error scenarios gracefully:

try {
  const response = await cencori.chat.completions.create({
    model: "gpt-4o",
    messages: messages,
  });
} catch (error: any) {
  if (error.status === 403 && error.code === "SECURITY_VIOLATION") {
    // Request blocked by security (PII, injection, etc.)
  }
  
  if (error.status === 429) {
    // Rate limit exceeded
  }
}

Best Practices

  • Set maxTokens: Prevent unexpectedly long responses and control costs.
  • Include user IDs: Enable per-user rate limiting and better analytics.
  • Handle errors gracefully: Implement retry logic for transient failures.
  • Use streaming for chat UIs: Provide a better user experience with incremental output.
  • Cache responses: Reduce costs for repeated queries when appropriate.
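
For the retry advice, a common pattern is exponential backoff capped at a maximum delay, retrying only transient failures. A sketch (not part of the SDK; the status codes match the error-handling section above):

```typescript
// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry a request on rate limits (429) and server errors (5xx) only.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const transient = error.status === 429 || error.status >= 500;
      if (!transient || attempt + 1 >= maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

Security violations (403) are deliberately not retried: resending a blocked request will fail again.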