
Custom Providers

Last updated April 12, 2026

Route AI traffic through self-hosted models or any OpenAI-compatible endpoint. Full security, billing, and observability — same as built-in providers.

Custom providers let you connect your own model endpoints to Cencori. Self-hosted LLaMA, a fine-tuned model on your GPU server, a private vLLM instance — anything that speaks the OpenAI or Anthropic API format.

Once connected, custom providers go through the same gateway pipeline as OpenAI, Anthropic, and every other built-in provider. Security filtering, rate limiting, end-user billing, observability, caching — all of it works automatically.

How It Works

  1. You register a provider with a name, base URL, and optional API key.
  2. You add one or more model names to the provider.
  3. When a request comes through the gateway with a matching model name, Cencori resolves it to your provider and forwards the request.
  4. The response flows back through the same pipeline — logged, metered, billed.
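As a rough sketch of step 1, registration could look like a single authenticated POST. The endpoint path and field names below are hypothetical, used only to illustrate the shape of the data; the documented shapes live in the Custom Providers API docs.

```typescript
// Hypothetical payload for step 1. Field names and the endpoint path are
// illustrative, not the documented Cencori API.
const provider = {
  name: "Marlo LLaMA",                          // also used for Tier 2/3 routing
  baseUrl: "http://gpu-server.example:8000/v1", // stop at /v1
  apiKey: null as string | null,                // optional; leave null for local servers
  format: "openai",                             // or "anthropic"
  models: ["legal-llama-3.1"],                  // step 2: registered model names
};

async function registerProvider(cencoriKey: string) {
  const res = await fetch("https://api.cencori.com/v1/custom-providers", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${cencoriKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(provider),
  });
  return res.json();
}
```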

Supported Formats

| Format | Protocol | Use when |
| --- | --- | --- |
| OpenAI Compatible | POST /chat/completions | Ollama, vLLM, LM Studio, Together, any OpenAI-compatible server |
| Anthropic Compatible | POST /messages | Self-hosted Anthropic-format endpoints |

Most self-hosted model servers default to the OpenAI format. Use Anthropic format only if your server specifically implements the Anthropic Messages API.
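For comparison, here are minimal request bodies in each wire format. The model names are placeholders; the key structural difference is the path and the required max_tokens field in the Anthropic Messages API.

```typescript
// OpenAI-compatible servers receive: POST {baseUrl}/chat/completions
const openaiBody = {
  model: "llama3.1",
  messages: [{ role: "user", content: "Hello!" }],
};

// Anthropic-compatible servers receive: POST {baseUrl}/messages
const anthropicBody = {
  model: "my-anthropic-style-model",
  max_tokens: 1024, // required by the Anthropic Messages API
  messages: [{ role: "user", content: "Hello!" }],
};
```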

Model Routing

When the gateway receives a request, it resolves the model field using a three-tier matching system. Custom providers are checked after built-in providers, so you can't shadow gpt-4o or claude-sonnet-4-20250514 with a custom provider.

Tier 1: Exact Model Name

If the requested model matches a model name registered to your custom provider, it routes there.

```typescript
// You registered model "legal-llama-3.1" on provider "Marlo LLaMA"
await client.chat.completions.create({
  model: "legal-llama-3.1",  // exact match → routes to your provider
  messages: [{ role: "user", content: "..." }],
});
```

Tier 2: Provider Name

If the requested model matches the provider name itself, the gateway routes to that provider and uses the first registered model (or, if no models are registered, passes the provider name as the upstream model).

```typescript
// Provider name is "Marlo LLaMA"
await client.chat.completions.create({
  model: "Marlo LLaMA",  // matches provider name → routes to your provider
  messages: [{ role: "user", content: "..." }],
});
```

Tier 3: Provider Prefix

If the requested model starts with providerName/, the gateway routes to that provider and passes the suffix as the upstream model name.

```typescript
await client.chat.completions.create({
  model: "Marlo LLaMA/llama3.1-70b",  // prefix match → routes with "llama3.1-70b" as the model
  messages: [{ role: "user", content: "..." }],
});
```

This is useful when your server hosts multiple models and you want to select one dynamically without registering each one.
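The three tiers above can be sketched as a resolver function. The real matching logic is internal to Cencori and the types here are illustrative; built-in providers would already have been checked before this runs.

```typescript
interface CustomProvider {
  name: string;
  models: string[];
}

type Route = { provider: string; upstreamModel: string };

function resolveModel(requested: string, providers: CustomProvider[]): Route | null {
  const lower = requested.toLowerCase();

  // Tier 1: exact (case-insensitive) model name match
  for (const p of providers) {
    const hit = p.models.find((m) => m.toLowerCase() === lower);
    if (hit) return { provider: p.name, upstreamModel: hit };
  }

  // Tier 2: the provider name itself
  for (const p of providers) {
    if (p.name.toLowerCase() === lower) {
      return { provider: p.name, upstreamModel: p.models[0] ?? p.name };
    }
  }

  // Tier 3: "providerName/" prefix; the suffix becomes the upstream model
  for (const p of providers) {
    if (lower.startsWith(p.name.toLowerCase() + "/")) {
      return { provider: p.name, upstreamModel: requested.slice(p.name.length + 1) };
    }
  }

  return null; // gateway responds with "model not found"
}
```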

Dashboard Setup

  1. Open your project in the Cencori dashboard.
  2. Go to Custom Providers in the sidebar.
  3. Click Add Provider.
  4. Fill in:
    • Name — a label for this provider (also used for Tier 2/3 routing)
    • Base URL — your model server's endpoint, up to /v1 (not including /chat/completions)
    • API Key — optional. Leave empty for local models without auth. Encrypted at rest with AES-256-GCM if provided.
    • Format — OpenAI Compatible or Anthropic Compatible
  5. Click Create.
  6. Click Test on the provider card to verify the connection. Cencori sends a minimal request and reports success/failure with latency.
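Before clicking Test, it can help to sanity-check the Base URL against the rules above (ends at /v1, no /chat/completions suffix, no trailing slash). A minimal client-side check, purely illustrative:

```typescript
// Returns a list of problems with a candidate Base URL; empty means OK.
function checkBaseUrl(url: string): string[] {
  const problems: string[] = [];
  if (url.endsWith("/")) problems.push("remove the trailing slash");
  if (url.includes("/chat/completions"))
    problems.push("stop at /v1; do not include /chat/completions");
  if (!url.replace(/\/$/, "").endsWith("/v1"))
    problems.push("most OpenAI-compatible servers expect a /v1 suffix");
  return problems;
}
```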

API Reference

For the full API reference — create, list, update, delete, test connection, and the models endpoint — see the Custom Providers API docs.

Using Custom Providers

Once a provider is created and active, use the gateway exactly like you would with any built-in provider:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "csk_your_cencori_key",
  baseURL: "https://api.cencori.com/v1",
});

// Route to your self-hosted model
const response = await client.chat.completions.create({
  model: "llama3.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  user: "user_123",  // optional: enables per-user billing
});
```

Streaming works the same way:

```typescript
const stream = await client.chat.completions.create({
  model: "llama3.1",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

What Custom Providers Get

Everything that built-in providers get:

| Feature | How it works |
| --- | --- |
| Security filtering | PII detection and prompt injection blocking run on every request before it reaches your model |
| Rate limiting | Project-level and per-key limits enforced at the gateway |
| End-user billing | Pass user: "..." to meter, limit, and charge per user |
| Observability | Every request logged with model, latency, tokens, cost, status, user |
| Semantic caching | Repeated queries served from cache if enabled |
| Environment scoping | Test-key and production-key traffic tracked separately |
| Audit logging | Provider create/update/delete actions logged |

API Key Encryption

API keys for custom providers are encrypted at rest using AES-256-GCM, scoped to the organization ID. The key is decrypted only at request time when the gateway forwards traffic to your provider. You cannot retrieve the original key from the dashboard — only replace it.
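The scheme can be illustrated with Node's crypto module. This is not Cencori's implementation; in particular, deriving the key from the organization ID via scrypt is an assumption made here to show what "scoped to the organization ID" could mean in practice.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Illustrative AES-256-GCM encryption scoped to an organization ID.
function encrypt(plaintext: string, orgId: string) {
  const key = scryptSync(orgId, "demo-salt", 32); // 256-bit key (derivation is an assumption)
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt(box: ReturnType<typeof encrypt>, orgId: string): string {
  const key = scryptSync(orgId, "demo-salt", 32);
  const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString("utf8");
}
```

Note that decrypting under a different organization ID fails GCM authentication, which is consistent with the decryption-error behavior described in Troubleshooting below.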

If your model server doesn't require authentication (e.g., Ollama running locally), leave the API key field empty.

Common Server Setup

| Server | Install | Default endpoint |
| --- | --- | --- |
| Ollama | ollama serve && ollama pull llama3.1 | http://localhost:11434/v1 |
| vLLM | vllm serve meta-llama/Llama-3.1-8B-Instruct | http://localhost:8000/v1 |
| LM Studio | Download from lmstudio.ai | http://localhost:1234/v1 |

For production, deploy on a cloud VM with a public IP or domain. For testing, use a tunnel like ngrok to expose your local server.

Troubleshooting

Test connection fails with timeout

  • Is the server running and reachable from the internet?
  • Is the base URL correct? It should end at /v1, not include /chat/completions.
  • If using ngrok, is the tunnel still active?

Requests return "model not found"

  • Is the provider active? Check the status badge in the dashboard.
  • Does the model name in your request match a registered model name? (Matching is case-insensitive.)
  • If using Tier 3 routing (provider/model), does the provider name match?

Requests fail with decryption error

  • This happens if the organization ID changed or the encryption key rotated. Delete the provider and re-create it with the API key.

Model works directly but fails through Cencori

  • Check that the base URL doesn't include a trailing slash.
  • Check that the API format matches what your server implements (OpenAI vs Anthropic).