Platform
Custom Providers
Last updated April 12, 2026
Route AI traffic through self-hosted models or any OpenAI-compatible endpoint. Full security, billing, and observability — same as built-in providers.
Custom providers let you connect your own model endpoints to Cencori. Self-hosted LLaMA, a fine-tuned model on your GPU server, a private vLLM instance — anything that speaks the OpenAI or Anthropic API format.
Once connected, custom providers go through the same gateway pipeline as OpenAI, Anthropic, and every other built-in provider. Security filtering, rate limiting, end-user billing, observability, caching — all of it works automatically.
How It Works
- You register a provider with a name, base URL, and optional API key.
- You add one or more model names to the provider.
- When a request comes through the gateway with a matching model name, Cencori resolves it to your provider and forwards the request.
- The response flows back through the same pipeline — logged, metered, billed.
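The registration flow above can be sketched as a simple record. The field names here are illustrative assumptions, not the exact Cencori API schema; see the Custom Providers API docs for the real contract.

```typescript
// Illustrative shape of a custom provider registration.
// Field names are assumptions, not the exact Cencori API schema.
type ProviderFormat = "openai" | "anthropic";

interface CustomProvider {
  name: string;        // label, also used for Tier 2/3 routing
  baseUrl: string;     // endpoint up to /v1, not /chat/completions
  apiKey?: string;     // optional; omit for unauthenticated local servers
  format: ProviderFormat;
  models: string[];    // model names matched by Tier 1 routing
}

const provider: CustomProvider = {
  name: "Marlo LLaMA",
  baseUrl: "http://localhost:11434/v1",
  format: "openai",
  models: ["legal-llama-3.1"],
};
```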
Supported Formats
| Format | Protocol | Use when |
|---|---|---|
| OpenAI Compatible | POST /chat/completions | Ollama, vLLM, LM Studio, Together, any OpenAI-compatible server |
| Anthropic Compatible | POST /messages | Self-hosted Anthropic-format endpoints |
Most self-hosted model servers default to the OpenAI format. Use Anthropic format only if your server specifically implements the Anthropic Messages API.
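For reference, minimal request bodies for the two formats look like this, per the public OpenAI Chat Completions and Anthropic Messages APIs. This is a sketch of the wire shapes your server must accept, not Cencori-specific code.

```typescript
// OpenAI Compatible → POST {baseUrl}/chat/completions
const openaiBody = {
  model: "llama3.1",
  messages: [{ role: "user", content: "Hello!" }],
};

// Anthropic Compatible → POST {baseUrl}/messages
const anthropicBody = {
  model: "my-model",
  max_tokens: 1024, // required by the Anthropic Messages API
  messages: [{ role: "user", content: "Hello!" }],
};
```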
Model Routing
When the gateway receives a request, it resolves the model field using a three-tier matching system. Custom providers are checked after built-in providers, so you can't shadow gpt-4o or claude-sonnet-4-20250514 with a custom provider.
Tier 1: Exact Model Name
If the requested model matches a model name registered to your custom provider, it routes there.
```typescript
// You registered model "legal-llama-3.1" on provider "Marlo LLaMA"
await client.chat.completions.create({
  model: "legal-llama-3.1", // exact match → routes to your provider
  messages: [{ role: "user", content: "..." }],
});
```

Tier 2: Provider Name
If the requested model matches the provider name itself, the gateway routes to that provider and uses the first registered model (or passes the provider name as the upstream model).
```typescript
// Provider name is "Marlo LLaMA"
await client.chat.completions.create({
  model: "Marlo LLaMA", // matches provider name → routes to your provider
  messages: [{ role: "user", content: "..." }],
});
```

Tier 3: Provider Prefix
If the requested model starts with providerName/, the gateway routes to that provider and passes the suffix as the upstream model name.
```typescript
await client.chat.completions.create({
  model: "Marlo LLaMA/llama3.1-70b", // prefix match → routes with "llama3.1-70b" as the model
  messages: [{ role: "user", content: "..." }],
});
```

This is useful when your server hosts multiple models and you want to select one dynamically without registering each one.
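Taken together, the three tiers can be sketched as a resolution function. This is an illustrative model of the matching rules described above, not Cencori's actual gateway code; it assumes case-insensitive name comparison, as noted under Troubleshooting.

```typescript
interface Provider {
  name: string;
  models: string[];
}

interface Route {
  provider: string;
  upstreamModel: string;
}

// Sketch of the three-tier matching (illustrative, not the real gateway).
// Built-in provider models are assumed to have been checked already.
function resolveModel(requested: string, providers: Provider[]): Route | null {
  const req = requested.toLowerCase();

  // Tier 1: exact registered model name
  for (const p of providers) {
    const hit = p.models.find((m) => m.toLowerCase() === req);
    if (hit) return { provider: p.name, upstreamModel: hit };
  }

  // Tier 2: provider name itself → first registered model, or the name as-is
  for (const p of providers) {
    if (p.name.toLowerCase() === req) {
      return { provider: p.name, upstreamModel: p.models[0] ?? p.name };
    }
  }

  // Tier 3: "providerName/" prefix → suffix becomes the upstream model
  for (const p of providers) {
    const prefix = p.name.toLowerCase() + "/";
    if (req.startsWith(prefix)) {
      return { provider: p.name, upstreamModel: requested.slice(prefix.length) };
    }
  }

  return null; // no match → gateway returns "model not found"
}

const providers: Provider[] = [{ name: "Marlo LLaMA", models: ["legal-llama-3.1"] }];
const route = resolveModel("Marlo LLaMA/llama3.1-70b", providers);
// route?.upstreamModel === "llama3.1-70b"
```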
Dashboard Setup
- Open your project in the Cencori dashboard.
- Go to Custom Providers in the sidebar.
- Click Add Provider.
- Fill in:
  - Name — a label for this provider (also used for Tier 2/3 routing)
  - Base URL — your model server's endpoint, up to /v1 (not including /chat/completions)
  - API Key — optional. Leave empty for local models without auth. Encrypted at rest with AES-256-GCM if provided.
  - Format — OpenAI Compatible or Anthropic Compatible
- Click Create.
- Click Test on the provider card to verify the connection. Cencori sends a minimal request and reports success/failure with latency.
API Reference
For the full API reference — create, list, update, delete, test connection, and the models endpoint — see the Custom Providers API docs.
Using Custom Providers
Once a provider is created and active, use the gateway exactly like you would with any built-in provider:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "csk_your_cencori_key",
  baseURL: "https://api.cencori.com/v1",
});

// Route to your self-hosted model
const response = await client.chat.completions.create({
  model: "llama3.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  user: "user_123", // optional: enables per-user billing
});
```

Streaming works the same way:
```typescript
const stream = await client.chat.completions.create({
  model: "llama3.1",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

What Custom Providers Get
Everything that built-in providers get:
| Feature | How it works |
|---|---|
| Security filtering | PII detection and prompt injection blocking run on every request before it reaches your model |
| Rate limiting | Project-level and per-key limits enforced at the gateway |
| End-user billing | Pass user: "..." to meter, limit, and charge per user |
| Observability | Every request logged with model, latency, tokens, cost, status, user |
| Semantic caching | Repeated queries served from cache if enabled |
| Environment scoping | Test-key and production-key traffic tracked separately |
| Audit logging | Provider create/update/delete actions logged |
API Key Encryption
API keys for custom providers are encrypted at rest using AES-256-GCM, scoped to the organization ID. The key is decrypted only at request time when the gateway forwards traffic to your provider. You cannot retrieve the original key from the dashboard — only replace it.
If your model server doesn't require authentication (e.g., Ollama running locally), leave the API key field empty.
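A minimal sketch of what an AES-256-GCM roundtrip looks like with Node's crypto module. The key derivation (hashing a master secret together with the organization ID) and the blob layout here are illustrative assumptions; Cencori's actual scheme is not published.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, createHash } from "node:crypto";

// Derive a 32-byte key per organization.
// Illustrative assumption: a real system would typically use HKDF or a KMS.
function orgKey(masterSecret: string, orgId: string): Buffer {
  return createHash("sha256").update(masterSecret + ":" + orgId).digest();
}

function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Store nonce + auth tag + ciphertext together (layout is an assumption)
  return Buffer.concat([iv, tag, ct]).toString("base64");
}

function decrypt(blob: string, key: Buffer): string {
  const buf = Buffer.from(blob, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ct = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}

const key = orgKey("master-secret", "org_123");
const stored = encrypt("sk-upstream-key", key);
// decrypt(stored, key) === "sk-upstream-key"
```

Because the key is derived per organization, a ciphertext produced under one org ID cannot be decrypted under another, which matches the decryption-error failure mode described under Troubleshooting.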
Common Server Setup
| Server | Install | Default endpoint |
|---|---|---|
| Ollama | ollama serve && ollama pull llama3.1 | http://localhost:11434/v1 |
| vLLM | vllm serve meta-llama/Llama-3.1-8B-Instruct | http://localhost:8000/v1 |
| LM Studio | Download from lmstudio.ai | http://localhost:1234/v1 |
For production, deploy on a cloud VM with a public IP or domain. For testing, use a tunnel like ngrok to expose your local server.
Troubleshooting
Test connection fails with timeout
- Is the server running and reachable from the internet?
- Is the base URL correct? It should end at
/v1, not include/chat/completions. - If using ngrok, is the tunnel still active?
Requests return "model not found"
- Is the provider active? Check the status badge in the dashboard.
- Does the model name in your request match a registered model name? Comparison is case-insensitive.
- If using Tier 3 routing (provider/model), does the provider name match?
Requests fail with decryption error
- This happens if the organization ID changed or the encryption key rotated. Delete the provider and re-create it with the API key.
Model works directly but fails through Cencori
- Check that the base URL doesn't include a trailing slash.
- Check that the API format matches what your server implements (OpenAI vs Anthropic).
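Both base URL checks can be folded into a small normalization helper for your own tooling; this is a convenience sketch, not part of Cencori.

```typescript
// Normalize a provider base URL per the rules above:
// strip trailing slashes and drop a mistakenly included route suffix.
// (Illustrative helper, not part of Cencori.)
function normalizeBaseUrl(input: string): string {
  let url = input.trim().replace(/\/+$/, "");   // no trailing slash
  url = url.replace(/\/chat\/completions$/, ""); // OpenAI-format route
  url = url.replace(/\/messages$/, "");          // Anthropic-format route
  return url;
}

// normalizeBaseUrl("http://localhost:11434/v1/") === "http://localhost:11434/v1"
// normalizeBaseUrl("http://localhost:8000/v1/chat/completions") === "http://localhost:8000/v1"
```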
Related
- Custom Providers API — full API reference
- Blog: Connect a Local LLM to Cencori — step-by-step walkthrough
- BYOK — use your own API keys with built-in providers
- End-User Billing — meter and charge your users
- Core Architecture — how the gateway pipeline works