Model Routing Is the Easy Part

23 April 2026 · 4 min read

The world didn't need another way to call GPT-4.

Between OpenRouter, LiteLLM, Portkey, and the Vercel AI SDK, the "unified API" problem has been solved for a while. If your goal is simply to send a prompt to three different providers using the same JSON format, you have excellent options.

But we didn't build Cencori to solve that problem.

We built Cencori because we realized that for every 10 hours a team spends getting a model to work, they spend 90 hours building the "boring" infrastructure around it: governance, end-user billing, security filters, and audit logs.

Cencori isn't a proxy. It's the backbone for companies that ship AI into production.

The Gap: Connectivity vs. Governance

When we started, we looked at the landscape. OpenRouter is brilliant for discovery and unified billing across providers. LiteLLM is the gold standard for a lightweight Python proxy. Portkey does observability well.

But as soon as you move from a "demo" to a "deal" with a SOC2-compliant enterprise or a regulated startup, the conversation changes. It’s no longer about whether the model is fast; it’s about:

  • Can we block users from sending PII to the model?
  • How do we redact sensitive project secrets from the logs?
  • If a model hallucinates a password, can we catch it before the user sees it?

Existing tools treated security as a plugin or a post-request log. We realized security had to be middleware. It had to sit directly on the wire, inspecting every token in real-time, capable of killing a request in milliseconds if a data rule was breached.
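To make "security as middleware" concrete, here is a minimal sketch of on-the-wire token inspection: scan a rolling window of the response stream for secret-like patterns and kill the output the moment a rule trips. The pattern, buffer size, and function names are illustrative assumptions, not Cencori's actual API:

```typescript
// Illustrative: a pattern that matches an API-key-like string.
const SECRET_PATTERN = /sk-[A-Za-z0-9]{20,}/;

// Scan a sequence of streamed tokens. Emits tokens until a secret-like
// pattern appears in the rolling window, then stops ("kills" the stream).
function scanTokens(tokens: string[]): { emitted: string[]; killed: boolean } {
  const emitted: string[] = [];
  let window = "";
  for (const token of tokens) {
    window = (window + token).slice(-256); // rolling inspection buffer
    if (SECRET_PATTERN.test(window)) {
      // In a real gateway this would also log an incident and update
      // the user's risk score; here we just stop emitting.
      return { emitted, killed: true };
    }
    emitted.push(token);
  }
  return { emitted, killed: false };
}
```

Note that a streaming scanner can only guarantee the *completed* secret never reaches the client; a few prefix tokens may already have been emitted, which is why real systems pair this with buffering or redaction.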

The Monetization Wall

Every founder we talked to had the same spreadsheet. They were trying to figure out how to charge their users for AI without going bankrupt.

Building a billing system for AI is a nightmare. You have to track tokens per user, map them to specific provider prices (which change constantly), apply a markup, and then integrate that into Stripe or Polar.

If you use a standard gateway, you still have to build this entire billing engine yourself. We decided that billing is a first-class citizen of the gateway. Cencori doesn't just route the prompt; it knows who the user is, what their budget is, and how much to charge them for that specific generation—before the response even finishes streaming.
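The arithmetic itself is simple; the pain is doing it per user, per generation, against prices that keep changing. A hedged sketch of the core calculation, where the price table, model name, and markup are illustrative numbers rather than real provider or Cencori pricing:

```typescript
type Usage = { inputTokens: number; outputTokens: number };

// Illustrative prices in USD per 1M tokens (real tables change constantly).
const PRICE_PER_MTOK: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
};

// Map a generation's token usage to a provider cost, then apply a markup
// to get what the end user is charged for that specific generation.
function chargeForGeneration(
  model: string,
  usage: Usage,
  markup = 1.2, // e.g. a 20% margin on top of raw provider cost
): { providerCostUsd: number; userChargeUsd: number } {
  const price = PRICE_PER_MTOK[model];
  if (!price) throw new Error(`no price configured for ${model}`);
  const providerCostUsd =
    (usage.inputTokens * price.input + usage.outputTokens * price.output) / 1e6;
  return { providerCostUsd, userChargeUsd: providerCostUsd * markup };
}
```

The hard part a gateway takes off your plate is everything around this function: attributing the usage to the right user, keeping the price table current, and pushing the resulting charge into Stripe or Polar.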

Why a "Stateless Proxy" Isn't Enough

The biggest technical bet we made is that the gateway needs to be "Heavy."

Most gateways pride themselves on being thin, stateless proxies. But the "Full Stack" AI problem requires state.

  • Memory: Why should your app have to manage conversation history across 10 different providers? The gateway should hold the context.
  • Security Incident Tracking: When a user tries to jailbreak your model, you don't just want a log. You want the gateway to update a security score for that user across all your LLM endpoints.
  • Reliability: It's not just "retry on error." It's "the primary provider is hitting a 429, so automatically reroute the request to a smaller, faster model to keep the UI snappy."
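The reliability point can be sketched as a 429-aware fallback. `callModel`, the model names, and the error shape here are hypothetical stand-ins for whatever client and providers you actually use:

```typescript
type CallModel = (model: string, prompt: string) => Promise<string>;

// If the primary provider is rate-limited (HTTP 429), reroute the same
// request to a smaller, faster fallback model instead of surfacing the
// error or making the user wait out a retry loop.
async function withFallback(
  callModel: CallModel,
  prompt: string,
  primary = "big-model",
  fallback = "small-fast-model",
): Promise<{ model: string; text: string }> {
  try {
    return { model: primary, text: await callModel(primary, prompt) };
  } catch (err: any) {
    if (err?.status === 429) {
      // Degrade gracefully: a fast answer beats a stalled spinner.
      return { model: fallback, text: await callModel(fallback, prompt) };
    }
    throw err; // anything else is a real failure
  }
}
```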

Building this meant we couldn't just wrap an Express server. We had to build a global middleware layer that handles streaming, circuit breaking, and real-time data masking simultaneously. It was a technical headache, but it’s the difference between a "tool" and "infrastructure."
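One of those pieces, circuit breaking, fits in a few lines. The thresholds and class shape below are illustrative, not Cencori's implementation: after a run of consecutive failures, stop sending traffic to a provider for a cooldown window, then let a probe request through.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private maxFailures = 3,
    private cooldownMs = 30_000,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  // Should we send the next request to this provider at all?
  allowRequest(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: go half-open and let one probe request through.
      this.openedAt = null;
      this.failures = this.maxFailures - 1;
      return true;
    }
    return false; // circuit is open; fail fast or route elsewhere
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = this.now();
  }
}
```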

The Africa Factor

There is a final, more honest reason. Most of the world's AI infrastructure is built for teams in SF or London with unlimited compute budgets and simple compliance needs.

Cencori is built from Africa. This matters because we understand efficiency and localized compliance in a way that US-centric tools don't. We built Cencori to be the bridge for the next billion users—where a gateway isn't just about calling a model, it's about making AI economically viable and socially safe in entirely new markets.

Who is it for?

If you are a hobbyist looking for the cheapest way to try 10 models, use OpenRouter. They are great at it.

If you are a founder or an engineer building an application that will handle real customer data, real money, and real security requirements—that is why we built Cencori.

We didn't build it because there weren't enough LLM proxies. We built it because there wasn't enough intelligence infrastructure.

Now, back to the code.