Semantic Caching
Last updated March 3, 2026
Reduce latency and costs by serving repeated AI requests from a cache.
Overview
Cencori implements Semantic Caching for AI completions. When enabled, the gateway checks whether an identical or semantically similar request has been served recently. If a match is found, the cached response is returned immediately, bypassing the upstream AI provider.
Benefits
- Lower Latency: Cached responses are served in milliseconds, compared to seconds for fresh AI generations.
- Reduced Costs: You are not charged by the AI provider (e.g., OpenAI, Anthropic) for cached hits.
- Consistency: Identical inputs yield identical outputs, ensuring stability for testing and deterministic workflows.
How it Works
Cencori uses a two-layer caching architecture:
1. Exact Match (Redis)
First, we check for an exact character-for-character match using a SHA-256 hash. This is extremely fast (< 5ms) and handles identical repeats (e.g., retries).
- Key: `SHA256(Project + Model + Prompt + Params)`
- Storage: Redis (Upstash)
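A minimal sketch of how such a cache key could be derived (the field names and serialization below are illustrative assumptions, not Cencori's actual schema):

```python
import hashlib
import json

def cache_key(project: str, model: str, prompt: str, params: dict) -> str:
    """Derive a deterministic SHA-256 cache key from the request fields.

    Params are serialized with sorted keys so that logically identical
    requests hash to the same key regardless of dict ordering.
    """
    payload = json.dumps(
        {"project": project, "model": model, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Identical requests produce identical keys; any change produces a new key.
k1 = cache_key("proj_123", "gpt-4o", "Hello", {"temperature": 0.7})
k2 = cache_key("proj_123", "gpt-4o", "Hello", {"temperature": 0.7})
k3 = cache_key("proj_123", "gpt-4o", "Hello", {"temperature": 0.8})
```

Because the key covers the model and parameters as well as the prompt, a retry with a different temperature is a cache miss even when the prompt text is unchanged.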
2. Semantic Match (Vector DB)
If the exact match fails, we perform a semantic search.
- The prompt is converted into a vector embedding (using Gemini/OpenAI embedding models).
- We query our Supabase Vector database for previous prompts with high cosine similarity (default threshold: 0.95).
- If a match is found, the cached response is returned.
This allows "What is the capital of France?" to match a cached result for "Tell me Paris' location", saving cost and time.
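The similarity check above boils down to plain cosine similarity between two embedding vectors. A sketch (the vectors here are toy values, not real embedding-model output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

THRESHOLD = 0.95  # default semantic-match threshold

# Toy embeddings: two near-identical prompts vs. an unrelated one.
capital_q = [0.9, 0.1, 0.3]   # "What is the capital of France?"
paris_q = [0.88, 0.12, 0.31]  # "Tell me Paris' location"
weather_q = [0.1, 0.9, 0.2]   # unrelated prompt

hit = cosine_similarity(capital_q, paris_q) >= THRESHOLD    # True
miss = cosine_similarity(capital_q, weather_q) >= THRESHOLD  # False
```

In production this comparison runs as a vector-index query rather than a Python loop, but the threshold logic is the same.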
> [!NOTE]
> Currently, caching is enabled for all non-streaming requests by default. Streaming requests bypass the cache.
Cache Headers
You can verify the cache status of any response using the X-Cencori-Cache header:
| Value | Description |
|---|---|
| `HIT` | The response was served from the cache. |
| `MISS` | The response was generated by the AI provider (and is now cached). |
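A minimal client-side check of this header (the plain dict below stands in for whatever headers object your HTTP client returns):

```python
def cache_status(headers: dict[str, str]) -> str:
    """Return the X-Cencori-Cache value, treating a missing header as MISS."""
    # HTTP header names are case-insensitive; normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("x-cencori-cache", "MISS")

status = cache_status(
    {"Content-Type": "application/json", "X-Cencori-Cache": "HIT"}
)
```

Logging this value per request is an easy way to measure your cache hit rate.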
Retention (TTL)
Cached responses are stored for 1 hour by default. After this period, the cache expires, and the next request will be treated as a fresh generation (MISS).
Disabling Caching
Coming Soon: You will be able to disable caching per-request via a custom header or query parameter.