Introducing Prompt Cache

05 May 2026 · 2 min read

We're excited to announce Prompt Cache, the most advanced caching solution for production AI applications. Built directly into the Cencori AI Gateway, it reduces your AI costs by 30-60% and cuts latency by 95% on cache hits.

Why Caching Matters

Every time your AI application sends the same prompt, you pay full price again. For applications with repetitive queries, classifiers, or RAG systems, that cost adds up fast. Most caching solutions match only identical prompts, but real applications often see slight variations that should still share a cached response.

Two-Tier Cache Architecture

Exact Match Cache delivers instant hits in under 10 ms. It keys on a SHA-256 hash of the normalized prompt (case-insensitive, whitespace-collapsed) and is backed by Redis for speed.
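
For intuition, here is a minimal sketch of how such a cache key could be derived, assuming the normalization steps described above; the function name and details are illustrative, not Cencori's actual implementation:

```typescript
import { createHash } from "crypto";

// Illustrative only: derive an exact-match key the way the post describes
// (lowercase, collapse whitespace, then SHA-256).
function exactMatchKey(prompt: string): string {
  const normalized = prompt
    .toLowerCase()         // case-insensitive matching
    .trim()
    .replace(/\s+/g, " "); // collapse runs of whitespace
  return createHash("sha256").update(normalized).digest("hex");
}

// "What is RAG?" and "what  is   rag?" produce the same key,
// so both would hit the same Redis entry.
console.log(exactMatchKey("What is RAG?") === exactMatchKey("what  is   rag?")); // true
```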

Semantic Match Cache answers in about 50 ms using embedding similarity search. It finds semantically similar prompts, supports a configurable similarity threshold, and uses text-embedding-004 to generate the vector embeddings.
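
Conceptually, the lookup embeds the incoming prompt and reuses the stored response whose embedding clears the threshold. A rough sketch, with an assumed cosine-similarity metric and a 0.95 default threshold (both illustrative):

```typescript
// Each cached entry pairs a stored embedding with its response.
type CacheEntry = { embedding: number[]; response: string };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best response above the threshold, or null on a miss.
function semanticLookup(
  promptEmbedding: number[], // produced by text-embedding-004 upstream
  entries: CacheEntry[],
  threshold = 0.95,          // the configurable similarity threshold
): string | null {
  let best: string | null = null;
  let bestScore = threshold;
  for (const { embedding, response } of entries) {
    const score = cosineSimilarity(promptEmbedding, embedding);
    if (score >= bestScore) {
      best = response;
      bestScore = score;
    }
  }
  return best;
}
```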

Smart Features

Temperature-Aware Caching caches only deterministic responses. Prompt Cache defaults to storing low-temperature responses only, preserving creativity on high-temperature calls while maximizing savings on predictable outputs.
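
The gating logic amounts to a single check before a response is persisted. A minimal sketch, assuming a 0.3 cutoff (the actual default is not specified in this post):

```typescript
// Assumed cutoff for "deterministic enough to cache"; illustrative only.
const CACHE_TEMPERATURE_CEILING = 0.3;

function shouldCache(temperature: number | undefined): boolean {
  // Treat an unset temperature as the provider default (often 1.0),
  // which is too creative to cache safely.
  if (temperature === undefined) return false;
  return temperature <= CACHE_TEMPERATURE_CEILING;
}
```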

Model Exclusion lets you exclude specific models from caching, such as reasoning models or expensive fine-tuned models, while still caching responses from cheaper ones.
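
In configuration terms, this is a simple per-request check against a deny list. A hypothetical shape (the field and model names are illustrative, not Cencori's actual settings schema):

```typescript
// Hypothetical per-project cache settings; names are assumptions.
const cacheConfig = {
  enabled: true,
  excludedModels: ["my-reasoning-model", "my-org/fine-tuned-support-v2"],
};

// Responses from excluded models bypass the cache entirely.
function isCacheable(model: string): boolean {
  return cacheConfig.enabled && !cacheConfig.excludedModels.includes(model);
}
```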

Full Analytics lets you track your ROI with built-in metrics including hit rate, tokens saved, cost saved, and active entries.
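
As a back-of-the-envelope example of how those metrics translate into ROI (the interface and numbers below are illustrative, not the actual dashboard API):

```typescript
// Illustrative metric shape; field names are assumptions.
interface CacheAnalytics {
  hitRate: number;       // cache hits / total lookups
  tokensSaved: number;   // tokens served from cache instead of the model
  costSavedUsd: number;  // tokensSaved priced at the model's rate
  activeEntries: number; // entries currently live in the cache
}

// 40,000 hits out of 100,000 requests, ~500 tokens per cached response,
// at $10 per million output tokens.
const tokensSaved = 40_000 * 500;
const analytics: CacheAnalytics = {
  hitRate: 40_000 / 100_000,                    // 0.4
  tokensSaved,                                  // 20,000,000
  costSavedUsd: (tokensSaved / 1_000_000) * 10, // $200
  activeEntries: 60_000,
};
```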

Getting Started

Enable Prompt Cache from your project settings. No code changes required. Toggle it on with one click.

What's included: Exact plus Semantic caching, a built-in analytics dashboard, temperature-aware logic, model exclusion controls, per-project configuration, and manual cache clearing.

Prompt Cache is available on all plans. You only pay for the underlying infrastructure.

Get started here