Semantic Caching
Last updated April 17, 2026
Reduce latency and costs by automatically reusing responses for repeated and similar prompts.
Overview
Semantic Caching is built into Cencori and runs automatically for eligible requests.
When a new request is semantically similar to one your project has already sent, Cencori can return the cached response instead of calling the upstream model again.
Benefits
- Lower latency: Responses can return much faster on cache hits.
- Lower cost: Cache hits avoid an upstream model generation call.
- More consistency: Repeated prompts return stable results.
When Cache Applies
The semantic cache currently applies to:
- Non-streaming requests
- Requests without tool/function calls
- Requests scoped to the same project and compatible generation settings
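The eligibility rules above can be sketched as a simple check. This is an illustrative sketch only: the field names (`stream`, `tools`) are assumptions standing in for the actual request schema, not the real server-side logic.

```python
# Hypothetical sketch of semantic-cache eligibility, mirroring the
# rules listed above. Field names are illustrative assumptions,
# not the actual Cencori request schema.

def is_cache_eligible(request: dict) -> bool:
    """Return True if a request could be served from the semantic cache."""
    if request.get("stream", False):
        return False  # streaming requests bypass the cache
    if request.get("tools"):
        return False  # tool/function-calling requests bypass the cache
    return True

# A plain, non-streaming request with no tools is eligible.
print(is_cache_eligible({"model": "example-model", "stream": False}))
```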
Dashboard Configuration
No cache toggle is required in the dashboard today. Caching is automatic.
To improve cache hit rates:
- Keep model selection stable for repeated workloads.
- Use consistent system prompts and instruction structure.
- Avoid unnecessary randomness for deterministic tasks.
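One way to follow these tips in client code is to build requests through a single helper, so the model, system prompt, and sampling settings stay identical across calls. A minimal sketch, assuming a generic chat-style payload (the field names and model name are illustrative, not Cencori-specific):

```python
# Hypothetical sketch: a single request builder keeps the model,
# system prompt, and temperature stable, which improves the chance
# that repeated or similar prompts hit the cache.

def build_request(user_input: str) -> dict:
    return {
        "model": "example-model",   # fixed model for this workload
        "temperature": 0,           # no unnecessary randomness for deterministic tasks
        "messages": [
            # identical system prompt on every call
            {"role": "system", "content": "You are a support assistant."},
            {"role": "user", "content": user_input.strip()},
        ],
    }
```

Normalizing the user input (here, a simple `strip()`) also helps trivially different prompts resolve to the same cached entry.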
Cache Status Header
You can verify the cache status of any response using the X-Cencori-Cache header:
| Value | Description |
|---|---|
| HIT | Served from cache based on an exact repeat request. |
| SEMANTIC-HIT | Served from cache based on similar request meaning. |
| MISS | Served from model generation and then cached. |
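Checking the header in client code might look like the sketch below. The header name comes from the table above; the `headers` dict stands in for whatever response-headers object your HTTP client exposes.

```python
# Sketch: interpreting the X-Cencori-Cache response header.
# `headers` represents the HTTP response headers from your client library.

CACHE_STATES = {"HIT", "SEMANTIC-HIT", "MISS"}

def cache_status(headers: dict) -> str:
    """Return the cache status, defaulting to MISS if absent or unrecognized."""
    value = headers.get("X-Cencori-Cache", "MISS")
    return value if value in CACHE_STATES else "MISS"

def was_cached(headers: dict) -> bool:
    """True when the response was served from cache (exact or semantic hit)."""
    return cache_status(headers) in {"HIT", "SEMANTIC-HIT"}
```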
Retention (TTL)
Cached responses are retained for 1 hour by default.
Current Limits
- Streaming requests bypass cache.
- Tool/function-calling requests bypass cache.
- Per-request cache controls are not exposed yet.