Content Filtering ensures your application doesn't generate toxic, harmful, or brand-damaging content. Cencori unifies moderation across all providers.

Categories

We map provider-specific safety settings to a unified schema:

Category	Definition
Hate	Content that expresses, incites, or promotes hate based on identity.
Violence	Content that depicts death, violence, or physical injury.
Self-Harm	Content that encourages self-mutilation or suicide.
Sexual	Content meant to arouse sexual excitement or depicts sexual violence.

Thresholds

You can tune the sensitivity for each category. Scores range from 0.0 (Safe) to 1.0 (Harmful).

Low (0.8): Only blocks extreme content.
Medium (0.5): Default. Blocks clear violations.
High (0.2): Strict. Blocks ambiguous content.

Webhooks & Alerts

Receive real-time notifications when your application generates flagged content.

Go to Dashboard > Settings > Webhooks.
Add an endpoint URL (e.g., your Slack webhook or pagerduty).
Subscribe to security.content_flagged.

Payload Example:

Codetext

{
  "event": "security.content_flagged",
  "project_id": "proj_123",
  "timestamp": "2024-03-20T10:00:00Z",
  "data": {
    "prompt": "...",
    "category": "violence",
    "score": 0.95,
    "user_id": "user_456"
  }
}

Moderation Endpoint

You can also use the standalone Moderation API to check text without generating a response:

Codetext

const result = await cencori.ai.moderation({
  input: "Text to check"
});