
Security

Content Filtering

Last updated March 3, 2026

Filter harmful output from AI models. Configure thresholds for hate, violence, and self-harm.

Content Filtering ensures your application doesn't generate toxic, harmful, or brand-damaging content. Cencori unifies moderation across all providers.

Categories

We map provider-specific safety settings to a unified schema:

  • Hate: Content that expresses, incites, or promotes hate based on identity.
  • Violence: Content that depicts death, violence, or physical injury.
  • Self-Harm: Content that encourages self-mutilation or suicide.
  • Sexual: Content meant to arouse sexual excitement, or that depicts sexual violence.
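As an illustration of how provider-specific labels can fold into this schema, the sketch below maps a few real provider category names (OpenAI moderation labels, Google HARM_CATEGORY_* labels) onto the unified categories. The mapping itself is illustrative and not exhaustive; it is not Cencori's internal table.

```typescript
// Illustrative mapping from provider-specific safety labels to the
// unified schema. The selection of labels is an example, not the
// complete set Cencori supports.
type UnifiedCategory = "hate" | "violence" | "self_harm" | "sexual";

const unifiedCategory: Record<string, UnifiedCategory> = {
  // OpenAI moderation category labels
  "hate": "hate",
  "violence": "violence",
  "self-harm": "self_harm",
  "sexual": "sexual",
  // Google-style harm category labels
  "HARM_CATEGORY_HATE_SPEECH": "hate",
  "HARM_CATEGORY_DANGEROUS_CONTENT": "violence",
  "HARM_CATEGORY_SEXUALLY_EXPLICIT": "sexual",
};

// Any provider label resolves to one unified category:
const category = unifiedCategory["HARM_CATEGORY_HATE_SPEECH"]; // "hate"
```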

Thresholds

You can tune the sensitivity for each category. Scores range from 0.0 (Safe) to 1.0 (Harmful).

  • Low (0.8): Permissive. Blocks only extreme content (scores of 0.8 and above).
  • Medium (0.5): Default. Blocks clear violations.
  • High (0.2): Strict. Also blocks ambiguous content.
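The check behind these presets can be sketched as a simple comparison: content is blocked when its score meets or exceeds the category's threshold. The per-category values below are illustrative defaults, not your project's actual configuration.

```typescript
// Minimal sketch of threshold-based filtering (illustrative, not
// Cencori's internal implementation).
type Category = "hate" | "violence" | "self_harm" | "sexual";

// Example configuration: Medium everywhere except a strict self-harm
// setting and a permissive sexual-content setting.
const thresholds: Record<Category, number> = {
  hate: 0.5,       // Medium (default)
  violence: 0.5,   // Medium (default)
  self_harm: 0.2,  // High: strict
  sexual: 0.8,     // Low: only extreme content
};

function isBlocked(category: Category, score: number): boolean {
  // Blocked when the score meets or exceeds the category threshold.
  return score >= thresholds[category];
}

isBlocked("violence", 0.95); // true: 0.95 >= 0.5
isBlocked("sexual", 0.6);    // false: 0.6 < 0.8
```

Note the inversion: a *higher* sensitivity setting corresponds to a *lower* numeric threshold, because more scores clear the bar.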

Webhooks & Alerts

Receive real-time notifications when your application generates flagged content.

  1. Go to Dashboard > Settings > Webhooks.
  2. Add an endpoint URL (e.g., your Slack or PagerDuty webhook).
  3. Subscribe to security.content_flagged.

Payload Example:

```json
{
  "event": "security.content_flagged",
  "project_id": "proj_123",
  "timestamp": "2024-03-20T10:00:00Z",
  "data": {
    "prompt": "...",
    "category": "violence",
    "score": 0.95,
    "user_id": "user_456"
  }
}
```
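A receiving endpoint can route on the payload fields above. The sketch below is one possible handler; the 0.9 "page the on-call" cutoff and the routing labels are illustrative choices, not part of the Cencori API.

```typescript
// Sketch of routing logic for a security.content_flagged webhook.
// The severity cutoff (0.9) is an example policy, not a Cencori default.
interface FlaggedEvent {
  event: string;
  project_id: string;
  timestamp: string;
  data: { prompt: string; category: string; score: number; user_id: string };
}

function routeEvent(evt: FlaggedEvent): "page" | "log" | "ignore" {
  if (evt.event !== "security.content_flagged") return "ignore";
  // Severe violations page the on-call; the rest are logged for review.
  return evt.data.score >= 0.9 ? "page" : "log";
}

// The example payload from the docs scores 0.95, so it pages:
const sample: FlaggedEvent = {
  event: "security.content_flagged",
  project_id: "proj_123",
  timestamp: "2024-03-20T10:00:00Z",
  data: { prompt: "...", category: "violence", score: 0.95, user_id: "user_456" },
};
const action = routeEvent(sample); // "page"
```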

Moderation Endpoint

You can also use the standalone Moderation API to check text without generating a response:

```typescript
const result = await cencori.ai.moderation({
  input: "Text to check"
});
```