Security
Content Filtering
Last updated March 3, 2026
Filter harmful output from AI models. Configure thresholds for hate, violence, and self-harm.
Content Filtering ensures your application doesn't generate toxic, harmful, or brand-damaging content. Cencori unifies moderation across all providers.
Categories
We map provider-specific safety settings to a unified schema:
| Category | Definition |
|---|---|
| Hate | Content that expresses, incites, or promotes hate based on identity. |
| Violence | Content that depicts death, violence, or physical injury. |
| Self-Harm | Content that encourages self-mutilation or suicide. |
| Sexual | Content meant to arouse sexual excitement or depicts sexual violence. |
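The unified schema above can be represented in code. A minimal sketch in TypeScript; the lowercase identifiers (e.g. `self_harm`) mirror the `category` field in webhook payloads, but the exact spellings are assumptions:

```typescript
// Unified safety categories as a TypeScript type (identifier spellings
// are illustrative assumptions, not confirmed API values).
type SafetyCategory = "hate" | "violence" | "self_harm" | "sexual";

interface CategoryScore {
  category: SafetyCategory;
  score: number; // 0.0 (safe) to 1.0 (harmful)
}

const example: CategoryScore = { category: "violence", score: 0.95 };
```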
Thresholds
You can tune the sensitivity of each category. Scores range from 0.0 (safe) to 1.0 (harmful); a response is blocked when its score meets or exceeds the category's threshold, so a lower threshold means stricter filtering.
- Low (0.8): Lenient. Only blocks extreme content.
- Medium (0.5): Default. Blocks clear violations.
- High (0.2): Strict. Also blocks ambiguous content.
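The threshold logic above can be sketched as a simple comparison: a response is blocked when a category's harm score meets or exceeds the configured threshold. The function and constant names below are illustrative:

```typescript
// Hypothetical sketch of a threshold gate. A *lower* threshold blocks
// more content, which is why "High" sensitivity maps to 0.2.
type Sensitivity = "low" | "medium" | "high";

const THRESHOLDS: Record<Sensitivity, number> = {
  low: 0.8,    // only extreme content
  medium: 0.5, // default: clear violations
  high: 0.2,   // strict: ambiguous content too
};

function shouldBlock(score: number, sensitivity: Sensitivity): boolean {
  return score >= THRESHOLDS[sensitivity];
}

// A score of 0.6 is allowed at low sensitivity but blocked at medium.
shouldBlock(0.6, "low");    // false
shouldBlock(0.6, "medium"); // true
```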
Webhooks & Alerts
Receive real-time notifications when your application generates flagged content.
- Go to Dashboard > Settings > Webhooks.
- Add an endpoint URL (e.g., your Slack webhook or PagerDuty integration).
- Subscribe to `security.content_flagged`.
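Once subscribed, your endpoint receives the event as JSON. A minimal handler sketch, assuming the payload shape shown in the example below; the severity routing (and its 0.9 paging cutoff) is an illustrative choice, not part of the Cencori API:

```typescript
// Routes an incoming webhook delivery by severity. The FlaggedEvent
// shape follows the documented payload example; the 0.9 cutoff for
// paging on-call is an assumption for illustration.
interface FlaggedEvent {
  event: string;
  project_id: string;
  timestamp: string;
  data: {
    prompt: string;
    category: string;
    score: number;
    user_id: string;
  };
}

function routeFlaggedContent(rawBody: string): "page" | "log" | "ignore" {
  const payload = JSON.parse(rawBody) as FlaggedEvent;
  if (payload.event !== "security.content_flagged") return "ignore";
  // Page on-call for high-severity flags; otherwise just log.
  return payload.data.score >= 0.9 ? "page" : "log";
}
```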
Payload Example:
```json
{
  "event": "security.content_flagged",
  "project_id": "proj_123",
  "timestamp": "2024-03-20T10:00:00Z",
  "data": {
    "prompt": "...",
    "category": "violence",
    "score": 0.95,
    "user_id": "user_456"
  }
}
```

Moderation Endpoint
You can also use the standalone Moderation API to check text without generating a response:
```typescript
const result = await cencori.ai.moderation({
  input: "Text to check"
});
```
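A common use of this endpoint is pre-screening user input before spending tokens on generation. A sketch of that pattern; the `flagged` and `category` fields on the result, and the function names, are assumptions for illustration (the actual response shape may differ):

```typescript
// Pre-screen input, then generate only if the check passes. The
// ModerationResult shape is an assumption, not the confirmed API.
interface ModerationResult {
  flagged: boolean;
  category?: string;
}

async function safeGenerate(
  input: string,
  moderate: (input: string) => Promise<ModerationResult>,
  generate: (input: string) => Promise<string>,
): Promise<string> {
  const check = await moderate(input);
  if (check.flagged) {
    throw new Error(`Input rejected: ${check.category ?? "policy violation"}`);
  }
  return generate(input);
}
```

In production, `moderate` would wrap the `cencori.ai.moderation` call and map its response into this shape.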