Prompt Injection Protection
Understand and prevent prompt injection attacks that attempt to manipulate AI behavior.
What is Prompt Injection?
Prompt injection is a security vulnerability where malicious users craft inputs that manipulate an AI model's behavior, bypassing intended restrictions or extracting sensitive information.
Example Attack:
Without protection, the AI might comply and expose sensitive context.
Common Attack Vectors
1. Instruction Override
Attempts to replace system instructions:
2. Context Extraction
Tries to reveal hidden prompts:
3. Jailbreak Attempts
Bypasses safety guardrails:
4. Delimiter Confusion
Uses special characters to break context:
How Cencori Protects Against Prompt Injection
1. Pattern Detection
Cencori scans inputs for known malicious patterns like "ignore previous instructions", "system prompt", "jailbreak", and common attack keywords.
2. Semantic Analysis
Uses ML models to detect inputs that semantically resemble instruction overrides, even if they use novel phrasing.
3. Character Anomaly Detection
Flags inputs with suspicious delimiter usage, excessive special characters, or unusual formatting.
4. Behavioral Scoring
Assigns a risk score to each request. High-risk requests are blocked automatically.
When Injection is Detected
If prompt injection is detected, Cencori blocks the request before it reaches the AI provider:
Handling Injection Detection
Handling False Positives
Legitimate inputs might occasionally trigger detection:
Example: Technical Discussion
"How do I ignore previous context in my chatbot implementation?"
This is a legitimate question but contains the trigger phrase "ignore previous".
If you experience false positives:
- View the incident in your dashboard to see the risk score
- Adjust sensitivity settings (Enterprise feature)
- Whitelist specific patterns for your use case
Best Practices for Developers
- Never trust user input: Always send it through Cencori for scanning
- Separate system prompts: Use the
systemrole, never concatenate with user input - Monitor incidents: Review blocked injection attempts weekly
- Educate users: Make it clear that manipulative prompts won't work
- Test your defenses: Try common attacks in test environment
Testing Prompt Injection Protection
Try these sample attacks in your test environment to verify protection:

