Guardrails
Configure safety features to protect your ZenSearch deployment from harmful inputs and outputs.
Overview
Guardrails provide:
- Input validation and filtering
- Output safety checks
- Content moderation
- Hallucination detection
Accessing Guardrails
- Click Settings in the sidebar
- Select the Guardrails tab
note
Guardrail configuration requires Admin or Owner role.
Input Guardrails
Protect against harmful or inappropriate inputs.
Content Moderation
| Setting | Description |
|---|---|
| Enabled | Enable content moderation |
| Strictness | Low, Medium, High |
Filters for:
- Hate speech
- Violence
- Adult content
- Harassment
Prompt Injection Detection
| Setting | Description |
|---|---|
| Enabled | Detect injection attempts |
| Sensitivity | Detection sensitivity |
Protects against:
- Jailbreak attempts
- System prompt extraction
- Role confusion attacks
PII Detection
| Setting | Description |
|---|---|
| Enabled | Detect PII in queries |
| Action | Warn, Block, or Redact |
Detects:
- Email addresses
- Phone numbers
- Social security numbers
- Credit card numbers
Length Validation
| Setting | Description |
|---|---|
| Max Query Length | Maximum characters |
| Max File Size | Maximum upload size |
Output Guardrails
Ensure AI responses are safe and accurate.
Hallucination Detection
| Method | Description |
|---|---|
| Lexical | Word overlap checking |
| Semantic | Meaning comparison |
| Hybrid | Combined approach |
Settings:
- Detection threshold
- Action on detection (Warn/Block)
Toxicity Filtering
| Setting | Description |
|---|---|
| Enabled | Filter toxic outputs |
| Threshold | Toxicity score threshold |
Relevance Checking
| Setting | Description |
|---|---|
| Enabled | Check response relevance |
| Threshold | Minimum relevance score |
Ensures responses address the query.
Configuration
Enable/Disable Guardrails
Toggle individual guardrails:
┌─────────────────────────────────────┐
│ Content Moderation [ON] [OFF] │
│ Prompt Injection [ON] [OFF] │
│ PII Detection [ON] [OFF] │
│ Hallucination Detection [ON] [OFF] │
│ Toxicity Filtering [ON] [OFF] │
└─────────────────────────────────────┘
Adjust Sensitivity
Each guardrail has sensitivity settings:
| Level | Behavior |
|---|---|
| Low | Minimal filtering |
| Medium | Balanced approach |
| High | Strict filtering |
Set Actions
Configure what happens when triggered:
| Action | Result |
|---|---|
| Allow | Log but allow |
| Warn | Show warning to user |
| Block | Prevent the request |
| Redact | Remove sensitive content |
Monitoring
Guardrail Events
View triggered guardrails:
- Event timestamp
- Guardrail type
- Trigger reason
- Action taken
Analytics
Track guardrail metrics:
- Trigger frequency
- False positive rate
- Most common violations
Best Practices
Configuration
- Start with default settings
- Monitor false positives
- Adjust sensitivity gradually
- Review events regularly
Security
- Enable all input guardrails
- Use hallucination detection
- Monitor for new attack patterns
- Keep guardrails updated
User Experience
- Balance security with usability
- Provide helpful error messages
- Allow appeals for false positives
- Document guardrail policies
Troubleshooting
Too Many False Positives
- Lower sensitivity settings
- Review trigger patterns
- Add exceptions if needed
- Contact support for tuning
Guardrails Not Triggering
- Verify guardrails are enabled
- Check sensitivity settings
- Test with known bad inputs
- Review configuration