Guardrails

Configure safety features to protect your ZenSearch deployment from harmful inputs and outputs.

Overview

Guardrails provide:

  • Input validation and filtering
  • Output safety checks
  • Content moderation
  • Hallucination detection

Accessing Guardrails

  1. Click Settings in the sidebar
  2. Select the Guardrails tab
Note: Guardrail configuration requires the Admin or Owner role.

Input Guardrails

Protect against harmful or inappropriate inputs.

Content Moderation

Setting      Description
Enabled      Enable content moderation
Strictness   Low, Medium, or High

Filters for:

  • Hate speech
  • Violence
  • Adult content
  • Harassment

Prompt Injection Detection

Setting       Description
Enabled       Detect injection attempts
Sensitivity   How aggressively attempts are flagged (Low, Medium, High)

Protects against:

  • Jailbreak attempts
  • System prompt extraction
  • Role confusion attacks
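ZenSearch does not expose its detector internals, but the idea can be sketched with a simple keyword heuristic, where sensitivity maps to how many suspicious patterns must match before the input is flagged. All pattern strings, names, and thresholds below are illustrative assumptions, not the product's actual rules; a production detector would use a trained classifier rather than a keyword list.

```python
import re

# Hypothetical phrasings commonly seen in injection attempts.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",   # jailbreak framing
    r"reveal your system prompt",             # prompt extraction
    r"you are now",                           # role-confusion framing
]

# Sensitivity -> number of pattern hits required to flag the input.
THRESHOLDS = {"low": 3, "medium": 2, "high": 1}

def looks_like_injection(text: str, sensitivity: str = "medium") -> bool:
    """Return True if enough injection patterns match for this sensitivity."""
    hits = sum(bool(re.search(p, text, re.IGNORECASE))
               for p in INJECTION_PATTERNS)
    return hits >= THRESHOLDS[sensitivity]
```

Raising the sensitivity lowers the hit threshold, so High flags on a single match while Low requires several corroborating signals.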

PII Detection

Setting   Description
Enabled   Detect PII in queries
Action    Warn, Block, or Redact

Detects:

  • Email addresses
  • Phone numbers
  • Social security numbers
  • Credit card numbers
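The Redact action can be pictured as pattern substitution over the four PII types above. The regexes below are simplified illustrations (real PII detection typically combines regexes with NER models and checksum validation, e.g. Luhn for card numbers), and are not ZenSearch's actual patterns.

```python
import re

# Simplified, illustrative patterns for the four detected PII types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card":  re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}

def redact(text: str) -> str:
    """Apply the Redact action: replace each PII match with a type tag."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()} REDACTED]", text)
    return text
```

With Warn or Block configured instead, the same matches would trigger a user-facing warning or reject the query outright rather than rewriting it.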

Length Validation

Setting            Description
Max Query Length   Maximum number of characters per query
Max File Size      Maximum size per uploaded file
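Length validation is the simplest guardrail to reason about: compare the input against the configured limits and collect any violations. The limits below are hypothetical defaults, not ZenSearch's shipped values.

```python
MAX_QUERY_LENGTH = 2000            # characters; hypothetical default
MAX_FILE_SIZE = 10 * 1024 * 1024   # bytes (10 MiB); hypothetical default

def validate_input(query: str, file_size: int = 0) -> list[str]:
    """Return a list of validation errors; an empty list means the input passes."""
    errors = []
    if len(query) > MAX_QUERY_LENGTH:
        errors.append(f"query exceeds {MAX_QUERY_LENGTH} characters")
    if file_size > MAX_FILE_SIZE:
        errors.append(f"file exceeds {MAX_FILE_SIZE} bytes")
    return errors
```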

Output Guardrails

Ensure AI responses are safe and accurate.

Hallucination Detection

Method     Description
Lexical    Word-overlap checking
Semantic   Meaning comparison
Hybrid     Combined approach

Settings:

  • Detection threshold
  • Action on detection (Warn/Block)
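The lexical method can be sketched as a word-overlap score between the response and its supporting material: when too little of the response's vocabulary is grounded in that material, the configured action fires. This is a minimal stand-in, assuming a hypothetical 0.5 threshold; the semantic and hybrid methods would compare meanings (e.g. via embeddings) rather than surface words.

```python
def lexical_support(response: str, sources: list[str],
                    threshold: float = 0.5) -> str:
    """Score how much of the response's vocabulary appears in the
    given source texts; low overlap suggests possible hallucination.
    Returns the action to take ('allow' or 'warn')."""
    resp_words = set(response.lower().split())
    source_words = set(" ".join(sources).lower().split())
    overlap = len(resp_words & source_words) / max(len(resp_words), 1)
    return "allow" if overlap >= threshold else "warn"  # or "block", per config
```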

Toxicity Filtering

Setting     Description
Enabled     Filter toxic outputs
Threshold   Toxicity score above which an output is filtered

Relevance Checking

Setting     Description
Enabled     Check response relevance
Threshold   Minimum relevance score

Ensures responses address the query.
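As an illustrative stand-in for relevance scoring, the check below uses cosine similarity between bag-of-words vectors of the query and response, compared against a minimum threshold. The function names and the 0.2 default are assumptions; a production checker would score similarity with embeddings instead of word counts.

```python
import math
from collections import Counter

def relevance_score(query: str, response: str) -> float:
    """Cosine similarity between bag-of-words vectors of query and response."""
    q, r = Counter(query.lower().split()), Counter(response.lower().split())
    dot = sum(q[w] * r[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0

def is_relevant(query: str, response: str, threshold: float = 0.2) -> bool:
    """True if the response meets the minimum relevance score."""
    return relevance_score(query, response) >= threshold
```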

Configuration

Enable/Disable Guardrails

Toggle individual guardrails:

┌────────────────────────────────────┐
│ Content Moderation      [ON] [OFF] │
│ Prompt Injection        [ON] [OFF] │
│ PII Detection           [ON] [OFF] │
│ Hallucination Detection [ON] [OFF] │
│ Toxicity Filtering      [ON] [OFF] │
└────────────────────────────────────┘

Adjust Sensitivity

Each guardrail has sensitivity settings:

Level    Behavior
Low      Minimal filtering
Medium   Balanced approach
High     Strict filtering

Set Actions

Configure what happens when triggered:

Action   Result
Allow    Log the event but allow the request
Warn     Show a warning to the user
Block    Prevent the request
Redact   Remove the sensitive content

Monitoring

Guardrail Events

View triggered guardrails:

  • Event timestamp
  • Guardrail type
  • Trigger reason
  • Action taken

Analytics

Track guardrail metrics:

  • Trigger frequency
  • False positive rate
  • Most common violations

Best Practices

Configuration

  1. Start with default settings
  2. Monitor false positives
  3. Adjust sensitivity gradually
  4. Review events regularly

Security

  1. Enable all input guardrails
  2. Use hallucination detection
  3. Monitor for new attack patterns
  4. Keep guardrails updated

User Experience

  1. Balance security with usability
  2. Provide helpful error messages
  3. Allow appeals for false positives
  4. Document guardrail policies

Troubleshooting

Too Many False Positives

  1. Lower sensitivity settings
  2. Review trigger patterns
  3. Add exceptions if needed
  4. Contact support for tuning

Guardrails Not Triggering

  1. Verify guardrails are enabled
  2. Check sensitivity settings
  3. Test with known bad inputs
  4. Review configuration

Next Steps