Data Connectors Overview
ZenSearch supports 17 data connectors across cloud storage, collaboration tools, development tools, CRM systems, databases, and web content. Connect your organization's data sources to create a unified, searchable knowledge base.
Connector Categories
Cloud Storage
Index files stored in cloud platforms:
| Connector | Description | Auth Method |
|---|---|---|
| Amazon S3 | S3 and S3-compatible storage (MinIO) | IAM Role, Access Keys |
| Google Drive | Personal and Workspace drives | OAuth 2.0, Service Account |
| SharePoint | Microsoft SharePoint Online | OAuth 2.0 |
| Azure Blob | Azure Blob Storage containers | Connection String, SAS, Account Key, Managed Identity |
Collaboration Tools
Connect team collaboration platforms:
| Connector | Description | Auth Method |
|---|---|---|
| Confluence | Atlassian Confluence (Cloud/Server) | API Token, OAuth |
| Notion | Pages, databases, and blocks | OAuth 2.0, Integration Token |
| Slack | Channels, threads, and files | OAuth 2.0 |
Development Tools
Index code and project management:
| Connector | Description | Auth Method |
|---|---|---|
| GitHub | Repositories, code, and issues | OAuth 2.0, PAT |
| Jira | Issues, comments, and worklogs | API Token, OAuth |
CRM Systems
Connect customer relationship management platforms:
| Connector | Description | Auth Method |
|---|---|---|
| Salesforce | Accounts, contacts, opportunities | OAuth 2.0 |
| HubSpot | Contacts, companies, deals | OAuth 2.0 |
| SAP | S/4HANA ERP data | Basic Auth, OAuth |
Databases
Query structured data using natural language via AI agents:
| Connector | Description | Auth Method |
|---|---|---|
| PostgreSQL | PostgreSQL databases | Username/Password |
| MySQL | MySQL and MariaDB | Username/Password |
| ClickHouse | ClickHouse analytics DB | Username/Password |
| MS SQL Server | Microsoft SQL Server | SQL/Windows Auth |
Database connectors work differently from other connectors. Instead of ingesting documents into the search index, they connect databases to the AI Agent system. Agents can discover schemas, generate SQL queries from natural language, and return structured results. See the Agents documentation for details.
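As a rough illustration of the schema-discovery step, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a real database. The function name `discover_schema` and the sample `orders` table are hypothetical, and ZenSearch's agents may discover schemas quite differently.

```python
import sqlite3


def discover_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table_name: [column names]} for every table in the database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema: dict[str, list[str]] = {}
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# Stand-in database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
print(discover_schema(conn))  # {'orders': ['id', 'customer', 'total']}
```

A schema map like this is the kind of context an agent would need before turning a natural-language question into SQL.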
Web Content
Crawl and index websites:
| Connector | Description | Auth Method |
|---|---|---|
| Web Crawler | Website crawling | None/Basic Auth |
Agent Tool Integrations
In addition to data connectors that ingest content into the search index, ZenSearch provides agent tool integrations that let AI agents take actions in external platforms. These are not data connectors: they give agents the ability to read, create, and modify content in third-party services during a conversation.
For full details, see the Integrations page.
Productivity Suites
| Integration | Tools | Auth Method |
|---|---|---|
| Google Workspace | 33 tools: Gmail, Calendar, Drive, Docs, Sheets, Forms | OAuth 2.0, Service Account |
| Microsoft 365 | 33+ tools: OneDrive, Outlook, Teams, SharePoint, Planner | OAuth 2.0 (Azure AD) |
Business Platforms
| Integration | Tools | Auth Method |
|---|---|---|
| Zendesk | 14 tools: Tickets, help center articles, users, organizations | OAuth 2.0 |
| Airtable | 8 tools: Bases, records, comments | API Key |
| Notion | 12 tools: Pages, databases, comments, users | API Key |
Custom & Extensible
| Integration | Description |
|---|---|
| Custom Webhook Tools | Create tools that call your own endpoints. Auth options: none, API key, basic auth, custom headers |
| MCP Servers | Connect external tool servers following the Model Context Protocol standard |
Agent tool integrations are distinct from data connectors. Data connectors ingest and index content for search. Agent tool integrations let agents interact with external services in real time during conversations. Some platforms (like Notion) appear in both categories — as a data connector for indexing pages, and as an agent tool integration for creating and editing pages.
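The auth options listed for Custom Webhook Tools (none, API key, basic auth, custom headers) could map onto HTTP request headers roughly as sketched below. The function name, the option keys, and the default `X-API-Key` header name are all assumptions made for illustration, not ZenSearch's actual configuration schema.

```python
import base64


def build_auth_headers(auth_type: str, options: dict[str, str]) -> dict[str, str]:
    """Translate a (hypothetical) webhook auth setting into request headers."""
    if auth_type == "none":
        return {}
    if auth_type == "api_key":
        # The header name is an assumption; endpoints commonly differ here.
        header_name = options.get("header_name", "X-API-Key")
        return {header_name: options["api_key"]}
    if auth_type == "basic":
        # HTTP Basic auth: base64 of "username:password".
        raw = f"{options['username']}:{options['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if auth_type == "custom_headers":
        # Pass the configured headers through unchanged.
        return dict(options)
    raise ValueError(f"unsupported auth type: {auth_type}")


print(build_auth_headers("basic", {"username": "user", "password": "pass"}))
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```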
Common Configuration
Basic Settings
All connectors share common configuration options:
| Setting | Description |
|---|---|
| Name | Display name for the connector |
| Collection | Target collection for documents |
| Enabled | Whether sync is active |
| Schedule | Sync frequency (if applicable) |
Authentication
OAuth Connectors
For OAuth-based connectors (Google Drive, Notion, Slack, etc.):
- Click Authorize during setup
- Sign in to the external service
- Grant requested permissions
- Return to ZenSearch automatically
API Key/Token
For API key-based connectors:
- Generate an API key in the source platform
- Enter the key in ZenSearch
- Test the connection
Username/Password
For database and basic auth connectors:
- Enter host and port
- Provide username and password
- Configure SSL if needed
- Test the connection
Adding a Connector
Step-by-Step
- Navigate to Knowledge → Data Sources
- Click Add Data Source
- Select connector type
- Configure authentication
- Set source-specific options
- Choose target collection
- Test connection
- Create connector
Configuration Wizard
The wizard guides you through:
┌─────────────────────────────────────────┐
│ Step 1: Select Connector Type │
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ S3 │ │Drive│ │Slack│ │ ... │ │
│ └─────┘ └─────┘ └─────┘ └─────┘ │
├─────────────────────────────────────────┤
│ Step 2: Configure Connection │
│ [Authentication fields...] │
├─────────────────────────────────────────┤
│ Step 3: Source Options │
│ [Filters, paths, scopes...] │
├─────────────────────────────────────────┤
│ Step 4: Collection Assignment │
│ [Select or create collection] │
└─────────────────────────────────────────┘
Sync Behavior
Initial Sync
When a connector is created:
- Full content fetch from source
- Document parsing and analysis
- Embedding generation
- Index population
Incremental Sync
Subsequent syncs are optimized to process only what has changed:
- Content dedup: Documents are compared by content hash, and any document whose content hasn't changed since the last sync is skipped rather than reprocessed
- Selective processing: Only new and modified content flows through the parsing, embedding, and indexing pipeline, significantly reducing sync time for large data sources
- Sync statistics: Each sync job reports detailed counts of added, modified, deleted, unchanged, and errored documents so you can verify exactly what changed
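The content-hash comparison described above can be sketched as follows. This is a minimal illustration, not ZenSearch's implementation: the choice of SHA-256, the function name `classify_documents`, and the in-memory dictionaries are all assumptions.

```python
import hashlib


def classify_documents(
    source: dict[str, bytes], indexed_hashes: dict[str, str]
) -> dict[str, list[str]]:
    """Sort source documents into added / modified / unchanged by content hash."""
    result: dict[str, list[str]] = {"added": [], "modified": [], "unchanged": []}
    for doc_id, content in source.items():
        digest = hashlib.sha256(content).hexdigest()  # hash algorithm is an assumption
        previous = indexed_hashes.get(doc_id)
        if previous is None:
            result["added"].append(doc_id)       # never indexed before
        elif previous != digest:
            result["modified"].append(doc_id)    # content changed since last sync
        else:
            result["unchanged"].append(doc_id)   # identical content, skip reprocessing
    return result
```

Only the `added` and `modified` lists would need to flow through parsing, embedding, and indexing; the `unchanged` list is counted and skipped.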
Webhook Sync (Real-time)
For supported connectors:
- Configure webhooks in source
- Receive instant update notifications
- Process changes immediately
- Near real-time search updates
Scheduled Sync
Connectors can be configured to sync automatically on a recurring schedule, keeping your knowledge base up to date without manual intervention.
Configure scheduled sync via the doc_sync settings on a connector:
| Setting | Description | Options |
|---|---|---|
| sync_strategy | How syncs are triggered | "manual" (default), "scheduled" |
| schedule_interval | Time between syncs | "15m", "30m", "1h", "2h", "4h", "6h" (default), "12h", "24h" |
# Enable scheduled sync with 1-hour interval
curl -X PATCH /api/v1/connectors/{id} \
  -H "Content-Type: application/json" \
  -d '{"doc_sync": {"sync_strategy": "scheduled", "schedule_interval": "1h"}}'
Database connectors (PostgreSQL, MySQL, ClickHouse, MS SQL) use schema sync scheduling instead of document sync scheduling, since they don't ingest documents into the search index.
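If you generate the doc_sync settings programmatically, validating them against the documented options before sending the request can catch typos early. The helper below is a sketch under that assumption; the function name is hypothetical, and omitting schedule_interval for manual syncs is a design choice of this example rather than documented behavior.

```python
import json

# Allowed values taken from the settings table above.
SYNC_STRATEGIES = {"manual", "scheduled"}
SCHEDULE_INTERVALS = {"15m", "30m", "1h", "2h", "4h", "6h", "12h", "24h"}


def build_doc_sync_payload(sync_strategy: str = "manual", schedule_interval: str = "6h") -> str:
    """Return a JSON request body for the doc_sync settings, or raise ValueError."""
    if sync_strategy not in SYNC_STRATEGIES:
        raise ValueError(f"unknown sync_strategy: {sync_strategy!r}")
    if schedule_interval not in SCHEDULE_INTERVALS:
        raise ValueError(f"unknown schedule_interval: {schedule_interval!r}")
    doc_sync = {"sync_strategy": sync_strategy}
    if sync_strategy == "scheduled":
        # Only send an interval when a schedule is actually in use.
        doc_sync["schedule_interval"] = schedule_interval
    return json.dumps({"doc_sync": doc_sync})


print(build_doc_sync_payload("scheduled", "1h"))
# {"doc_sync": {"sync_strategy": "scheduled", "schedule_interval": "1h"}}
```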
Deletion Detection
When a sync completes, ZenSearch compares the documents seen in the current run against previously indexed documents. Documents no longer present in the source are automatically soft-deleted from the search index.
A safety threshold of 50% prevents accidental mass deletion — if more than half of the existing documents would be deleted in a single sync, the operation is blocked and a warning is logged. This protects against scenarios like temporary API outages or misconfigured filters that could cause an empty sync result.
Deletion detection is supported by all document connectors except Slack and Web Crawler, which don't have bounded source enumeration (i.e., there's no reliable way to determine the complete set of documents that should exist).
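The comparison and the 50% safety threshold can be sketched like this. The function name and return shape are hypothetical, and the real pipeline soft-deletes documents and logs a warning rather than returning a flag.

```python
def detect_deletions(
    indexed_ids: set[str], seen_ids: set[str], threshold: float = 0.5
) -> tuple[set[str], bool]:
    """Return (ids_to_soft_delete, blocked) for one completed sync run."""
    missing = indexed_ids - seen_ids  # indexed before, absent from this run
    if indexed_ids and len(missing) / len(indexed_ids) > threshold:
        # More than half of the index would disappear: block the deletion.
        return set(), True
    return missing, False


# 1 of 4 documents is gone from the source: safe to soft-delete.
print(detect_deletions({"a", "b", "c", "d"}, {"a", "b", "c"}))  # ({'d'}, False)

# The source returned nothing (for example during an API outage): blocked.
print(detect_deletions({"a", "b", "c", "d"}, set()))  # (set(), True)
```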
Sync Statistics
Each sync job tracks detailed statistics:
| Metric | Description |
|---|---|
| Added | New documents indexed for the first time |
| Modified | Existing documents that were updated |
| Deleted | Documents removed because they no longer exist in the source |
| Unchanged | Documents skipped because content hasn't changed |
| Errors | Documents that failed to process |
Statistics are available in the sync job metadata after completion and can be used to monitor connector health and data freshness.
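One way to use these counts for health monitoring is to derive an error rate per sync job. The sketch below assumes the five metrics arrive as plain integers; the `SyncStats` class and the 5% alert threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyncStats:
    added: int = 0
    modified: int = 0
    deleted: int = 0
    unchanged: int = 0
    errors: int = 0

    @property
    def processed(self) -> int:
        """Documents the sync attempted to handle (deletions are not attempts)."""
        return self.added + self.modified + self.unchanged + self.errors

    def error_rate(self) -> float:
        return self.errors / self.processed if self.processed else 0.0


def is_healthy(stats: SyncStats, max_error_rate: float = 0.05) -> bool:
    """Flag a sync as unhealthy when too many documents failed to process."""
    return stats.error_rate() <= max_error_rate
```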
Sync Modes
ZenSearch supports three sync modes that control how documents are selected and processed:
| Mode | Description | Best For |
|---|---|---|
| auto | Default behavior with smart idempotency. Skips recently processed documents (within 30 minutes). | Scheduled syncs, normal operations |
| full | Reprocesses ALL documents. Bypasses idempotency check. | Complete reindex, troubleshooting, schema changes |
| incremental | Processes ONLY specified files. Bypasses idempotency check. | Webhook updates, targeted refreshes |
When to Use Each Mode
Auto Mode (Default)
- Regular scheduled background syncs
- When you want efficient processing that skips unchanged content
- Normal day-to-day operations
# API call (sync_mode defaults to "auto")
curl -X POST /api/v1/connectors/{id}/sync
Full Mode
- After changing embedding models
- When troubleshooting missing or corrupted data
- Periodic complete refresh (e.g., weekly full reindex)
- After schema or configuration changes
# Force complete resync
curl -X POST /api/v1/connectors/{id}/sync \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "full"}'
Incremental Mode
- Processing webhook notifications for specific files
- Refreshing a known set of updated documents
- Testing changes on specific files
# Sync specific files only
curl -X POST /api/v1/connectors/{id}/sync \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "incremental", "file_filter": ["docs/guide.pdf", "docs/api.md"]}'
Idempotency Protection
Auto mode includes a 30-minute idempotency window that prevents redundant processing:
- Protects against: Duplicate messages, overlapping scheduled syncs, retry storms
- Behavior: Documents processed within the last 30 minutes are skipped
- Bypassed by: full and incremental modes (explicit sync requests should always be honored)
If you trigger a sync and documents don't appear to update, try using sync_mode: "full" to bypass the idempotency check.
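The decision the idempotency window makes for each document can be sketched as below. The function and constant names are hypothetical, and how ZenSearch treats a document processed exactly 30 minutes ago is not specified, so the boundary handling here is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

IDEMPOTENCY_WINDOW = timedelta(minutes=30)


def should_process(last_processed_at: Optional[datetime], sync_mode: str, now: datetime) -> bool:
    """Decide whether a document is (re)processed in this sync run."""
    if sync_mode in ("full", "incremental"):
        return True  # explicit requests bypass the idempotency check
    if last_processed_at is None:
        return True  # never processed before
    # Auto mode: skip anything processed inside the window.
    return now - last_processed_at >= IDEMPOTENCY_WINDOW


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ten_minutes_ago = now - timedelta(minutes=10)
print(should_process(ten_minutes_ago, "auto", now))  # False
print(should_process(ten_minutes_ago, "full", now))  # True
```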
Supported Sync Methods by Connector
| Connector | Scheduled | Incremental | Webhooks |
|---|---|---|---|
| S3 | Yes | Yes | Yes |
| Google Drive | Yes | Yes | Yes |
| SharePoint | Yes | Yes | Yes |
| Azure Blob | Yes | Yes | No |
| Confluence | Yes | Yes | Yes |
| Notion | Yes | Yes | No |
| Slack | Yes | Yes | Yes |
| GitHub | Yes | Yes | Yes |
| Jira | Yes | Yes | Yes |
| Salesforce | Yes | Yes | Yes |
| HubSpot | Yes | Yes | Yes |
| SAP | Yes | Yes | No |
| PostgreSQL | Yes | No | No |
| MySQL | Yes | No | No |
| ClickHouse | Yes | No | No |
| MS SQL | Yes | No | No |
| Web Crawler | Yes | Yes | No |
Permission Sync
Supported Platforms
These connectors sync document-level permissions:
- Google Drive: File and folder sharing
- SharePoint: Site and document permissions
- Confluence: Space and page restrictions
- Salesforce: Record sharing rules
- Slack: Channel membership
Enabling Permission Sync
Permission sync is controlled per connector using the include_permissions configuration option. When enabled, ZenSearch imports access control rules from the source platform and enforces them at search time.
Permission Modes
| Mode | Description |
|---|---|
| Strict | Only show documents the user can access in the source platform |
| Permissive | Show all documents (for internal use) |
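Conceptually, the two modes differ only in whether search results are filtered against the imported access rules. The sketch below assumes each document carries a hypothetical `allowed_users` set; the actual permission model imported from each source platform is likely richer (groups, roles, inheritance).

```python
def visible_documents(docs: list[dict], user: str, mode: str = "strict") -> list[str]:
    """Return the ids of documents a user may see under the given permission mode."""
    if mode == "permissive":
        return [doc["id"] for doc in docs]  # no filtering
    if mode == "strict":
        # Keep only documents whose imported ACL includes the user.
        return [doc["id"] for doc in docs if user in doc["allowed_users"]]
    raise ValueError(f"unknown permission mode: {mode}")


docs = [
    {"id": "handbook", "allowed_users": {"alice", "bob"}},
    {"id": "salaries", "allowed_users": {"alice"}},
]
print(visible_documents(docs, "bob", "strict"))      # ['handbook']
print(visible_documents(docs, "bob", "permissive"))  # ['handbook', 'salaries']
```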
Content Filtering
Path/Prefix Filters
For storage connectors:
Include: /documents/public/*
Exclude: /documents/archive/*
File Type Filters
Limit by file extension:
Include: .pdf, .docx, .txt, .md
Exclude: .exe, .zip
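To reason about how path and file type filters combine, a small matcher like the one below can help. It is a sketch under two assumptions that the filter semantics above do not spell out: exclude rules take precedence over include rules, and patterns use shell-style globbing (via Python's `fnmatch`, where `*` also matches `/`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches_filters(
    path: str,
    include_paths: list[str],
    exclude_paths: list[str],
    include_exts: set[str],
    exclude_exts: set[str],
) -> bool:
    """Return True if a file passes both the path and the file type filters."""
    ext = PurePosixPath(path).suffix.lower()
    # Excludes win over includes (assumed precedence).
    if any(fnmatchcase(path, pattern) for pattern in exclude_paths):
        return False
    if ext in exclude_exts:
        return False
    # An empty include list means "no restriction".
    if include_paths and not any(fnmatchcase(path, pattern) for pattern in include_paths):
        return False
    if include_exts and ext not in include_exts:
        return False
    return True


rules = dict(
    include_paths=["/documents/public/*"],
    exclude_paths=["/documents/archive/*"],
    include_exts={".pdf", ".docx", ".txt", ".md"},
    exclude_exts={".exe", ".zip"},
)
print(matches_filters("/documents/public/report.pdf", **rules))  # True
print(matches_filters("/documents/archive/old.pdf", **rules))    # False
```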
Date Filters
Sync content from specific periods:
Modified after: 2024-01-01
Created within: Last 90 days
Custom Filters
Connector-specific filtering:
- Slack: Specific channels
- Confluence: Selected spaces
- GitHub: Specific branches
- Jira: JQL filter queries
Best Practices
Security
- Use minimal required permissions
- Rotate API keys regularly
- Use OAuth when available
- Enable permission sync
Performance
- Start with smaller scopes
- Enable incremental sync
- Use webhooks when available
- Filter unnecessary content
Organization
- One connector per purpose
- Group related connectors in collections
- Use descriptive names
- Document filter settings
Troubleshooting
Connection Failed
- Verify credentials are correct
- Check network connectivity
- Ensure required permissions are granted
- Review firewall/proxy settings
Sync Errors
- Check source accessibility
- Verify API rate limits
- Review error messages
- Try manual re-sync
Missing Content
- Check filter settings
- Verify permissions
- Wait for sync completion
- Review supported content types
Next Steps
Choose a connector guide for detailed setup instructions: