
Data Connectors Overview

ZenSearch supports 17 data connectors across cloud storage, collaboration tools, development tools, CRM systems, databases, and web content. Connect your organization's data sources to build a unified, searchable knowledge base.

Connector Categories

Cloud Storage

Store and manage files in cloud platforms:

| Connector | Description | Auth Method |
|---|---|---|
| Amazon S3 | S3 and S3-compatible storage (MinIO) | IAM Role, Access Keys |
| Google Drive | Personal and Workspace drives | OAuth 2.0, Service Account |
| SharePoint | Microsoft SharePoint Online | OAuth 2.0 |
| Azure Blob | Azure Blob Storage containers | Connection String, SAS, Account Key, Managed Identity |

Collaboration Tools

Connect team collaboration platforms:

| Connector | Description | Auth Method |
|---|---|---|
| Confluence | Atlassian Confluence (Cloud/Server) | API Token, OAuth |
| Notion | Pages, databases, and blocks | OAuth 2.0, Integration Token |
| Slack | Channels, threads, and files | OAuth 2.0 |

Development Tools

Index code repositories and project management data:

| Connector | Description | Auth Method |
|---|---|---|
| GitHub | Repositories, code, and issues | OAuth 2.0, PAT |
| Jira | Issues, comments, and worklogs | API Token, OAuth |

CRM Systems

Connect customer relationship management (CRM) and enterprise resource planning (ERP) platforms:

| Connector | Description | Auth Method |
|---|---|---|
| Salesforce | Accounts, contacts, opportunities | OAuth 2.0 |
| HubSpot | Contacts, companies, deals | OAuth 2.0 |
| SAP | S/4HANA ERP data | Basic Auth, OAuth |

Databases

Query structured data using natural language via AI agents:

| Connector | Description | Auth Method |
|---|---|---|
| PostgreSQL | PostgreSQL databases | Username/Password |
| MySQL | MySQL and MariaDB | Username/Password |
| ClickHouse | ClickHouse analytics DB | Username/Password |
| MS SQL Server | Microsoft SQL Server | SQL/Windows Auth |

Note: Database connectors work differently from other connectors. Instead of ingesting documents into the search index, they connect databases to the AI Agent system. Agents can discover schemas, generate SQL queries from natural language, and return structured results. See the Agents documentation for details.

Web Content

Crawl and index websites:

| Connector | Description | Auth Method |
|---|---|---|
| Web Crawler | Website crawling | None, Basic Auth |

Agent Tool Integrations

In addition to data connectors that ingest content into the search index, ZenSearch provides agent tool integrations that let AI agents take actions in external platforms. These are not data collectors — they give agents the ability to read, create, and modify content in third-party services during a conversation.

For full details, see the Integrations page.

Productivity Suites

| Integration | Tools | Auth Method |
|---|---|---|
| Google Workspace | 33 tools: Gmail, Calendar, Drive, Docs, Sheets, Forms | OAuth 2.0, Service Account |
| Microsoft 365 | 33+ tools: OneDrive, Outlook, Teams, SharePoint, Planner | OAuth 2.0 (Azure AD) |

Business Platforms

| Integration | Tools | Auth Method |
|---|---|---|
| Zendesk | 14 tools: Tickets, help center articles, users, organizations | OAuth 2.0 |
| Airtable | 8 tools: Bases, records, comments | API Key |
| Notion | 12 tools: Pages, databases, comments, users | API Key |

Custom & Extensible

| Integration | Description |
|---|---|
| Custom Webhook Tools | Create tools that call your own endpoints. Auth options: none, API key, basic auth, custom headers |
| MCP Servers | Connect external tool servers following the Model Context Protocol standard |

Note: Agent tool integrations are distinct from data connectors. Data connectors ingest and index content for search. Agent tool integrations let agents interact with external services in real time during conversations. Some platforms (like Notion) appear in both categories: as a data connector for indexing pages, and as an agent tool integration for creating and editing pages.

Common Configuration

Basic Settings

All connectors share common configuration options:

| Setting | Description |
|---|---|
| Name | Display name for the connector |
| Collection | Target collection for documents |
| Enabled | Whether sync is active |
| Schedule | Sync frequency (if applicable) |

Authentication

OAuth Connectors

For OAuth-based connectors (Google Drive, Notion, Slack, etc.):

  1. Click Authorize during setup
  2. Sign in to the external service
  3. Grant requested permissions
  4. Return to ZenSearch automatically

API Key/Token

For API key-based connectors:

  1. Generate an API key in the source platform
  2. Enter the key in ZenSearch
  3. Test the connection

Username/Password

For database and basic auth connectors:

  1. Enter host and port
  2. Provide username and password
  3. Configure SSL if needed
  4. Test the connection

Adding a Connector

Step-by-Step

  1. Navigate to Knowledge > Data Sources
  2. Click Add Data Source
  3. Select connector type
  4. Configure authentication
  5. Set source-specific options
  6. Choose target collection
  7. Test connection
  8. Create connector

Configuration Wizard

The wizard guides you through:

  1. Select Connector Type (S3, Drive, Slack, ...)
  2. Configure Connection: authentication fields
  3. Source Options: filters, paths, scopes
  4. Collection Assignment: select or create a collection

Sync Behavior

Initial Sync

When a connector is created:

  1. Full content fetch from source
  2. Document parsing and analysis
  3. Embedding generation
  4. Index population

Incremental Sync

Subsequent syncs are optimized to process only what has changed:

  • Content dedup: Unchanged documents are automatically skipped based on content hashing — if a document hasn't changed since the last sync, it won't be reprocessed
  • Selective processing: Only new and modified content flows through the parsing, embedding, and indexing pipeline, significantly reducing sync time for large data sources
  • Sync statistics: Each sync job reports detailed counts of added, modified, deleted, unchanged, and errored documents so you can verify exactly what changed
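The hash comparison behind content dedup can be sketched in a few lines of bash. This is an illustration only, assuming GNU coreutils `sha256sum`, paths without whitespace, and a plain-text state file of `<hash>  <path>` lines; ZenSearch's actual hashing scheme and where it keeps previous hashes are not described here, and `classify_document` and `record_document` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of content-hash dedup.
# State file format (assumed): one "<sha256>  <path>" line per document,
# exactly as printed by sha256sum.

# Print "added", "modified", or "unchanged" for one file.
classify_document() {
  local path="$1" state_file="$2" new_hash old_hash
  # Hash the current content (first field of sha256sum output).
  new_hash=$(sha256sum -- "$path" | cut -d' ' -f1)
  # Previously recorded hash for this exact path, if any.
  old_hash=$(awk -v p="$path" '$2 == p { print $1; exit }' "$state_file" 2>/dev/null)
  if [[ -z "$old_hash" ]]; then
    echo "added"
  elif [[ "$old_hash" == "$new_hash" ]]; then
    echo "unchanged"   # would be skipped: no parsing, embedding, or indexing
  else
    echo "modified"
  fi
}

# Replace any stored hash for the file with its current hash.
record_document() {
  local path="$1" state_file="$2" tmp
  tmp=$(mktemp)
  awk -v p="$path" '$2 != p' "$state_file" 2>/dev/null > "$tmp"
  sha256sum -- "$path" >> "$tmp"
  mv "$tmp" "$state_file"
}
```

Only files labeled "added" or "modified" would continue down the pipeline, and the three labels correspond to the Added, Modified, and Unchanged counts in the sync statistics.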

Webhook Sync (Real-time)

For supported connectors:

  1. Configure webhooks in source
  2. Receive instant update notifications
  3. Process changes immediately
  4. Near real-time search updates

Scheduled Sync

Connectors can be configured to sync automatically on a recurring schedule, keeping your knowledge base up to date without manual intervention.

Configure scheduled sync via the doc_sync settings on a connector:

| Setting | Description | Options |
|---|---|---|
| sync_strategy | How syncs are triggered | "manual" (default), "scheduled" |
| schedule_interval | Time between syncs | "15m", "30m", "1h", "2h", "4h", "6h" (default), "12h", "24h" |

# Enable scheduled sync with a 1-hour interval
curl -X PATCH /api/v1/connectors/{id} \
  -H "Content-Type: application/json" \
  -d '{"doc_sync": {"sync_strategy": "scheduled", "schedule_interval": "1h"}}'
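Because `schedule_interval` only accepts the values in the table, a script that updates many connectors may want to validate the value before sending the PATCH request. A minimal bash sketch; the function names are hypothetical and the validation happens on the client, not in ZenSearch.

```bash
#!/usr/bin/env bash
# Client-side sketch: validate schedule_interval and build the doc_sync body.

# Intervals listed in the doc_sync settings table.
ALLOWED_INTERVALS="15m 30m 1h 2h 4h 6h 12h 24h"

# Convert an interval such as "30m" or "12h" to seconds.
interval_to_seconds() {
  local value="${1%[mh]}" unit="${1: -1}"
  case "$unit" in
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    *) return 1 ;;
  esac
}

# Print the JSON body, or fail if the interval is not an allowed value.
build_doc_sync_payload() {
  local interval="${1:-6h}"   # 6h is the documented default interval
  case " $ALLOWED_INTERVALS " in
    *" $interval "*) ;;
    *) echo "unsupported schedule_interval: $interval" >&2; return 1 ;;
  esac
  printf '{"doc_sync": {"sync_strategy": "scheduled", "schedule_interval": "%s"}}\n' "$interval"
}
```

The output can be passed straight to the request above, for example `curl -X PATCH /api/v1/connectors/{id} -H "Content-Type: application/json" -d "$(build_doc_sync_payload 1h)"`.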

Note: Database connectors (PostgreSQL, MySQL, ClickHouse, MS SQL) use schema sync scheduling instead of document sync scheduling, since they don't ingest documents into the search index.

Deletion Detection

When a sync completes, ZenSearch compares the documents seen in the current run against previously indexed documents. Documents no longer present in the source are automatically soft-deleted from the search index.

A safety threshold of 50% prevents accidental mass deletion — if more than half of the existing documents would be deleted in a single sync, the operation is blocked and a warning is logged. This protects against scenarios like temporary API outages or misconfigured filters that could cause an empty sync result.
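The comparison and the 50% safety threshold can be illustrated with a bash sketch that works on two lists of document IDs. It is not ZenSearch's implementation: the one-ID-per-line input files and the `detect_deletions` name are assumptions, and the sketch only reports IDs rather than soft-deleting anything.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of deletion detection with a 50% safety threshold.
# $1: file of document IDs currently in the index (one per line)
# $2: file of document IDs seen during this sync run (one per line)
# Prints the IDs that would be soft-deleted, or returns 1 if that would
# remove more than half of the indexed documents.

detect_deletions() {
  local indexed="$1" seen="$2" candidates total count
  # Set difference: non-empty IDs in the index that this run did not see.
  candidates=$(awk 'FILENAME == ARGV[1] { in_run[$0] = 1; next }
                    NF && !($0 in in_run)' "$seen" "$indexed" | sort -u)
  total=$(awk 'NF' "$indexed" | sort -u | wc -l)
  count=$(printf '%s\n' "$candidates" | awk 'NF' | wc -l)
  # Safety threshold: block the run if more than 50% would be deleted.
  if (( total > 0 && count * 2 > total )); then
    echo "blocked: $count of $total documents would be deleted" >&2
    return 1
  fi
  if [[ -n "$candidates" ]]; then
    printf '%s\n' "$candidates"
  fi
}
```

With four indexed documents, losing two in one run (exactly 50%) is allowed while losing three is blocked, matching the "more than half" rule. An empty result from the source, as during an API outage, is always blocked as long as the index is non-empty.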

Deletion detection is supported by all document connectors except Slack and Web Crawler, which don't have bounded source enumeration (i.e., there's no reliable way to determine the complete set of documents that should exist).

Sync Statistics

Each sync job tracks detailed statistics:

| Metric | Description |
|---|---|
| Added | New documents indexed for the first time |
| Modified | Existing documents that were updated |
| Deleted | Documents removed because they no longer exist in the source |
| Unchanged | Documents skipped because content hasn't changed |
| Errors | Documents that failed to process |

Statistics are available in the sync job metadata after completion and can be used to monitor connector health and data freshness.
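As an example of using these counts for monitoring, the hypothetical bash function below summarizes one job's statistics and fails when the error rate crosses a threshold. How you read the five counts out of the sync job metadata is not covered here, and the 5% default threshold is an arbitrary illustration rather than a ZenSearch setting.

```bash
#!/usr/bin/env bash
# Hypothetical health check over one sync job's statistics.
# Arguments: added modified deleted unchanged errors (integer counts).
# MAX_ERROR_PCT (default 5) is an example threshold, not a ZenSearch default.

check_sync_health() {
  local added=$1 modified=$2 deleted=$3 unchanged=$4 errors=$5
  local max_error_pct=${MAX_ERROR_PCT:-5}
  # Documents the run attempted to process (deletions are not counted here).
  local processed=$(( added + modified + unchanged + errors ))
  local changed=$(( added + modified + deleted ))
  echo "changed=$changed unchanged=$unchanged errors=$errors"
  # Integer comparison of errors/processed against max_error_pct percent.
  if (( processed > 0 && errors * 100 > max_error_pct * processed )); then
    echo "unhealthy: error rate above ${max_error_pct}%" >&2
    return 1
  fi
}
```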

Sync Modes

ZenSearch supports three sync modes that control how documents are selected and processed:

| Mode | Description | Best For |
|---|---|---|
| auto | Default behavior with smart idempotency. Skips recently processed documents (within 30 minutes). | Scheduled syncs, normal operations |
| full | Reprocesses ALL documents. Bypasses idempotency check. | Complete reindex, troubleshooting, schema changes |
| incremental | Processes ONLY specified files. Bypasses idempotency check. | Webhook updates, targeted refreshes |

When to Use Each Mode

Auto Mode (Default)

  • Regular scheduled background syncs
  • When you want efficient processing that skips unchanged content
  • Normal day-to-day operations
# API call (sync_mode defaults to "auto")
curl -X POST /api/v1/connectors/{id}/sync

Full Mode

  • After changing embedding models
  • When troubleshooting missing or corrupted data
  • Periodic complete refresh (e.g., weekly full reindex)
  • After schema or configuration changes
# Force complete resync
curl -X POST /api/v1/connectors/{id}/sync \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "full"}'

Incremental Mode

  • Processing webhook notifications for specific files
  • Refreshing a known set of updated documents
  • Testing changes on specific files
# Sync specific files only
curl -X POST /api/v1/connectors/{id}/sync \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "incremental", "file_filter": ["docs/guide.pdf", "docs/api.md"]}'
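The three request bodies differ only in `sync_mode` and `file_filter`, so a small helper can build them. This bash sketch is hypothetical: it performs no JSON escaping (paths must not contain quotes or backslashes), and requiring at least one file in incremental mode is the sketch's own check, inferred from "Processes ONLY specified files".

```bash
#!/usr/bin/env bash
# Hypothetical helper that builds the body for POST /api/v1/connectors/{id}/sync.
# Usage: sync_request_body auto | full | incremental <file>...
# No JSON escaping is done: paths must not contain quotes or backslashes.

sync_request_body() {
  local mode="$1"; shift
  case "$mode" in
    auto) echo '{"sync_mode": "auto"}' ;;   # same as sending no body
    full) echo '{"sync_mode": "full"}' ;;
    incremental)
      if (( $# == 0 )); then
        echo "incremental mode needs at least one file" >&2
        return 1
      fi
      local files="" f
      # Join the paths as a comma-separated list of JSON strings.
      for f in "$@"; do
        files+="${files:+, }\"$f\""
      done
      echo "{\"sync_mode\": \"incremental\", \"file_filter\": [$files]}" ;;
    *)
      echo "unknown sync_mode: $mode" >&2
      return 1 ;;
  esac
}
```

For example, `curl -X POST /api/v1/connectors/{id}/sync -H "Content-Type: application/json" -d "$(sync_request_body incremental docs/guide.pdf docs/api.md)"` sends the same body as the incremental example above.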

Idempotency Protection

Auto mode includes a 30-minute idempotency window that prevents redundant processing:

  • Protects against: Duplicate messages, overlapping scheduled syncs, retry storms
  • Behavior: Documents processed within the last 30 minutes are skipped
  • Bypassed by: full and incremental modes (explicit sync requests should always be honored)
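The skip decision reduces to a mode check plus a timestamp comparison. A bash sketch, assuming each document's last-processed time is available as a Unix timestamp; how ZenSearch actually records it is not documented here, and `idempotency_decision` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the 30-minute idempotency check.
# Arguments: sync_mode, last-processed time, current time (Unix seconds).
# Prints "skip" or "process".

IDEMPOTENCY_WINDOW_SECONDS=$(( 30 * 60 ))

idempotency_decision() {
  local mode="$1" last_processed="$2" now="$3"
  # full and incremental are explicit requests: always process.
  if [[ "$mode" != "auto" ]]; then
    echo "process"
    return
  fi
  # An empty timestamp means the document has never been processed.
  if [[ -n "$last_processed" ]] && (( now - last_processed < IDEMPOTENCY_WINDOW_SECONDS )); then
    echo "skip"
  else
    echo "process"
  fi
}
```

A document processed at second 1000 is skipped by an auto sync at second 1600 (10 minutes later) but processed at second 2800 (30 minutes later), and always processed by a full or incremental request.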

Note: If you trigger a sync and documents don't appear to update, try using sync_mode: "full" to bypass the idempotency check.

Supported Sync Methods by Connector

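| Connector | Scheduled | Incremental | Webhooks |
|---|---|---|---|
| S3 | Yes | Yes | Yes |
| Google Drive | Yes | Yes | Yes |
| SharePoint | Yes | Yes | Yes |
| Azure Blob | Yes | Yes | No |
| Confluence | Yes | Yes | Yes |
| Notion | Yes | Yes | No |
| Slack | Yes | Yes | Yes |
| GitHub | Yes | Yes | Yes |
| Jira | Yes | Yes | Yes |
| Salesforce | Yes | Yes | Yes |
| HubSpot | Yes | Yes | Yes |
| SAP | Yes | Yes | No |
| PostgreSQL | Yes | No | No |
| MySQL | Yes | No | No |
| ClickHouse | Yes | No | No |
| MS SQL | Yes | No | No |
| Web Crawler | Yes | Yes | No |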

Permission Sync

Supported Platforms

These connectors sync document-level permissions:

  • Google Drive: File and folder sharing
  • SharePoint: Site and document permissions
  • Confluence: Space and page restrictions
  • Salesforce: Record sharing rules
  • Slack: Channel membership

Enabling Permission Sync

Permission sync is controlled per connector using the include_permissions configuration option. When enabled, ZenSearch imports access control rules from the source platform and enforces them at search time.

Permission Modes

| Mode | Description |
|---|---|
| Strict | Only show documents the user can access in the source |
| Permissive | Show all documents (for internal use) |
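The difference between the two modes can be sketched as a visibility check in bash. The model is an assumption for illustration: each document carries a space-separated list of principals imported by permission sync, the user is represented the same way, and `can_view` and the `user:`/`group:` prefixes are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical search-time visibility check for the two permission modes.
# $1: mode ("strict" or "permissive")
# $2: document ACL, space-separated principals, e.g. "group:eng user:alice"
# $3: the searching user's principals, same format
# Returns 0 if the document may be shown.

can_view() {
  local mode="$1" doc_acl="$2" user_principals="$3" p
  # Permissive: every indexed document is visible.
  if [[ "$mode" == "permissive" ]]; then
    return 0
  fi
  # Strict (and any unknown mode, failing closed): visible only if the
  # user shares at least one principal with the document's ACL.
  for p in $user_principals; do
    case " $doc_acl " in
      *" $p "*) return 0 ;;
    esac
  done
  return 1
}
```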

Content Filtering

Path/Prefix Filters

For storage connectors:

Include: /documents/public/*
Exclude: /documents/archive/*
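A bash sketch of how include and exclude patterns like these might be evaluated. The precedence rule (exclude wins, and an empty include list selects everything) is an assumption rather than documented ZenSearch behavior, and `path_selected` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical include/exclude path filter using bash glob matching.

INCLUDE_PATTERNS=("/documents/public/*")
EXCLUDE_PATTERNS=("/documents/archive/*")

# Returns 0 if the path should be synced.
path_selected() {
  local path="$1" pattern
  # Exclude rules are checked first and always win (assumed precedence).
  for pattern in "${EXCLUDE_PATTERNS[@]}"; do
    # Unquoted right-hand side of == is matched as a glob pattern.
    [[ "$path" == $pattern ]] && return 1
  done
  # No include rules: everything not excluded is selected.
  (( ${#INCLUDE_PATTERNS[@]} == 0 )) && return 0
  for pattern in "${INCLUDE_PATTERNS[@]}"; do
    [[ "$path" == $pattern ]] && return 0
  done
  return 1
}
```

In bash's `[[ == ]]` matching, `*` also matches `/`, so `/documents/public/*` covers nested subfolders too.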

File Type Filters

Limit by file extension:

Include: .pdf, .docx, .txt, .md
Exclude: .exe, .zip
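Extension filtering can be sketched the same way. In this hypothetical bash function, matching is case-insensitive, files without an extension are skipped, and any extension not on the include list is rejected; all three behaviors are assumptions, not confirmed ZenSearch rules.

```bash
#!/usr/bin/env bash
# Hypothetical extension filter using the example lists above.

INCLUDE_EXTENSIONS=".pdf .docx .txt .md"
EXCLUDE_EXTENSIONS=".exe .zip"

# Returns 0 if the file name has an allowed extension.
extension_allowed() {
  local name="$1" ext
  # No dot in the name means no extension: reject.
  [[ "$name" == *.* ]] || return 1
  # Take the text after the last dot and lowercase it.
  ext=".${name##*.}"
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case " $EXCLUDE_EXTENSIONS " in *" $ext "*) return 1 ;; esac
  case " $INCLUDE_EXTENSIONS " in *" $ext "*) return 0 ;; esac
  return 1
}
```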

Date Filters

Sync content from specific periods:

Modified after: 2024-01-01
Created within: Last 90 days
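The two date filters amount to comparing timestamps against a fixed cutoff and a rolling one. This bash sketch takes Unix timestamps as arguments so it needs no platform-specific `date` flags; the function and variable names are hypothetical, and treating "after" as strictly later is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical date filter matching the two examples above.
# Arguments: modified_epoch created_epoch now_epoch (Unix seconds).

MODIFIED_AFTER_EPOCH=1704067200   # 2024-01-01T00:00:00Z
CREATED_WITHIN_DAYS=90

# Returns 0 if the document passes both filters.
date_filters_pass() {
  local modified="$1" created="$2" now="$3"
  # Rolling cutoff: 90 days before "now".
  local created_cutoff=$(( now - CREATED_WITHIN_DAYS * 86400 ))
  (( modified > MODIFIED_AFTER_EPOCH )) || return 1
  (( created >= created_cutoff )) || return 1
}
```

On systems with GNU `date`, `date -u -d 2024-01-01 +%s` prints the fixed cutoff used above, and `date +%s` gives the current time.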

Custom Filters

Connector-specific filtering:

  • Slack: Specific channels
  • Confluence: Selected spaces
  • GitHub: Specific branches
  • Jira: JQL filter queries

Best Practices

Security

  1. Use minimal required permissions
  2. Rotate API keys regularly
  3. Use OAuth when available
  4. Enable permission sync

Performance

  1. Start with smaller scopes
  2. Enable incremental sync
  3. Use webhooks when available
  4. Filter unnecessary content

Organization

  1. One connector per purpose
  2. Group related connectors in collections
  3. Use descriptive names
  4. Document filter settings

Troubleshooting

Connection Failed

  1. Verify credentials are correct
  2. Check network connectivity
  3. Ensure the required permissions are granted
  4. Review firewall/proxy settings

Sync Errors

  1. Check source accessibility
  2. Check whether source API rate limits are being hit
  3. Review error messages
  4. Try a manual re-sync

Missing Content

  1. Check filter settings
  2. Verify permissions
  3. Wait for sync completion
  4. Confirm the content type is supported

Next Steps

Choose a connector guide for detailed setup instructions: