Data Connectors Overview

ZenSearch supports 18 data connectors across cloud storage, collaboration tools, development tools, CRM and ITSM systems, databases, and web content. Connect your organization's data sources to create a unified, searchable knowledge base.

Connector Categories

Cloud Storage

Index files stored in cloud storage platforms:

Connector	Description	Auth Method
Amazon S3	S3 and S3-compatible storage (e.g., RustFS)	IAM Role, Access Keys
Google Drive	Personal and Workspace drives	OAuth 2.0, Service Account
SharePoint	Microsoft SharePoint Online	OAuth 2.0
Azure Blob	Azure Blob Storage containers	Connection String, SAS, Account Key, Managed Identity

Collaboration Tools

Connect team collaboration platforms:

Connector	Description	Auth Method
Confluence	Atlassian Confluence (Cloud/Server)	API Token, OAuth
Notion	Pages, databases, and blocks	OAuth 2.0, Integration Token
Slack	Channels, threads, and files	OAuth 2.0

Development Tools

Index code and project management data:

Connector	Description	Auth Method
GitHub	Repositories, code, and issues	OAuth 2.0, PAT
Jira	Issues, comments, and worklogs	API Token, OAuth

CRM Systems

Connect customer relationship management (CRM) and ERP platforms:

Connector	Description	Auth Method
Salesforce	Accounts, contacts, opportunities	OAuth 2.0
HubSpot	Contacts, companies, deals	OAuth 2.0
SAP	S/4HANA ERP data	Basic Auth, OAuth

ITSM

Connect IT service management platforms:

Connector	Description	Auth Method
ServiceNow	Knowledge base, incidents, changes, problems	OAuth 2.0, Basic Auth

Databases

Query structured data using natural language via AI agents:

Connector	Description	Auth Method
PostgreSQL	PostgreSQL databases	Username/Password
MySQL	MySQL and MariaDB	Username/Password
ClickHouse	ClickHouse analytics DB	Username/Password
MS SQL Server	Microsoft SQL Server	SQL/Windows Auth

Note: Database connectors work differently from other connectors. Instead of ingesting documents into the search index, they connect databases to the AI Agent system. Agents can discover schemas, generate SQL queries from natural language, and return structured results. See the Agents documentation for details.

Web Content

Crawl and index websites:

Connector	Description	Auth Method
Web Crawler	Website crawling	None/Basic Auth

Agent Tool Integrations

In addition to data connectors that ingest content into the search index, ZenSearch provides agent tool integrations that let AI agents take actions in external platforms. These are not data connectors: they give agents the ability to read, create, and modify content in third-party services during a conversation.

For full details, see the Integrations page.

Productivity Suites

Integration	Tools	Auth Method
Google Workspace	33 tools: Gmail, Calendar, Drive, Docs, Sheets, Forms	OAuth 2.0, Service Account
Microsoft 365	33+ tools: OneDrive, Outlook, Teams, SharePoint, Planner	OAuth 2.0 (Azure AD)

Business Platforms

Integration	Tools	Auth Method
Zendesk	14 tools: Tickets, help center articles, users, organizations	OAuth 2.0
Airtable	8 tools: Bases, records, comments	API Key
Notion	12 tools: Pages, databases, comments, users	API Key

Custom & Extensible

Integration	Description
Custom Webhook Tools	Create tools that call your own endpoints. Auth options: none, API key, basic auth, custom headers
MCP Servers	Connect external tool servers following the Model Context Protocol standard

Note: Agent tool integrations are distinct from data connectors. Data connectors ingest and index content for search. Agent tool integrations let agents interact with external services in real time during conversations. Some platforms (like Notion) appear in both categories: as a data connector for indexing pages, and as an agent tool integration for creating and editing pages.

Common Configuration

Basic Settings

All connectors share common configuration options:

Setting	Description
Name	Display name for the connector
Collection	Target collection for documents
Enabled	Whether sync is active
Schedule	Sync frequency (if applicable)

Authentication

OAuth Connectors

For OAuth-based connectors (Google Drive, Notion, Slack, etc.):

1. Click Authorize during setup
2. Sign in to the external service
3. Grant the requested permissions
4. You are redirected back to ZenSearch automatically

API Key/Token

For API key-based connectors:

1. Generate an API key in the source platform
2. Enter the key in ZenSearch
3. Test the connection

Username/Password

For database and basic auth connectors:

1. Enter host and port
2. Provide username and password
3. Configure SSL if needed
4. Test the connection

Adding a Connector

Step-by-Step

1. Navigate to Knowledge → Data Sources
2. Click Add Data Source
3. Select the connector type
4. Configure authentication
5. Set source-specific options
6. Choose the target collection
7. Test the connection
8. Create the connector

Configuration Wizard

The wizard guides you through:

┌─────────────────────────────────────────┐
│  Step 1: Select Connector Type          │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐       │
│  │ S3  │ │Drive│ │Slack│ │ ... │       │
│  └─────┘ └─────┘ └─────┘ └─────┘       │
├─────────────────────────────────────────┤
│  Step 2: Configure Connection           │
│  [Authentication fields...]             │
├─────────────────────────────────────────┤
│  Step 3: Source Options                 │
│  [Filters, paths, scopes...]            │
├─────────────────────────────────────────┤
│  Step 4: Collection Assignment          │
│  [Select or create collection]          │
└─────────────────────────────────────────┘

Sync Behavior

Initial Sync

When a connector is created:

Full content fetch from source
Document parsing and analysis
Embedding generation
Index population

Incremental Sync

Subsequent syncs are optimized to process only what has changed:

Content dedup: Documents whose content hash matches the previous sync are skipped and not reprocessed
Selective processing: Only new and modified content flows through the parsing, embedding, and indexing pipeline, which reduces sync time for large data sources
Sync statistics: Each sync job reports counts of added, modified, deleted, unchanged, and errored documents so you can verify exactly what changed
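The hash-based skip described above can be sketched as follows. This is a minimal illustration, not ZenSearch's implementation: the use of SHA-256 and the names `content_hash` and `select_changed` are assumptions, since the source says only that unchanged documents are detected by content hashing.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Return a hex SHA-256 digest of a document's raw bytes (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def select_changed(documents: dict[str, bytes], previous_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or whose content hash changed."""
    changed = []
    for doc_id, content in documents.items():
        # A missing previous hash means the document is new; a different one means it was modified.
        if previous_hashes.get(doc_id) != content_hash(content):
            changed.append(doc_id)
    return changed


# Hashes recorded by the previous sync (hypothetical data).
previous = {"a.md": content_hash(b"hello"), "b.md": content_hash(b"old text")}
# Content fetched by the current sync.
current = {"a.md": b"hello", "b.md": b"new text", "c.md": b"brand new"}

to_process = select_changed(current, previous)  # a.md is skipped as unchanged
```

Only the IDs in `to_process` would continue to parsing, embedding, and indexing; everything else counts as unchanged.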

Webhook Sync (Real-time)

For supported connectors:

Configure webhooks in source
Receive instant update notifications
Process changes immediately
Near real-time search updates

Scheduled Sync

Connectors can be configured to sync automatically on a recurring schedule, keeping your knowledge base up to date without manual intervention.

Configure scheduled sync via the `doc_sync` settings on a connector:

Setting	Description	Options
`sync_strategy`	How syncs are triggered	`"manual"` (default), `"scheduled"`
`schedule_interval`	Time between syncs	`"15m"`, `"30m"`, `"1h"`, `"2h"`, `"4h"`, `"6h"` (default), `"12h"`, `"24h"`

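# Enable scheduled sync with a 1-hour interval
# ($ZENSEARCH_URL is a placeholder for your deployment's base URL; replace {id} with the connector ID)
curl -X PATCH "$ZENSEARCH_URL/api/v1/connectors/{id}" \
  -H "Content-Type: application/json" \
  -d '{"doc_sync": {"sync_strategy": "scheduled", "schedule_interval": "1h"}}'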
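The same request can be prepared programmatically. The sketch below validates the interval against the documented options and builds the PATCH request without sending it. The base URL, the connector ID, and the function name `build_schedule_request` are placeholders for illustration, and any authentication header your deployment requires is omitted.

```python
import json
import urllib.request

# Interval options as listed in the settings table.
ALLOWED_INTERVALS = ("15m", "30m", "1h", "2h", "4h", "6h", "12h", "24h")


def build_schedule_request(base_url: str, connector_id: str, interval: str = "6h") -> urllib.request.Request:
    """Build (but do not send) a PATCH request that enables scheduled sync."""
    if interval not in ALLOWED_INTERVALS:
        raise ValueError(f"unsupported schedule_interval: {interval!r}")
    payload = {"doc_sync": {"sync_strategy": "scheduled", "schedule_interval": interval}}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/connectors/{connector_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# The base URL and connector ID below are placeholders.
request = build_schedule_request("https://zensearch.example.com", "conn-123", "1h")
```

Passing `request` to `urllib.request.urlopen` would send it; rejecting unknown intervals locally avoids a round trip for an obviously invalid setting.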

Note: Database connectors (PostgreSQL, MySQL, ClickHouse, MS SQL) use schema sync scheduling instead of document sync scheduling, since they don't ingest documents into the search index.

Deletion Detection

When a sync completes, ZenSearch compares the documents seen in the current run against previously indexed documents. Documents no longer present in the source are automatically soft-deleted from the search index.

A safety threshold of 50% prevents accidental mass deletion — if more than half of the existing documents would be deleted in a single sync, the operation is blocked and a warning is logged. This protects against scenarios like temporary API outages or misconfigured filters that could cause an empty sync result.

Deletion detection is supported by all document connectors except Slack and Web Crawler, which don't have bounded source enumeration (i.e., there's no reliable way to determine the complete set of documents that should exist).
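The comparison and the 50% safety threshold can be illustrated with a small sketch. The function name `detect_deletions` and the use of document ID sets are assumptions made for the example; only the rule itself (block when more than half of the existing documents would be deleted) comes from the description above.

```python
def detect_deletions(indexed_ids: set[str], seen_ids: set[str], threshold: float = 0.5) -> set[str]:
    """Return IDs to soft-delete, or an empty set if the safety threshold blocks the run."""
    missing = indexed_ids - seen_ids
    # Block the deletion when more than `threshold` of the existing documents would be removed.
    if indexed_ids and len(missing) / len(indexed_ids) > threshold:
        print(f"warning: deletion blocked ({len(missing)} of {len(indexed_ids)} documents missing)")
        return set()
    return missing


indexed = {"a", "b", "c", "d"}
normal_run = detect_deletions(indexed, seen_ids={"a", "b", "c"})  # one document is gone
outage_run = detect_deletions(indexed, seen_ids=set())            # source returned nothing
```

In the second call the source appears empty, so the sketch deletes nothing and logs a warning, which mirrors the protection against API outages and misconfigured filters.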

Sync Statistics

Each sync job tracks detailed statistics:

Metric	Description
Added	New documents indexed for the first time
Modified	Existing documents that were updated
Deleted	Documents removed because they no longer exist in the source
Unchanged	Documents skipped because content hasn't changed
Errors	Documents that failed to process

Statistics are available in the sync job metadata after completion and can be used to monitor connector health and data freshness.
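As one way to use these counts for health monitoring, the sketch below models the five metrics and derives an error rate. The `SyncStats` class, the `is_healthy` helper, and the 5% threshold are illustrative assumptions, not part of the ZenSearch API or its metadata format.

```python
from dataclasses import dataclass


@dataclass
class SyncStats:
    """Counts reported by one sync job (field names mirror the metrics table)."""
    added: int = 0
    modified: int = 0
    deleted: int = 0
    unchanged: int = 0
    errors: int = 0

    @property
    def total(self) -> int:
        return self.added + self.modified + self.deleted + self.unchanged + self.errors

    @property
    def error_rate(self) -> float:
        # Guard against division by zero for an empty sync.
        return self.errors / self.total if self.total else 0.0


def is_healthy(stats: SyncStats, max_error_rate: float = 0.05) -> bool:
    """Treat a sync as healthy when the error rate stays at or below an illustrative threshold."""
    return stats.error_rate <= max_error_rate


stats = SyncStats(added=12, modified=3, deleted=1, unchanged=180, errors=4)
```

A high unchanged count together with a low error rate suggests the connector is keeping up; a rising error rate is a signal to review the job's error messages.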

Sync Modes

ZenSearch supports three sync modes that control how documents are selected and processed:

Mode	Description	Best For
auto	Default behavior with smart idempotency. Skips recently processed documents (within 30 minutes).	Scheduled syncs, normal operations
full	Reprocesses ALL documents. Bypasses idempotency check.	Complete reindex, troubleshooting, schema changes
incremental	Processes ONLY specified files. Bypasses idempotency check.	Webhook updates, targeted refreshes

When to Use Each Mode

Auto Mode (Default)

Regular scheduled background syncs
When you want efficient processing that skips unchanged content
Normal day-to-day operations

# API call (sync_mode defaults to "auto"); $ZENSEARCH_URL is a placeholder for your base URL
curl -X POST "$ZENSEARCH_URL/api/v1/connectors/{id}/sync"

Full Mode

After changing embedding models
When troubleshooting missing or corrupted data
Periodic complete refresh (e.g., weekly full reindex)
After schema or configuration changes

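# Force a complete resync ($ZENSEARCH_URL is a placeholder for your base URL)
curl -X POST "$ZENSEARCH_URL/api/v1/connectors/{id}/sync" \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "full"}'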

Incremental Mode

Processing webhook notifications for specific files
Refreshing a known set of updated documents
Testing changes on specific files

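# Sync specific files only ($ZENSEARCH_URL is a placeholder for your base URL)
curl -X POST "$ZENSEARCH_URL/api/v1/connectors/{id}/sync" \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "incremental", "file_filter": ["docs/guide.pdf", "docs/api.md"]}'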
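For the webhook use case, a handler might translate a change notification into this request body. The sketch below assumes a hypothetical event shape with a `changed_paths` field; real webhook payloads differ by source platform, and the function name is illustrative.

```python
import json


def incremental_sync_body(event: dict) -> str:
    """Turn a (hypothetical) webhook event into an incremental sync request body."""
    # "changed_paths" is an assumed field name, not a documented payload key.
    paths = sorted(set(event.get("changed_paths", [])))
    if not paths:
        raise ValueError("webhook event lists no changed paths")
    return json.dumps({"sync_mode": "incremental", "file_filter": paths})


# Duplicate notifications for the same file collapse into one entry.
body = incremental_sync_body({"changed_paths": ["docs/api.md", "docs/guide.pdf", "docs/api.md"]})
```

The resulting string can be sent as the body of the sync request shown above.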

Idempotency Protection

Auto mode includes a 30-minute idempotency window that prevents redundant processing:

Protects against: Duplicate messages, overlapping scheduled syncs, retry storms
Behavior: Documents processed within the last 30 minutes are skipped
Bypassed by: full and incremental modes (explicit sync requests should always be honored)
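The interaction between the three modes and the idempotency window can be summarized in a short decision function. This is a sketch under stated assumptions: the function name, its parameters, and the per-document timestamp are hypothetical, and it models only the 30-minute window, not the separate content-hash check.

```python
from datetime import datetime, timedelta
from typing import Optional

IDEMPOTENCY_WINDOW = timedelta(minutes=30)


def should_process(mode: str, path: str, last_processed: Optional[datetime], now: datetime,
                   file_filter: Optional[list[str]] = None) -> bool:
    """Decide whether one document is processed under the given sync mode."""
    if mode == "full":
        return True  # reprocess everything, no idempotency check
    if mode == "incremental":
        return path in (file_filter or [])  # only the listed files, no idempotency check
    if mode == "auto":
        # Skip documents that were processed within the idempotency window.
        return last_processed is None or now - last_processed >= IDEMPOTENCY_WINDOW
    raise ValueError(f"unknown sync mode: {mode!r}")


now = datetime(2024, 1, 1, 12, 0)
recent = now - timedelta(minutes=10)

skipped_in_auto = should_process("auto", "docs/guide.pdf", recent, now)
forced_in_full = should_process("full", "docs/guide.pdf", recent, now)
```

A document processed ten minutes ago is skipped in auto mode but reprocessed in full mode, which is why an explicit full sync is the usual remedy when a recent change does not appear.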

Note: If you trigger a sync and documents don't appear to update, try `sync_mode: "full"` to bypass the idempotency check.

Supported Sync Methods by Connector

Connector	Scheduled	Incremental	Webhooks
S3	Yes	Yes	Yes
Google Drive	Yes	Yes	Yes
SharePoint	Yes	Yes	Yes
Azure Blob	Yes	Yes	No
Confluence	Yes	Yes	Yes
Notion	Yes	Yes	No
Slack	Yes	Yes	Yes
GitHub	Yes	Yes	Yes
Jira	Yes	Yes	Yes
Salesforce	Yes	Yes	Yes
HubSpot	Yes	Yes	Yes
SAP	Yes	Yes	No
PostgreSQL	Yes	No	No
MySQL	Yes	No	No
ClickHouse	Yes	No	No
MS SQL	Yes	No	No
Web Crawler	Yes	Yes	No

Permission Sync

Supported Platforms

These connectors sync document-level permissions:

Google Drive: File and folder sharing
SharePoint: Site and document permissions
Confluence: Space and page restrictions
Salesforce: Record sharing rules
Slack: Channel membership

Enabling Permission Sync

Permission sync is controlled per connector using the `include_permissions` configuration option. When enabled, ZenSearch imports access control rules from the source platform and enforces them at search time.

Permission Modes

Mode	Description
Strict	Show only documents the user can access in the source platform
Permissive	Show all indexed documents regardless of source permissions (for internal use)
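The difference between the two modes at search time can be sketched as a filter over result documents. The `IndexedDoc` type, its `allowed_users` field, and the lowercase mode strings are assumptions for illustration; the source does not describe how imported access rules are stored.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    doc_id: str
    # Users allowed to read the document, as imported from the source platform (assumed shape).
    allowed_users: set[str] = field(default_factory=set)


def visible_results(results: list[IndexedDoc], user: str, mode: str = "strict") -> list[str]:
    """Filter search results according to the permission mode."""
    if mode == "permissive":
        return [doc.doc_id for doc in results]
    if mode == "strict":
        return [doc.doc_id for doc in results if user in doc.allowed_users]
    raise ValueError(f"unknown permission mode: {mode!r}")


hits = [IndexedDoc("handbook", {"ana", "raj"}), IndexedDoc("payroll", {"raj"})]
strict_view = visible_results(hits, "ana")                 # only what ana can open in the source
permissive_view = visible_results(hits, "ana", "permissive")  # everything that matched
```

In strict mode the user sees only documents they could open in the source platform; permissive mode returns every match.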

Content Filtering

Path/Prefix Filters

For storage connectors:

Include: /documents/public/*
Exclude: /documents/archive/*
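A minimal sketch of how include and exclude patterns like these could be evaluated, using Python's standard `fnmatch` module. Two assumptions are made here that the source does not state: exclude patterns take precedence over include patterns, and `*` also matches nested path segments.

```python
from fnmatch import fnmatchcase


def path_allowed(path: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if the path matches an include pattern and no exclude pattern."""
    # Assumption: an exclude match always wins over an include match.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


include = ["/documents/*"]
exclude = ["/documents/archive/*"]

kept = path_allowed("/documents/public/guide.pdf", include, exclude)
dropped = path_allowed("/documents/archive/2019/report.pdf", include, exclude)
```

With a broad include such as `/documents/*`, the exclude pattern is what keeps archived files out of the index.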

File Type Filters

Limit by file extension:

Include: .pdf, .docx, .txt, .md
Exclude: .exe, .zip
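Extension filtering can be sketched the same way. The function below is illustrative only; it assumes matching is case-insensitive and that an empty include set means "allow any extension that is not excluded", neither of which is confirmed by the source.

```python
from pathlib import PurePosixPath


def extension_allowed(filename: str, include: set[str], exclude: set[str]) -> bool:
    """Return True if the file's extension passes the include and exclude sets."""
    suffix = PurePosixPath(filename).suffix.lower()  # e.g. "Report.PDF" -> ".pdf"
    if suffix in exclude:
        return False
    # Assumption: an empty include set places no restriction on extensions.
    return not include or suffix in include


include_ext = {".pdf", ".docx", ".txt", ".md"}
exclude_ext = {".exe", ".zip"}

pdf_ok = extension_allowed("Report.PDF", include_ext, exclude_ext)
exe_ok = extension_allowed("setup.exe", include_ext, exclude_ext)
```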

Date Filters

Sync content from specific periods:

Modified after: 2024-01-01
Created within: Last 90 days
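The two date rules above can be combined in a single check. This sketch assumes that "modified after" is a strict comparison and that "created within N days" is measured back from the day the sync runs; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def date_allowed(modified: date, created: date, today: date,
                 modified_after: Optional[date] = None,
                 created_within_days: Optional[int] = None) -> bool:
    """Return True if a document passes the configured date filters."""
    # Assumption: a document modified exactly on the cutoff date is excluded.
    if modified_after is not None and modified <= modified_after:
        return False
    if created_within_days is not None and created < today - timedelta(days=created_within_days):
        return False
    return True


today = date(2024, 6, 1)
recent_doc = date_allowed(modified=date(2024, 5, 1), created=date(2024, 4, 1), today=today,
                          modified_after=date(2024, 1, 1), created_within_days=90)
old_doc = date_allowed(modified=date(2024, 5, 1), created=date(2023, 12, 1), today=today,
                       modified_after=date(2024, 1, 1), created_within_days=90)
```

The second document was modified recently but created more than 90 days before the sync date, so it fails the "created within" rule.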

Custom Filters

Connector-specific filtering:

Slack: Specific channels
Confluence: Selected spaces
GitHub: Specific branches
Jira: JQL filter queries

Best Practices

Security

Use minimal required permissions
Rotate API keys regularly
Use OAuth when available
Enable permission sync

Performance

Start with smaller scopes
Enable incremental sync
Use webhooks when available
Filter unnecessary content

Organization

One connector per purpose
Group related connectors in collections
Use descriptive names
Document filter settings

Troubleshooting

Connection Failed

Verify credentials are correct
Check network connectivity
Ensure the required permissions are granted
Review firewall/proxy settings

Sync Errors

Check source accessibility
Check whether the source API's rate limits were exceeded
Review error messages
Try manual re-sync

Missing Content

Check filter settings
Verify permissions
Wait for sync completion
Review the supported content types

Next Steps

See the individual connector guides for detailed setup instructions.
