
Key Concepts

Understanding these core concepts will help you get the most out of ZenSearch.

Data Sources & Connectors

Connectors

A connector is a configured connection to an external data source. ZenSearch supports 17+ connector types, including:

  • Cloud Storage: S3, Google Drive, SharePoint, Azure Blob
  • Collaboration Tools: Confluence, Notion, Slack
  • Development Tools: GitHub, Jira
  • CRM Systems: Salesforce, HubSpot, SAP
  • Databases: PostgreSQL, MySQL, ClickHouse, MS SQL
  • Web: Web Crawler

Each connector:

  • Authenticates with your data source
  • Syncs content on a schedule or via webhooks
  • Maintains permissions from the source platform

Collections

A collection is a logical grouping of documents from one or more connectors. Collections help you:

  • Organize content by topic, department, or project
  • Control which content is searched
  • Apply different embedding models
  • Manage access permissions

Example setup:

Engineering Collection
├── GitHub (code repositories)
├── Confluence (technical docs)
└── Jira (tickets and issues)

Sales Collection
├── Salesforce (CRM data)
├── Google Drive (presentations)
└── HubSpot (marketing content)
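
The example setup above can be modelled as collections that each hold a list of connectors. This is only an illustrative sketch: the `Connector` and `Collection` classes, their fields, and the `collections_using` helper are hypothetical names chosen for this example, not part of ZenSearch's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connector:
    # Hypothetical fields; the real connector configuration is not shown in this guide.
    kind: str          # e.g. "github", "confluence"
    description: str


@dataclass
class Collection:
    name: str
    connectors: list[Connector] = field(default_factory=list)


engineering = Collection(
    "Engineering",
    [
        Connector("github", "code repositories"),
        Connector("confluence", "technical docs"),
        Connector("jira", "tickets and issues"),
    ],
)
sales = Collection(
    "Sales",
    [
        Connector("salesforce", "CRM data"),
        Connector("google_drive", "presentations"),
        Connector("hubspot", "marketing content"),
    ],
)


def collections_using(kind: str, collections: list[Collection]) -> list[str]:
    """Return the names of collections that include a connector of this kind."""
    return [c.name for c in collections if any(k.kind == kind for k in c.connectors)]
```

For instance, `collections_using("jira", [engineering, sales])` returns only the Engineering collection, which is the kind of lookup that lets a search be scoped to one group of sources.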

Documents & Semantic Units

Documents

A document represents a single piece of content from a data source: a file, page, message, or record. Documents are:

  • Parsed to extract text and metadata
  • Classified by type and content
  • Indexed for retrieval

Semantic Units (SUs)

ZenSearch breaks documents into Semantic Units, meaningful chunks of content optimized for AI retrieval. This process:

  1. Segments content into logical sections
  2. Preserves context and relationships
  3. Generates embeddings for semantic search
  4. Maintains links to source documents
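
To make the idea concrete, here is a minimal sketch of segmentation that keeps context and a link back to the source. It assumes blank-line-separated blocks and `#`-prefixed headings, which is a simplification chosen for this example; ZenSearch's actual segmentation rules are not described here, and the `SemanticUnit` fields are hypothetical. Embedding generation (step 3) is left out.

```python
from dataclasses import dataclass


@dataclass
class SemanticUnit:
    document_id: str   # link back to the source document
    position: int      # order of the unit within the document
    heading: str       # nearest preceding heading, kept as context
    text: str


def split_into_units(document_id: str, content: str) -> list[SemanticUnit]:
    """Split on blank lines; blocks starting with '#' are treated as headings."""
    units: list[SemanticUnit] = []
    heading = ""
    for block in content.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # Remember the heading so later units carry it as context.
            heading = block.lstrip("# ").strip()
            continue
        units.append(SemanticUnit(document_id, len(units), heading, block))
    return units
```

Each unit carries its `document_id` and `position`, so a search hit on a unit can be traced back to the exact place in the original document.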

Search & Retrieval

ZenSearch uses hybrid search combining:

  • Dense embeddings: Semantic understanding of meaning
  • Sparse embeddings: Keyword matching for precision
  • Fusion algorithms: Merging dense and sparse results into a single ranked list
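
This guide does not say which fusion algorithm ZenSearch uses. As one common example of the technique, the sketch below implements Reciprocal Rank Fusion (RRF), which scores each document by summing `1 / (k + rank)` over every ranking it appears in. The function name and the default `k = 60` are choices made for this illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]    # e.g. results from the dense (semantic) index
sparse_hits = ["b", "d", "a"]   # e.g. results from the sparse (keyword) index
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Here `fused` is `["b", "a", "d", "c"]`: documents that rank well in both lists rise above those found by only one retriever.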

Search Modes

| Mode | Description | Best For |
|------|-------------|----------|
| Chat | Conversational AI with streaming responses | Questions, research, exploration |
| Search | Traditional search results with faceted filtering | Finding specific documents |

Filter results by:

  • Topics/Categories: Auto-extracted document topics
  • Departments: Organizational categories
  • Languages: Document language
  • Date Ranges: When content was created/modified
  • Sentiment: Positive, neutral, or negative content
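
As a rough illustration of how such filters combine, the sketch below keeps only the results that match every filter that is set. The `Result` fields, the `apply_filters` function, and the sample data are all hypothetical and invented for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    title: str
    topic: str
    department: str
    language: str
    modified: date
    sentiment: str  # "positive", "neutral", or "negative"


def apply_filters(results, topic=None, department=None, language=None,
                  modified_since=None, sentiment=None):
    """Keep results matching every filter that is set; None means 'no filter'."""
    def keep(r: Result) -> bool:
        return (
            (topic is None or r.topic == topic)
            and (department is None or r.department == department)
            and (language is None or r.language == language)
            and (modified_since is None or r.modified >= modified_since)
            and (sentiment is None or r.sentiment == sentiment)
        )
    return [r for r in results if keep(r)]


# Made-up sample results for illustration only.
results = [
    Result("Q3 pricing deck", "pricing", "Sales", "en", date(2024, 9, 2), "neutral"),
    Result("Onboarding guide", "hr", "People", "en", date(2023, 1, 15), "positive"),
    Result("Preisliste", "pricing", "Sales", "de", date(2024, 5, 20), "neutral"),
]
recent_sales = apply_filters(results, department="Sales", modified_since=date(2024, 6, 1))
```

Filters are combined with AND in this sketch, so adding a filter can only narrow the result set.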

AI Agents

What are Agents?

Agents are AI-powered assistants that can:

  • Execute multi-step research tasks
  • Use tools to search, query, and analyze
  • Maintain conversation context
  • Provide comprehensive answers

Agent Tools

Built-in tools available to agents:

| Tool | Description |
|------|-------------|
| search_documents | Search across collections |
| get_document | Retrieve full document content |
| summarize_document | Generate document summaries |
| search_database_schema | Discover database structure |
| query_database | Execute read-only SQL queries |
| get_table_info | Get table columns and types |
| search_knowledge_graph | Find entity relationships |
| calculate | Perform calculations |
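
One way to picture tool use is a registry that maps tool names to functions and dispatches calls by name. The sketch below borrows three tool names from the list above, but the implementations, argument names, `CORPUS` data, and `call_tool` helper are toy stand-ins assumed for this example; they are not ZenSearch's real tool signatures.

```python
import ast
import operator

# Made-up sample documents standing in for indexed content.
CORPUS = {
    "doc-1": "Quarterly revenue grew in the EMEA region.",
    "doc-2": "Deployment guide for the staging cluster.",
}


def search_documents(query: str) -> list[str]:
    """Toy substring search standing in for real retrieval."""
    q = query.lower()
    return [doc_id for doc_id, text in CORPUS.items() if q in text.lower()]


def get_document(document_id: str) -> str:
    return CORPUS[document_id]


_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculate(expression: str):
    """Evaluate +, -, *, / arithmetic without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)


TOOLS = {
    "search_documents": search_documents,
    "get_document": get_document,
    "calculate": calculate,
}


def call_tool(name: str, **arguments):
    """Dispatch a tool call by name, rejecting unknown tools."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

An agent working through a multi-step task would issue a sequence of such calls, for example `call_tool("search_documents", query="revenue")` followed by `call_tool("get_document", document_id="doc-1")`.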

Agent Modes

  • Auto: Uses the agent automatically for complex queries
  • Research: Always uses the agent, with planning
  • Off: Direct chat without agent capabilities

Permissions & Access Control

Team Roles

| Role | Capabilities |
|------|--------------|
| Owner | Full control, delete team, transfer ownership |
| Admin | Manage members, connectors, collections |
| Editor | Create/edit connectors, run sync jobs |
| Viewer | Read-only, search and chat |
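
The roles above can be sketched as an ordered hierarchy with a minimum role per action. Note the assumptions: this example treats each role as including everything the roles below it can do, which the table suggests but does not state, and the action names and the `can` helper are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3
    OWNER = 4


# Minimum role needed for each action (action names are illustrative).
REQUIRED_ROLE = {
    "search": Role.VIEWER,
    "chat": Role.VIEWER,
    "edit_connector": Role.EDITOR,
    "run_sync_job": Role.EDITOR,
    "manage_members": Role.ADMIN,
    "manage_collections": Role.ADMIN,
    "delete_team": Role.OWNER,
    "transfer_ownership": Role.OWNER,
}


def can(role: Role, action: str) -> bool:
    """Return True if the role meets the minimum required for the action."""
    return role >= REQUIRED_ROLE[action]
```

With this model, `can(Role.EDITOR, "run_sync_job")` is true while `can(Role.EDITOR, "manage_members")` is false.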

Document-Level Permissions

ZenSearch syncs permissions from source platforms:

  • User permissions: Individual access rights
  • Group permissions: Team or group access
  • Domain permissions: Organization-wide access
  • Public access: Anyone can view

Permissions are enforced at search time, so users see only the content they're authorized to access.
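
A minimal sketch of search-time enforcement, assuming each document carries an access-control entry with the four permission types listed above. The `Acl` and `User` classes, the email-based domain check, and `filter_results` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    users: set[str] = field(default_factory=set)     # individual access
    groups: set[str] = field(default_factory=set)    # team or group access
    domains: set[str] = field(default_factory=set)   # organization-wide access
    public: bool = False                             # anyone can view


@dataclass
class User:
    email: str
    groups: set[str] = field(default_factory=set)


def can_view(user: User, acl: Acl) -> bool:
    """A user may view a document if any one of the four rules grants access."""
    domain = user.email.rsplit("@", 1)[-1]
    return (
        acl.public
        or user.email in acl.users
        or bool(user.groups & acl.groups)
        or domain in acl.domains
    )


def filter_results(user: User, results: list[tuple[str, Acl]]) -> list[str]:
    """Drop search hits the user is not authorized to see, preserving order."""
    return [doc_id for doc_id, acl in results if can_view(user, acl)]
```

Filtering happens after retrieval in this sketch; a production system might instead push the same check into the index query so unauthorized documents are never fetched.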

Processing Pipeline

When you connect a data source, content flows through:

Collection → Parsing → Structure Analysis → Projection → Vectorization → Classification
  1. Collection: Fetches content from source
  2. Parsing: Extracts text and metadata
  3. Structure Analysis: Identifies document structure
  4. Projection: Creates semantic units
  5. Vectorization: Generates embeddings
  6. Classification: Categorizes content
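
The ordering of the stages can be sketched as a chain of functions, each taking the output of the previous one. Every stage body below is a toy placeholder assumed for this example (the "vectors" are just word and character counts, not real embeddings); only the stage names and their order come from the pipeline above.

```python
from functools import reduce


def collection(item: dict) -> dict:
    # Stand-in for fetching from the source; here the content is already in memory.
    return dict(item)


def parsing(item: dict) -> dict:
    text = item["raw"].strip()
    return {**item, "text": text, "metadata": {"chars": len(text)}}


def structure_analysis(item: dict) -> dict:
    # Treat blank-line-separated blocks as sections (an assumption).
    sections = [s.strip() for s in item["text"].split("\n\n") if s.strip()]
    return {**item, "sections": sections}


def projection(item: dict) -> dict:
    # One semantic unit per section in this toy version.
    return {**item, "units": list(item["sections"])}


def vectorization(item: dict) -> dict:
    # Placeholder vectors (word count, character count), not real embeddings.
    vectors = [[float(len(u.split())), float(len(u))] for u in item["units"]]
    return {**item, "vectors": vectors}


def classification(item: dict) -> dict:
    # Toy rule standing in for a real classifier.
    category = "code" if "def " in item["text"] else "prose"
    return {**item, "category": category}


PIPELINE = [collection, parsing, structure_analysis,
            projection, vectorization, classification]


def run_pipeline(item: dict) -> dict:
    """Pass the item through every stage in order."""
    return reduce(lambda acc, stage: stage(acc), PIPELINE, item)
```

Because each stage only adds keys to the record, the final result still contains every intermediate product, which makes it easy to inspect what a given stage produced.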

Guardrails & Safety

ZenSearch includes built-in safety features:

Input Guardrails

  • Content moderation
  • Prompt injection detection
  • PII detection
  • Length validation
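
As a simplified illustration of two of these checks, the sketch below validates prompt length and flags email addresses as one kind of PII. The `MAX_CHARS` limit, the regular expression, and the `check_input` function are assumptions made for this example; real content moderation and prompt injection detection are considerably more involved and are not shown.

```python
import re

# Deliberately simple email pattern; real PII detection covers far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAX_CHARS = 4000  # hypothetical limit chosen for this example


def check_input(prompt: str) -> list[str]:
    """Return a list of guardrail violations; an empty list means the prompt passes."""
    problems = []
    if not prompt.strip():
        problems.append("empty")
    if len(prompt) > MAX_CHARS:
        problems.append("too_long")
    if EMAIL.search(prompt):
        problems.append("pii_email")
    return problems
```

Returning every violation, rather than stopping at the first, lets the caller decide whether to block the request, redact the flagged text, or ask the user to rephrase.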

Output Guardrails

  • Hallucination detection
  • Toxicity filtering
  • Relevance checking

Next Steps

Now that you understand the key concepts: