ClickHouse Connector
Connect to ClickHouse for natural language analytics queries over large datasets. ZenSearch discovers your schema and lets users ask questions in plain English — the AI generates and executes SQL queries automatically.
Overview
The ClickHouse connector is a database connector — it does not ingest documents. Instead, it enables real-time natural language to SQL queries through ZenSearch's AI agent tools, optimized for ClickHouse's columnar analytics engine.
ClickHouse excels at analytical queries over large datasets (billions of rows), making it ideal for questions like "What was our average order value by region over the last 6 months?" or "Which products had the highest return rate this quarter?".
Supported Versions
- ClickHouse 21.x and later
- ClickHouse Cloud
Prerequisites
- ClickHouse server accessible from ZenSearch
- A database user with read-only access
- Network connectivity (firewall rules allowing the appropriate port)
Configuration Reference
| Setting | Type | Required | Description |
|---|---|---|---|
| Host | string | Yes | ClickHouse hostname or IP address |
| Port | integer | No | Server port (default: 9000 for native, 8123 for HTTP) |
| Database | string | No | Database name (default: default) |
| Username | string | Yes | Database user |
| Password | string | No | User password (empty for default user in dev setups) |
| Secure | boolean | No | Enable TLS encryption |
| Protocol | string | No | Connection protocol: native or http |
| Include Tables | array | No | Specific tables to include (empty = all tables) |
| Exclude Tables | array | No | Tables to exclude from schema discovery |
| Include Views | boolean | No | Include views and materialized views |
| Max Rows Per Query | integer | No | Maximum rows returned per query (0–10,000) |
| Query Timeout | integer | No | Query timeout in seconds (0–300) |
| Sample Row Count | integer | No | Number of sample rows for schema understanding |
| Refresh Interval | string | No | How often to refresh the schema cache |
Protocol Selection
| Protocol | Port | Best For |
|---|---|---|
native | 9000 (9440 with TLS) | Best performance, recommended for most deployments |
http | 8123 (8443 with TLS) | Easier to proxy, works through HTTP load balancers |
Setup Steps
- Add Connector: Navigate to Knowledge → Add Data Source → ClickHouse
- Enter Connection Details: Host, port, and database name
- Select Protocol: Choose
native(recommended) orhttp - Provide Credentials: Username and password
- Enable TLS (optional): Turn on
securefor encrypted connections - Filter Tables (optional): Specify which tables to include or exclude
- Test & Create: Verify the connection and save
How Natural Language Queries Work
Once connected, users can ask analytical questions like:
- "What were our top 10 products by revenue last month?"
- "Show me the daily active user trend for the past 90 days"
- "Compare conversion rates across marketing channels"
The AI agent understands ClickHouse-specific SQL syntax including aggregate functions, date functions, and array operations. If a query fails, the error is fed back to the AI for automatic correction.
Query Safety
All generated queries are validated before execution:
- Blocked operations:
DROP,DELETE,INSERT,ALTER,CREATE,GRANT,TRUNCATE - Read-only enforcement: Only
SELECTqueries are permitted - Row limits: Configurable maximum rows prevent large result sets
- Timeout protection: Queries are terminated if they exceed the configured timeout
Creating a Read-Only User
-- Create a dedicated read-only user
CREATE USER zensearch IDENTIFIED BY 'your_secure_password';
-- Grant read access to a specific database
GRANT SELECT ON your_database.* TO zensearch;
-- Or grant access to all databases
GRANT SELECT ON *.* TO zensearch;
Best Practices
- Use the native protocol — Native protocol offers better performance than HTTP for most use cases
- Create a read-only user — Never connect with admin credentials
- Enable TLS for production — Set
secure: truefor encrypted connections - Set query timeouts — ClickHouse can process very large scans; configure
query_timeout_secondsto prevent long-running queries - Limit row counts — Set
max_rows_per_queryto a reasonable value (e.g., 1000) to prevent overwhelming result sets - Add table comments — ClickHouse supports
COMMENTon tables and columns, which helps the AI understand your schema:ALTER TABLE events COMMENT 'User interaction events tracked by the analytics pipeline';
ALTER TABLE events MODIFY COLUMN event_type String COMMENT 'Event type: page_view, click, purchase, signup'; - Exclude system tables — Use
exclude_tablesto filter out ClickHouse system tables if they appear in discovery
Troubleshooting
Connection failed
- Verify the host and port are correct
- Check that the selected protocol matches the server configuration
- For native protocol, default port is 9000 (9440 with TLS); for HTTP, it's 8123 (8443 with TLS)
Timeout on large queries
- ClickHouse may scan billions of rows; set an appropriate
query_timeout_seconds - Consider adding partition keys to frequently queried tables
- Try rephrasing the question to limit the time range
Memory limit exceeded
- The query may require optimization or a narrower scope
- Set
max_rows_per_queryto limit result size - ClickHouse's
max_memory_usageserver setting may need adjustment
Authentication error
- Verify username and password
- Check that the user exists and has
SELECTpermissions - For ClickHouse Cloud, ensure the connection uses TLS (
secure: true)