Skip to main content

ClickHouse Connector

Connect to ClickHouse for natural language analytics queries over large datasets. ZenSearch discovers your schema and lets users ask questions in plain English — the AI generates and executes SQL queries automatically.

Overview

The ClickHouse connector is a database connector — it does not ingest documents. Instead, it enables real-time natural language to SQL queries through ZenSearch's AI agent tools, optimized for ClickHouse's columnar analytics engine.

ClickHouse excels at analytical queries over large datasets (billions of rows), making it ideal for questions like "What was our average order value by region over the last 6 months?" or "Which products had the highest return rate this quarter?".

Supported Versions

  • ClickHouse 21.x and later
  • ClickHouse Cloud

Prerequisites

  • ClickHouse server accessible from ZenSearch
  • A database user with read-only access
  • Network connectivity (firewall rules allowing the appropriate port)

Configuration Reference

SettingTypeRequiredDescription
HoststringYesClickHouse hostname or IP address
PortintegerNoServer port (default: 9000 for native, 8123 for HTTP)
DatabasestringNoDatabase name (default: default)
UsernamestringYesDatabase user
PasswordstringNoUser password (empty for default user in dev setups)
SecurebooleanNoEnable TLS encryption
ProtocolstringNoConnection protocol: native or http
Include TablesarrayNoSpecific tables to include (empty = all tables)
Exclude TablesarrayNoTables to exclude from schema discovery
Include ViewsbooleanNoInclude views and materialized views
Max Rows Per QueryintegerNoMaximum rows returned per query (0–10,000)
Query TimeoutintegerNoQuery timeout in seconds (0–300)
Sample Row CountintegerNoNumber of sample rows for schema understanding
Refresh IntervalstringNoHow often to refresh the schema cache

Protocol Selection

ProtocolPortBest For
native9000 (9440 with TLS)Best performance, recommended for most deployments
http8123 (8443 with TLS)Easier to proxy, works through HTTP load balancers

Setup Steps

  1. Add Connector: Navigate to Knowledge → Add Data Source → ClickHouse
  2. Enter Connection Details: Host, port, and database name
  3. Select Protocol: Choose native (recommended) or http
  4. Provide Credentials: Username and password
  5. Enable TLS (optional): Turn on secure for encrypted connections
  6. Filter Tables (optional): Specify which tables to include or exclude
  7. Test & Create: Verify the connection and save

How Natural Language Queries Work

Once connected, users can ask analytical questions like:

  • "What were our top 10 products by revenue last month?"
  • "Show me the daily active user trend for the past 90 days"
  • "Compare conversion rates across marketing channels"

The AI agent understands ClickHouse-specific SQL syntax including aggregate functions, date functions, and array operations. If a query fails, the error is fed back to the AI for automatic correction.

Query Safety

All generated queries are validated before execution:

  • Blocked operations: DROP, DELETE, INSERT, ALTER, CREATE, GRANT, TRUNCATE
  • Read-only enforcement: Only SELECT queries are permitted
  • Row limits: Configurable maximum rows prevent large result sets
  • Timeout protection: Queries are terminated if they exceed the configured timeout

Creating a Read-Only User

-- Create a dedicated read-only user
CREATE USER zensearch IDENTIFIED BY 'your_secure_password';

-- Grant read access to a specific database
GRANT SELECT ON your_database.* TO zensearch;

-- Or grant access to all databases
GRANT SELECT ON *.* TO zensearch;

Best Practices

  1. Use the native protocol — Native protocol offers better performance than HTTP for most use cases
  2. Create a read-only user — Never connect with admin credentials
  3. Enable TLS for production — Set secure: true for encrypted connections
  4. Set query timeouts — ClickHouse can process very large scans; configure query_timeout_seconds to prevent long-running queries
  5. Limit row counts — Set max_rows_per_query to a reasonable value (e.g., 1000) to prevent overwhelming result sets
  6. Add table comments — ClickHouse supports COMMENT on tables and columns, which helps the AI understand your schema:
    ALTER TABLE events COMMENT 'User interaction events tracked by the analytics pipeline';
    ALTER TABLE events MODIFY COLUMN event_type String COMMENT 'Event type: page_view, click, purchase, signup';
  7. Exclude system tables — Use exclude_tables to filter out ClickHouse system tables if they appear in discovery

Troubleshooting

Connection failed

  • Verify the host and port are correct
  • Check that the selected protocol matches the server configuration
  • For native protocol, default port is 9000 (9440 with TLS); for HTTP, it's 8123 (8443 with TLS)

Timeout on large queries

  • ClickHouse may scan billions of rows; set an appropriate query_timeout_seconds
  • Consider adding partition keys to frequently queried tables
  • Try rephrasing the question to limit the time range

Memory limit exceeded

  • The query may require optimization or a narrower scope
  • Set max_rows_per_query to limit result size
  • ClickHouse's max_memory_usage server setting may need adjustment

Authentication error

  • Verify username and password
  • Check that the user exists and has SELECT permissions
  • For ClickHouse Cloud, ensure the connection uses TLS (secure: true)