Chat API
Have conversations with AI using your indexed documents as context.
> **Info:** All chat endpoints are team-scoped. You must include the `teamId` in the URL path. The team ID is available from your team settings.
Send Message
Endpoint
POST /api/v1/teams/{teamId}/chat
Request Body
{
"message": "What is our refund policy?",
"conversationId": "conv_abc123",
"collections": ["col_xyz789"]
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | User message |
| conversationId | string | No | Continue existing conversation |
| collections | string[] | No | Collections to search |
| model | string | No | LLM model to use |
| agentId | string | No | Use specific agent |
| maxTokens | integer | No | Maximum tokens for response (default: model-specific) |
Response
{
"data": {
"id": "msg_abc123",
"conversationId": "conv_xyz789",
"content": "Our refund policy allows returns within 30 days...",
"sources": [
{
"id": "doc_123",
"title": "Refund Policy",
"excerpt": "Returns are accepted within 30 days...",
"url": "https://..."
}
],
"answer_confidence": {
"score": 0.87,
"label": "high"
},
"usage": {
"promptTokens": 150,
"completionTokens": 200,
"totalTokens": 350
}
},
"meta": {
"requestId": "req_abc123"
}
}
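The request and response above can be combined into a small helper. This is a minimal sketch, assuming the `requests` library and your own base URL and API key; `send_message` and `summarize` are illustrative names, not part of the SDK.

```python
import requests

BASE_URL = "https://your-domain.com"  # replace with your deployment
API_KEY = "YOUR_API_KEY"

def send_message(team_id: str, message: str, **options) -> dict:
    """POST /api/v1/teams/{teamId}/chat and return the `data` object."""
    resp = requests.post(
        f"{BASE_URL}/api/v1/teams/{team_id}/chat",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={"message": message, **options},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]

def summarize(data: dict) -> str:
    """Format the answer with its confidence label and source count."""
    conf = data.get("answer_confidence", {})
    sources = data.get("sources", [])
    return f"{data['content']} [{conf.get('label', '?')} confidence, {len(sources)} source(s)]"
```

Optional fields such as `conversationId` or `collections` can be passed as keyword arguments, e.g. `send_message("team_1", "Hi", collections=["col_xyz789"])`.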
Streaming Chat
Endpoint
POST /api/v1/teams/{teamId}/chat/stream
Uses Server-Sent Events (SSE) for streaming responses.
Request
The request body is the same as for POST /api/v1/teams/{teamId}/chat.
Response Stream
event: message_start
data: {"id": "msg_abc123", "conversationId": "conv_xyz789"}
event: content_delta
data: {"delta": "Our refund policy "}
event: content_delta
data: {"delta": "allows returns within "}
event: content_delta
data: {"delta": "30 days..."}
event: sources
data: {"sources": [{"id": "doc_123", ...}]}
event: message_end
data: {"usage": {"totalTokens": 350}, "answer_confidence": {"score": 0.87, "label": "high"}}
JavaScript Streaming
const stream = await client.chat.stream({
message: 'What is our refund policy?'
});
for await (const event of stream) {
if (event.type === 'content_delta') {
process.stdout.write(event.delta);
}
}
With Agent
Enable Agent Mode
{
"message": "Compare Q3 and Q4 sales by region",
"agentId": "agent_abc123"
}
Agent Events (Streaming)
event: agent_start
data: {"iteration": 1}
event: tool_call
data: {"tool": "search_documents", "params": {"query": "Q3 sales"}}
event: tool_result
data: {"tool": "search_documents", "result": {"count": 5}}
event: agent_thinking
data: {"thought": "Analyzing Q3 data..."}
event: content_delta
data: {"delta": "Based on my analysis..."}
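One way to surface agent progress to users is to map each event type to a display line. The event names below come from this page; the formatting itself is a hypothetical example.

```python
def render_agent_event(event: str, data: dict) -> str:
    """Turn one agent stream event into a human-readable progress line."""
    if event == "agent_start":
        return f"[agent] iteration {data['iteration']} started"
    if event == "tool_call":
        return f"[tool] calling {data['tool']} with {data['params']}"
    if event == "tool_result":
        return f"[tool] {data['tool']} returned {data['result']}"
    if event == "agent_thinking":
        return f"[thinking] {data['thought']}"
    if event == "content_delta":
        return data["delta"]  # answer text passes through unchanged
    return ""  # ignore unknown event types
```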
Conversation History
Get Conversation
GET /api/v1/teams/{teamId}/conversations/{id}
List Conversations
GET /api/v1/teams/{teamId}/conversations
Delete Conversation
DELETE /api/v1/teams/{teamId}/conversations/{id}
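The three endpoints above can be wrapped in a thin client. This sketch assumes the `requests` library and your own base URL and API key; the `Conversations` class name is illustrative.

```python
import requests

class Conversations:
    """Thin wrapper over the team-scoped conversation endpoints."""

    def __init__(self, base_url: str, api_key: str, team_id: str):
        self.base = f"{base_url}/api/v1/teams/{team_id}/conversations"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def get(self, conversation_id: str) -> dict:
        resp = requests.get(f"{self.base}/{conversation_id}", headers=self.headers, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def list(self) -> dict:
        resp = requests.get(self.base, headers=self.headers, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def delete(self, conversation_id: str) -> None:
        resp = requests.delete(f"{self.base}/{conversation_id}", headers=self.headers, timeout=30)
        resp.raise_for_status()
```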
Example Usage
cURL
curl -X POST https://your-domain.com/api/v1/teams/{teamId}/chat \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"message": "What is our refund policy?"
}'
Python Streaming
async for event in client.chat.stream(
message="Summarize the Q4 report"
):
if event.type == "content_delta":
print(event.delta, end="", flush=True)
Answer Confidence
Each response includes an answer_confidence field that provides a composite quality assessment:
{
"answer_confidence": {
"score": 0.87,
"label": "high"
}
}
| Field | Type | Description |
|---|---|---|
| score | number | Composite confidence score from 0.0 to 1.0 |
| label | string | Human-readable label: "high" (≥ 0.75), "medium" (≥ 0.45), or "low" (< 0.45) |
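The label boundaries map directly to a small helper, sketched here from the thresholds in the table:

```python
def confidence_label(score: float) -> str:
    """Map a composite confidence score to its label (thresholds from the table above)."""
    if score >= 0.75:
        return "high"
    if score >= 0.45:
        return "medium"
    return "low"
```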
The score is computed from multiple quality signals:
| Signal | Weight | Description |
|---|---|---|
| Coverage ratio | 50% | How well the response covers retrieved source material |
| Shape confidence | 20% | How well the query type was classified |
| Coverage completeness | 15% | Whether all source groups are represented |
| Violation-free | 15% | Whether the response passed all quality checks |
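Given the weights above, the composite score is a weighted sum of the four signals. The function below is a sketch of that arithmetic; the signal key names are paraphrased from the table, not an API contract.

```python
# Weights from the signal table above.
WEIGHTS = {
    "coverage_ratio": 0.50,
    "shape_confidence": 0.20,
    "coverage_completeness": 0.15,
    "violation_free": 0.15,
}

def composite_score(signals: dict) -> float:
    """Weighted sum of the four quality signals, each expected in [0, 1]."""
    return round(sum(WEIGHTS[name] * signals[name] for name in WEIGHTS), 4)
```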
Context Management
Context Window Limits
Each model has a maximum context window that includes:
- System prompt and instructions
- Retrieved source documents
- Conversation history
- Your current message
| Model | Context Window | Recommended Max |
|---|---|---|
| gpt-4o | 128,000 tokens | 100,000 tokens |
| gpt-4-turbo | 128,000 tokens | 100,000 tokens |
| claude-3.5-sonnet | 200,000 tokens | 160,000 tokens |
| claude-3-opus | 200,000 tokens | 160,000 tokens |
| command-r-plus | 128,000 tokens | 100,000 tokens |
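Before sending a large request, you can check the four context components against the recommended maximum for your model. The limits below come from the table; the token counts must come from your own tokenizer.

```python
# Recommended maxima from the table above (tokens).
RECOMMENDED_MAX = {
    "gpt-4o": 100_000,
    "gpt-4-turbo": 100_000,
    "claude-3.5-sonnet": 160_000,
    "claude-3-opus": 160_000,
    "command-r-plus": 100_000,
}

def fits_context(model: str, system: int, sources: int, history: int, message: int) -> bool:
    """True if the four context components stay within the recommended maximum."""
    return system + sources + history + message <= RECOMMENDED_MAX[model]
```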
Automatic Context Management
ZenSearch automatically manages context to stay within model limits:
- Source Priority: Retrieved documents are prioritized (they contain the answers)
- History Truncation: Older conversation messages are dropped when needed
- Smart Allocation: Different query types get optimized context distribution
Long Conversations
For conversations that exceed context limits:
- Earlier messages are automatically removed
- Most recent context is preserved
- Sources are prioritized over history
- Consider starting a new conversation for fresh context
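The truncation rules above (oldest messages dropped first, most recent context preserved) could be implemented roughly like this. `count_tokens` is a placeholder that approximates tokens by word count; substitute your model's real tokenizer.

```python
def count_tokens(message: dict) -> int:
    """Placeholder tokenizer: approximate tokens by whitespace-separated words."""
    return len(message["content"].split())

def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined size fits `budget`."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break  # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```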
Context in Response
Token usage is included in the response:
{
"usage": {
"promptTokens": 1500,
"completionTokens": 200,
"totalTokens": 1700
}
}
Monitor promptTokens to understand context utilization.
Error Responses
| Code | Description |
|---|---|
| 400 | Invalid request |
| 401 | Authentication required |
| 403 | Insufficient permissions |
| 429 | Rate limit exceeded |
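A common way to handle 429 responses is exponential backoff between retries. The delay schedule below is an assumption for illustration, not part of the API.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay in seconds before retry `attempt` (0-indexed), doubling each time up to `cap`."""
    return min(cap, base * (2 ** attempt))
```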
Next Steps
- Agents API - Agent management
- Search API - Direct search