Chat API

Have conversations with AI using your indexed documents as context.

info

All chat endpoints are team-scoped. You must include the teamId in the URL path. The team ID is available from your team settings.

Send Message

Endpoint

POST /api/v1/teams/{teamId}/chat

Request Body

{
  "message": "What is our refund policy?",
  "conversationId": "conv_abc123",
  "collections": ["col_xyz789"]
}

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | User message |
| conversationId | string | No | Continue an existing conversation |
| collections | string[] | No | Collections to search |
| model | string | No | LLM model to use |
| agentId | string | No | Use a specific agent |
| maxTokens | integer | No | Maximum tokens for the response (default: model-specific) |

Response

{
  "data": {
    "id": "msg_abc123",
    "conversationId": "conv_xyz789",
    "content": "Our refund policy allows returns within 30 days...",
    "sources": [
      {
        "id": "doc_123",
        "title": "Refund Policy",
        "excerpt": "Returns are accepted within 30 days...",
        "url": "https://..."
      }
    ],
    "answer_confidence": {
      "score": 0.87,
      "label": "high"
    },
    "usage": {
      "promptTokens": 150,
      "completionTokens": 200,
      "totalTokens": 350
    }
  },
  "meta": {
    "requestId": "req_abc123"
  }
}
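A minimal sketch of calling this endpoint with `fetch`. The domain, `YOUR_API_KEY`, and IDs are placeholders for your deployment:

```javascript
// Send a chat message and print the answer plus its sources.
// your-domain.com and YOUR_API_KEY are placeholders.
async function sendMessage(teamId, message) {
  const res = await fetch(`https://your-domain.com/api/v1/teams/${teamId}/chat`, {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ message })
  });
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);

  const { data } = await res.json();
  console.log(data.content);
  for (const source of data.sources) {
    console.log(`- ${source.title}: ${source.url}`);
  }
  return data;
}
```

Pass `conversationId` in the body to continue an existing conversation instead of starting a new one.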

Streaming Chat

Endpoint

POST /api/v1/teams/{teamId}/chat/stream

Uses Server-Sent Events (SSE) for streaming responses.

Request

Same as POST /api/v1/teams/{teamId}/chat

Response Stream

event: message_start
data: {"id": "msg_abc123", "conversationId": "conv_xyz789"}

event: content_delta
data: {"delta": "Our refund policy "}

event: content_delta
data: {"delta": "allows returns within "}

event: content_delta
data: {"delta": "30 days..."}

event: sources
data: {"sources": [{"id": "doc_123", ...}]}

event: message_end
data: {"usage": {"totalTokens": 350}, "answer_confidence": {"score": 0.87, "label": "high"}}

JavaScript Streaming

const stream = await client.chat.stream({
  message: 'What is our refund policy?'
});

for await (const event of stream) {
  if (event.type === 'content_delta') {
    process.stdout.write(event.delta);
  }
}

With Agent

Enable Agent Mode

{
  "message": "Compare Q3 and Q4 sales by region",
  "agentId": "agent_abc123"
}

Agent Events (Streaming)

event: agent_start
data: {"iteration": 1}

event: tool_call
data: {"tool": "search_documents", "params": {"query": "Q3 sales"}}

event: tool_result
data: {"tool": "search_documents", "result": {"count": 5}}

event: agent_thinking
data: {"thought": "Analyzing Q3 data..."}

event: content_delta
data: {"delta": "Based on my analysis..."}
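In agent mode the extra event types can be routed alongside the content deltas. A sketch, assuming `stream` is an async iterable of parsed events (as returned by `client.chat.stream(...)` above):

```javascript
// Route agent-mode stream events; the event shapes follow the
// SSE payloads shown above.
async function handleAgentStream(stream) {
  let answer = '';
  for await (const event of stream) {
    switch (event.type) {
      case 'tool_call':
        console.log(`calling ${event.tool}`, event.params);
        break;
      case 'tool_result':
        console.log(`${event.tool} returned`, event.result);
        break;
      case 'agent_thinking':
        console.log(`thinking: ${event.thought}`);
        break;
      case 'content_delta':
        answer += event.delta;
        break;
    }
  }
  return answer;
}
```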

Conversation History

Get Conversation

GET /api/v1/teams/{teamId}/conversations/{id}

List Conversations

GET /api/v1/teams/{teamId}/conversations

Delete Conversation

DELETE /api/v1/teams/{teamId}/conversations/{id}
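The three conversation endpoints can be wrapped in thin helpers. A sketch with `fetch`; `BASE` and `API_KEY` are placeholders:

```javascript
// Thin wrappers around the conversation endpoints above.
// BASE and API_KEY are placeholders for your deployment.
const BASE = 'https://your-domain.com/api/v1';
const API_KEY = 'YOUR_API_KEY';

async function request(method, path) {
  const res = await fetch(`${BASE}${path}`, {
    method,
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
  if (!res.ok) throw new Error(`${method} ${path} failed: ${res.status}`);
  // A DELETE may return no body.
  return res.status === 204 ? null : res.json();
}

const listConversations = (teamId) =>
  request('GET', `/teams/${teamId}/conversations`);
const getConversation = (teamId, id) =>
  request('GET', `/teams/${teamId}/conversations/${id}`);
const deleteConversation = (teamId, id) =>
  request('DELETE', `/teams/${teamId}/conversations/${id}`);
```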

Example Usage

cURL

curl -X POST https://your-domain.com/api/v1/teams/{teamId}/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is our refund policy?"
  }'

Python Streaming

async for event in client.chat.stream(
    message="Summarize the Q4 report"
):
    if event.type == "content_delta":
        print(event.delta, end="", flush=True)

Answer Confidence

Each response includes an answer_confidence field that provides a composite quality assessment:

{
  "answer_confidence": {
    "score": 0.87,
    "label": "high"
  }
}
| Field | Type | Description |
|---|---|---|
| score | number | Composite confidence score from 0.0 to 1.0 |
| label | string | Human-readable label: "high" (≥ 0.75), "medium" (≥ 0.45), or "low" (< 0.45) |

The score is computed from multiple quality signals:

| Signal | Weight | Description |
|---|---|---|
| Coverage ratio | 50% | How well the response covers retrieved source material |
| Shape confidence | 20% | How well the query type was classified |
| Coverage completeness | 15% | Whether all source groups are represented |
| Violation-free | 15% | Whether the response passed all quality checks |
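One common use of the field is gating behavior on the label, for example flagging low-confidence answers for review. A sketch; the thresholds mirror the label boundaries documented above, and the warning format is illustrative:

```javascript
// Map a score to the documented label boundaries.
function confidenceLabel(score) {
  if (score >= 0.75) return 'high';
  if (score >= 0.45) return 'medium';
  return 'low';
}

// Prepend a caveat when the API labels the answer low-confidence.
function renderAnswer(data) {
  const { score, label } = data.answer_confidence;
  if (label === 'low') {
    return `Low-confidence answer (${score.toFixed(2)}):\n${data.content}`;
  }
  return data.content;
}
```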

Context Management

Context Window Limits

Each model has a maximum context window that includes:

  • System prompt and instructions
  • Retrieved source documents
  • Conversation history
  • Your current message

| Model | Context Window | Recommended Max |
|---|---|---|
| gpt-4o | 128,000 tokens | 100,000 tokens |
| gpt-4-turbo | 128,000 tokens | 100,000 tokens |
| claude-3.5-sonnet | 200,000 tokens | 160,000 tokens |
| claude-3-opus | 200,000 tokens | 160,000 tokens |
| command-r-plus | 128,000 tokens | 100,000 tokens |

Automatic Context Management

ZenSearch automatically manages context to stay within model limits:

  1. Source Priority: Retrieved documents are prioritized (they contain the answers)
  2. History Truncation: Older conversation messages are dropped when needed
  3. Smart Allocation: Different query types get optimized context distribution

Long Conversations

For conversations that exceed context limits:

  • Earlier messages are automatically removed
  • Most recent context is preserved
  • Sources are prioritized over history
  • Consider starting a new conversation for fresh context

Context in Response

Token usage is included in the response:

{
  "usage": {
    "promptTokens": 1500,
    "completionTokens": 200,
    "totalTokens": 1700
  }
}

Monitor promptTokens to understand context utilization.
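For instance, you might warn when prompt tokens approach a model's recommended maximum. A sketch; the limits mirror the table above, and the 0.8 threshold is an arbitrary example:

```javascript
// Recommended maximums per model, from the context window table above.
const RECOMMENDED_MAX = {
  'gpt-4o': 100000,
  'gpt-4-turbo': 100000,
  'claude-3.5-sonnet': 160000,
  'claude-3-opus': 160000,
  'command-r-plus': 100000
};

// Return the fraction of the recommended max consumed by the prompt,
// warning past an (arbitrary) 80% threshold.
function contextUtilization(usage, model) {
  const ratio = usage.promptTokens / RECOMMENDED_MAX[model];
  if (ratio > 0.8) {
    console.warn(
      `Context at ${(ratio * 100).toFixed(0)}% of recommended max; ` +
      'consider starting a new conversation.'
    );
  }
  return ratio;
}
```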

Error Responses

| Code | Description |
|---|---|
| 400 | Invalid request |
| 401 | Authentication required |
| 403 | Insufficient permissions |
| 429 | Rate limit exceeded |
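Of these, only 429 is worth retrying. A sketch of mapping the status codes to retry decisions; the exponential backoff schedule is illustrative, not prescribed by the API:

```javascript
// Decide how to react to an error status. Only 429 is retryable;
// the backoff values are illustrative.
function handleChatError(status, attempt) {
  switch (status) {
    case 400: return { retry: false, reason: 'Invalid request: fix the payload' };
    case 401: return { retry: false, reason: 'Authentication required: check the API key' };
    case 403: return { retry: false, reason: 'Insufficient permissions for this team' };
    case 429: return { retry: true, delayMs: Math.min(1000 * 2 ** attempt, 30000) };
    default:  return { retry: false, reason: `Unexpected status ${status}` };
  }
}
```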

Next Steps