Azure Blob Storage Connector
Connect to Azure Blob Storage to index documents from your containers. ZenSearch parses files, extracts text, and makes them searchable through AI-powered search.
Overview
The Azure Blob connector allows you to:
- Index documents from blob containers (PDF, DOCX, XLSX, TXT, code files, and more)
- Filter by prefix/path for targeted indexing
- Authenticate with a connection string, SAS token, account key, or managed identity
- Include blob metadata as searchable document properties
- Process images with OCR for text extraction
Prerequisites
- Azure Storage account
- A container with documents to index
- Access credentials (connection string, account key, SAS token, or managed identity)
Authentication
Connection String (Recommended for Getting Started)
- Go to Azure Portal → Storage Account → Access keys
- Copy the Connection string (either key1 or key2)
- Enter it in ZenSearch
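A connection string that was truncated or picked up stray whitespace during copying is easy to miss by eye. As a minimal sketch (not ZenSearch's own validation), the following Python splits a key-based connection string, as copied from the Access keys page, into its fields and checks that the account name and key are present. The helper name `parse_connection_string` and the sample values are hypothetical.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure Storage connection string into its key/value fields."""
    fields = {}
    for part in conn_str.strip().split(";"):
        if not part:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: account keys are base64 and may end in "=".
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"Malformed segment: {part!r}")
        fields[key] = value
    missing = {"AccountName", "AccountKey"} - fields.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return fields


# Hypothetical sample; the key is a placeholder, not a real credential.
sample = (
    "DefaultEndpointsProtocol=https;AccountName=examplestorage;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)
# Trailing whitespace from a sloppy paste is stripped before parsing.
fields = parse_connection_string(sample + "  \n")
print(fields["AccountName"])
```

Splitting each segment on the first `=` only matters because the base64-encoded account key usually ends in `=` padding.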
Account Key
- Go to Azure Portal → Storage Account → Access keys
- Copy the Account Key (key1 or key2)
- Enter the account name and key in ZenSearch
SAS Token
Shared Access Signature tokens provide time-limited, scoped access:
- Go to Azure Portal → Storage Account → Shared access signature
- Configure permissions:
- Allowed services: Blob
- Allowed resource types: Container, Object
- Allowed permissions: Read, List
- Set an appropriate expiry date
- Generate and copy the SAS token
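Before pasting a token into ZenSearch, it can help to confirm that it carries the settings listed above. The sketch below assumes an account SAS, the kind generated from the Shared access signature page, whose query fields `ss`, `srt`, `sp`, and `se` hold the allowed services, resource types, permissions, and expiry. The function name `check_sas_token` and the sample token are hypothetical, and the check is illustrative rather than what ZenSearch itself runs.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def check_sas_token(token: str, now: datetime) -> list:
    """Return a list of problems found in an account SAS token (empty if none)."""
    params = {k: v[0] for k, v in parse_qs(token.lstrip("?")).items()}
    problems = []
    if "b" not in params.get("ss", ""):
        problems.append("Blob service is not allowed")
    for flag, name in (("c", "Container"), ("o", "Object")):
        if flag not in params.get("srt", ""):
            problems.append(f"{name} resource type is not allowed")
    for flag, name in (("r", "Read"), ("l", "List")):
        if flag not in params.get("sp", ""):
            problems.append(f"{name} permission is missing")
    expiry = params.get("se")
    if expiry is None:
        problems.append("No expiry (se) field found")
    else:
        # Expiry is an ISO 8601 UTC timestamp, e.g. 2030-01-01T00:00:00Z.
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at <= now:
            problems.append(f"Token expired at {expiry}")
    return problems


# Hypothetical token with a placeholder signature.
sample_token = (
    "?sv=2022-11-02&ss=b&srt=co&sp=rl"
    "&se=2030-01-01T00%3A00%3A00Z&spr=https&sig=PLACEHOLDER"
)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(check_sas_token(sample_token, now))
```

Passing `now` explicitly keeps the check deterministic; in practice you would pass `datetime.now(timezone.utc)`.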
Managed Identity (Recommended for Azure-Hosted Deployments)
For ZenSearch deployments running on Azure (VMs, App Service, AKS), managed identity provides passwordless authentication:
- Enable Managed Identity on your Azure resource (VM, App Service, Container Instance, etc.)
- Assign the Storage Blob Data Reader role to the identity on the target storage account:
    az role assignment create \
      --assignee <managed-identity-object-id> \
      --role "Storage Blob Data Reader" \
      --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>
- Set auth_method to managed_identity in ZenSearch; no credentials are required
Configuration Reference
| Setting | Type | Required | Description |
|---|---|---|---|
| Storage Account | string | Yes | Azure Storage account name |
| Container Name | string | Yes | Blob container name |
| Auth Method | string | Yes | Authentication method: connection_string, account_key, sas_token, or managed_identity |
| Connection String | string | No* | Full connection string (*required for connection_string auth) |
| Account Key | string | No* | Account key (*required for account_key auth) |
| SAS Token | string | No* | SAS token (*required for sas_token auth) |
| Prefix | string | No | Path prefix filter (e.g., documents/2025/) |
| Endpoint Suffix | string | No | Custom endpoint suffix (default: blob.core.windows.net). Use blob.core.chinacloudapi.cn for Azure China. |
| Include Images | boolean | No | Process images with OCR for text extraction |
| Include Metadata | boolean | No | Include blob metadata as document properties |
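The conditional requirements marked with an asterisk in the table can be expressed as a small validation routine. This is a sketch under assumptions: the snake_case field names (`storage_account`, `container_name`, and so on) mirror the setting labels but are not confirmed ZenSearch keys, and `validate_config` is a hypothetical helper.

```python
# Credential field that each auth method requires (None means no credential).
REQUIRED_CREDENTIAL = {
    "connection_string": "connection_string",
    "account_key": "account_key",
    "sas_token": "sas_token",
    "managed_identity": None,
}


def validate_config(config: dict) -> list:
    """Return a list of validation errors for a connector config (empty if valid)."""
    errors = []
    for field in ("storage_account", "container_name", "auth_method"):
        if not config.get(field):
            errors.append(f"{field} is required")
    method = config.get("auth_method")
    if method and method not in REQUIRED_CREDENTIAL:
        errors.append(f"unknown auth_method: {method}")
    elif method:
        credential = REQUIRED_CREDENTIAL[method]
        if credential and not config.get(credential):
            errors.append(f"{credential} is required for {method} auth")
    return errors


# Hypothetical config that selects SAS auth but omits the token.
config = {
    "storage_account": "examplestorage",
    "container_name": "docs",
    "auth_method": "sas_token",
    "prefix": "documents/2025/",
}
print(validate_config(config))
```

Switching `auth_method` to `managed_identity` in the same config yields no errors, since that method needs no credential field.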
Setup Steps
- Add Connector: Navigate to Knowledge → Add Data Source → Azure Blob Storage
- Enter Account Details: Storage account name and container name
- Select Auth Method: Choose your authentication approach
- Provide Credentials: Enter the required credentials for your auth method
- Set Prefix (optional): Filter to a specific path within the container
- Test & Create: Verify the connection and save
Supported File Types
The Azure Blob connector processes the same file types as the S3 connector:
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, ODT, ODS, ODP |
| Text | TXT, MD, RST, CSV, TSV, LOG |
| Code | All major languages (Python, Go, JavaScript, TypeScript, Java, etc.) |
| Markup | HTML, XML, JSON, YAML, TOML |
| Images | PNG, JPG, TIFF (when OCR is enabled via include_images) |
Maximum file size: 500 MB per file.
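To estimate ahead of a sync which blobs are likely to be picked up, the table and size limit above can be turned into a quick check. The sketch below is illustrative only: it assumes "500 MB" means 500 × 1024 × 1024 bytes, matches on the exact extensions listed in the table, and leaves out the Code category because the document does not enumerate its extensions. `classify_blob` is a hypothetical helper, not a ZenSearch API.

```python
from pathlib import PurePosixPath
from typing import Optional

MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024  # assumption: "500 MB" read as 500 MiB

# Extensions taken from the table above; the Code category is omitted.
EXTENSIONS = {
    "Documents": {"pdf", "docx", "doc", "xlsx", "xls", "pptx", "ppt", "odt", "ods", "odp"},
    "Text": {"txt", "md", "rst", "csv", "tsv", "log"},
    "Markup": {"html", "xml", "json", "yaml", "toml"},
    "Images": {"png", "jpg", "tiff"},
}


def classify_blob(name: str, size_bytes: int, include_images: bool = False) -> Optional[str]:
    """Return the table category for a blob, or None if it would likely be skipped."""
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return None
    ext = PurePosixPath(name).suffix.lstrip(".").lower()
    for category, extensions in EXTENSIONS.items():
        if ext in extensions:
            # Images only count when OCR is enabled via include_images.
            if category == "Images" and not include_images:
                return None
            return category
    return None


print(classify_blob("reports/Q1.PDF", 1024))
print(classify_blob("scans/page.png", 1024, include_images=True))
```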
Best Practices
- Use SAS tokens with minimal permissions: grant only Read and List permissions, scoped to the specific container
- Set reasonable expiry dates: SAS tokens should expire and be rotated regularly
- Use managed identity in production: it eliminates credential management entirely for Azure-hosted deployments
- Filter by prefix for large containers: if your container has thousands of blobs, use prefix to target specific directories
- Enable metadata: blob metadata (custom key-value pairs set on blobs) can provide additional context for search results
- Organize blobs by topic: create separate ZenSearch collections for different blob prefixes (e.g., legal/, engineering/, hr/)
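Blob containers have a flat namespace, so a prefix such as legal/ is a plain starts-with match on the blob name rather than a real directory. The sketch below assumes ZenSearch applies the prefix this way (as Azure's own list operation does) and shows why the trailing slash matters when splitting a container into per-topic collections. The blob names and the `group_by_prefix` helper are hypothetical.

```python
def group_by_prefix(blob_names, prefixes):
    """Assign each blob name to the first prefix it starts with."""
    groups = {prefix: [] for prefix in prefixes}
    for name in blob_names:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
    return groups


# Hypothetical blob names in a single container.
blobs = [
    "legal/contracts/nda.pdf",
    "legal-archive/old.pdf",
    "engineering/design.md",
    "hr/handbook.docx",
]
prefixes = ["legal/", "engineering/", "hr/"]
groups = group_by_prefix(blobs, prefixes)
for prefix, names in groups.items():
    print(prefix, names)
```

Here `legal-archive/old.pdf` lands in no group. With the prefix written as `legal` (no trailing slash) it would match as well, which is usually not what is intended.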
Sovereign Cloud Support
For Azure sovereign clouds, set the endpoint_suffix to the appropriate value:
| Cloud | Endpoint Suffix |
|---|---|
| Azure Global | blob.core.windows.net (default) |
| Azure China | blob.core.chinacloudapi.cn |
| Azure Government | blob.core.usgovcloudapi.net |
| Azure Germany | blob.core.cloudapi.de |
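The endpoint suffix determines the hostname that requests are sent to: a blob endpoint has the form `https://<account>.<endpoint suffix>`. As a small sketch for sanity-checking a suffix before saving the connector, the following builds the container URL for each cloud in the table. The dictionary keys and the `container_url` helper are hypothetical names chosen for illustration.

```python
# Suffix values copied from the table above; the keys are illustrative labels.
ENDPOINT_SUFFIXES = {
    "global": "blob.core.windows.net",
    "china": "blob.core.chinacloudapi.cn",
    "government": "blob.core.usgovcloudapi.net",
    "germany": "blob.core.cloudapi.de",
}


def container_url(account: str, container: str,
                  endpoint_suffix: str = ENDPOINT_SUFFIXES["global"]) -> str:
    """Build the HTTPS URL of a blob container for a given endpoint suffix."""
    return f"https://{account}.{endpoint_suffix}/{container}"


# Hypothetical account and container names.
for cloud, suffix in ENDPOINT_SUFFIXES.items():
    print(cloud, container_url("examplestorage", "docs", suffix))
```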
Troubleshooting
Authentication failed
- Connection string: Verify the full connection string is copied correctly (no trailing whitespace)
- Account key: Ensure both the account name and key are provided
- SAS token: Check the token has not expired and includes Read and List permissions
- Managed identity: Verify the role assignment is on the correct storage account and has propagated (this may take a few minutes)
Container not found
- Verify the container name is spelled correctly (case-sensitive)
- Check that the container exists in the specified storage account
- Ensure the credentials have access to the container
Files not indexed
- Check that the file types are supported (see table above)
- Verify the prefix filter is not too restrictive
- Files larger than 500 MB are skipped
- Ensure the blobs are not in the Archive access tier: archived blobs are offline and must be rehydrated to an online tier such as Hot or Cool before they can be read
Slow sync performance
- Large containers with many small files may take time; use prefix to scope the sync
- Network latency between ZenSearch and Azure may affect throughput
- Consider using a storage account in the same region as your ZenSearch deployment