Batch Upload API
When to use it
Use the Batch Upload API when an integration needs to ingest many files from one job. A batch must include a shared bucket destination, or every item must provide its own destination.
Good fits:
- content migrations
- nightly sync jobs
- benchmark corpora
- generated documentation bundles
- bucket seeding scripts
Endpoints
Create the batch upload session:
POST /v1/knowledge/files:batch/upload-session
Finalize after uploading accepted items:
POST /v1/knowledge/files:batch/upload-session/{batch_id}/finalize
Each batch accepts 1 to 100 files.
Create request
The manifest defines the batch and durable item ids. The files array supplies compact per-file upload metadata so Calypso can create one storage upload target per accepted item.
{
  "manifest": {
    "version": 1,
    "batch_idempotency_key": "local-batch-001",
    "bucket": "rag1",
    "create_missing_buckets": true,
    "items": [
      {
        "client_file_id": "contract_pdf",
        "filename": "contract.pdf",
        "title": "Customer Contract"
      }
    ]
  },
  "files": [
    {
      "client_file_id": "contract_pdf",
      "filename": "contract.pdf",
      "content_type": "application/pdf",
      "size_bytes": 184233
    }
  ]
}
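As a rough illustration, the request body above could be assembled from a list of local file descriptors like this. The helper name `build_create_request` and the shape of the input dictionaries are assumptions made for this sketch; only the output field names come from the example request.

```python
def build_create_request(files, bucket, idempotency_key, create_missing_buckets=False):
    """Assemble a create-request body shaped like the example above.

    `files` is a hypothetical list of dicts with client_file_id, filename,
    content_type, size_bytes, and an optional title.
    """
    if not 1 <= len(files) <= 100:
        raise ValueError("a batch accepts 1 to 100 files")

    items = []
    uploads = []
    for f in files:
        # manifest.items carries the durable id plus descriptive metadata.
        item = {"client_file_id": f["client_file_id"], "filename": f["filename"]}
        if f.get("title"):
            item["title"] = f["title"]
        items.append(item)
        # files[] carries the compact per-file upload metadata.
        uploads.append({
            "client_file_id": f["client_file_id"],
            "filename": f["filename"],
            "content_type": f["content_type"],
            "size_bytes": f["size_bytes"],
        })

    return {
        "manifest": {
            "version": 1,
            "batch_idempotency_key": idempotency_key,
            "bucket": bucket,
            "create_missing_buckets": create_missing_buckets,
            "items": items,
        },
        "files": uploads,
    }
```

Serializing the result with `json.dumps` gives a body that can be sent to the create endpoint.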
Every client_file_id must:
- be unique within the batch
- appear once in manifest.items
- appear once in files
- use only letters, numbers, _, ., or -
- not start with reserved Firestore-style prefixes such as __
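The client_file_id rules can be checked locally before sending the request. The following is a minimal sketch, not the server's validation logic: the function name `check_client_file_ids` is hypothetical, "letters, numbers" is assumed to mean ASCII, and `__` is the only reserved prefix checked because it is the only one named here.

```python
import re
from collections import Counter

# Assumption: "letters, numbers" means ASCII letters and digits.
_ALLOWED = re.compile(r"[A-Za-z0-9_.-]+")


def check_client_file_ids(request):
    """Return a list of human-readable problems; an empty list means no problems found."""
    manifest_counts = Counter(i["client_file_id"] for i in request["manifest"]["items"])
    file_counts = Counter(f["client_file_id"] for f in request["files"])

    errors = []
    for cid in sorted(set(manifest_counts) | set(file_counts)):
        # Appearing exactly once in each array also implies uniqueness within the batch.
        if manifest_counts[cid] != 1:
            errors.append(f"{cid!r}: appears {manifest_counts[cid]} times in manifest.items")
        if file_counts[cid] != 1:
            errors.append(f"{cid!r}: appears {file_counts[cid]} times in files")
        if not _ALLOWED.fullmatch(cid):
            errors.append(f"{cid!r}: uses characters other than letters, numbers, _, ., or -")
        if cid.startswith("__"):
            errors.append(f"{cid!r}: starts with the reserved prefix __")
    return errors
```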
Shared and per-item bucket assignment
Put bucket fields at the top level when every file should share the same destination:
- bucket_ids
- bucket_slugs
- bucket
- create_missing_buckets
Put those fields on an item when one file needs different routing. If there is no top-level bucket destination, every item must include one of bucket_ids, bucket_slugs, or bucket.
Requests without a shared bucket and with any unbucketed item return 400 bucket_required.
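A client can catch the bucket_required case before making the request. This sketch assumes that a destination counts as present when any of bucket_ids, bucket_slugs, or bucket is non-empty; the helper names are hypothetical.

```python
DESTINATION_FIELDS = ("bucket_ids", "bucket_slugs", "bucket")


def has_destination(obj):
    """True if the manifest or item names at least one bucket destination."""
    return any(obj.get(field) for field in DESTINATION_FIELDS)


def unbucketed_items(manifest):
    """Return client_file_ids that would trigger 400 bucket_required.

    A shared top-level destination covers every item, so the result is
    empty in that case.
    """
    if has_destination(manifest):
        return []
    return [
        item["client_file_id"]
        for item in manifest["items"]
        if not has_destination(item)
    ]
```

An empty result means every item has a destination, either shared or its own.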
Create response
Calypso returns a batch id and one upload session per accepted item:
{
"batch_id": "batch_123",
"upload_strategy": "gcs_resumable",
"accepted": [
{
"client_file_id": "contract_pdf",
"session_id": "sess_123",
"upload_url": "https://storage.googleapis.com/...",
"expires_at": "2026-06-08T21:00:00Z"
}
],
"rejected": [],
"request_id": "req_current"
}
Upload every accepted file directly to its upload_url. Treat each URL as a short-lived bearer capability.
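Because upload URLs are short-lived, it can help to check expires_at before starting each transfer. The sketch below only inspects the create response; it does not perform the upload. The helper names and the 60-second safety margin are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_expiry(value):
    """Parse a timestamp such as 2026-06-08T21:00:00Z into an aware datetime."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_by_expiry(create_response, now=None, margin_seconds=60):
    """Split accepted items into (usable, expired_or_expiring_soon)."""
    now = now or datetime.now(timezone.utc)
    margin = timedelta(seconds=margin_seconds)
    usable, stale = [], []
    for item in create_response.get("accepted", []):
        remaining = parse_expiry(item["expires_at"]) - now
        (usable if remaining > margin else stale).append(item)
    return usable, stale
```

Since each URL acts as a bearer capability, avoid writing upload_url values to logs.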
Finalize
curl -X POST "https://api.calypso.so/v1/knowledge/files:batch/upload-session/$BATCH_ID/finalize" \
  -H "Authorization: Bearer sk_..." \
  -H "Content-Type: application/json" \
  -d '{"mode":"finalize_uploaded"}'
Finalize validates uploaded blobs item by item. Missing uploads stay pending. Already-finalized items are replayed, so retrying finalize is safe.
Typical finalize response:
{
  "batch_id": "batch_123",
  "status": "finalized",
  "finalized": [
    { "client_file_id": "contract_pdf", "session_id": "sess_123", "knowledge_id": "file_123", "task_id": "task_123" }
  ],
  "pending": [],
  "failed": [],
  "replayed": [],
  "request_id": "req_current"
}
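Since missing uploads stay pending and retrying finalize is safe, a client can use the response to decide whether another finalize call is needed. This is a sketch under one assumption: entries in pending, failed, and replayed carry a client_file_id the way entries in finalized do. The function name is hypothetical.

```python
def finalize_outcome(response):
    """Summarize a finalize response by client_file_id."""
    def ids(key):
        # Assumption: every entry exposes client_file_id, as finalized entries do.
        return [entry["client_file_id"] for entry in response.get(key, [])]

    pending = ids("pending")
    return {
        "done": ids("finalized") + ids("replayed"),
        "pending": pending,
        "failed": ids("failed"),
        # Pending items still need their upload; finalize again afterwards.
        "retry_finalize": bool(pending),
    }
```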
Batch status
Poll:
GET /v1/knowledge/batches/{batch_id}?include_items=true
Batch statuses:
| Status | Meaning |
|---|---|
| accepted | Files were stored and queued, but not all accepted items are terminal. |
| indexing | At least one item is actively indexing. |
| active | All accepted items are indexed. |
| partially_active | Some files are active while others are queued, indexing, or failed. |
| partially_failed | At least one item failed and not all accepted files failed. |
| failed | All accepted files failed, or no files were accepted. |
Item statuses give the most useful debugging signal. Inspect item-level errors, knowledgeId, taskId, bucketSyncStatus, and bucketSync.
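Polling can be wrapped in a small loop. In this sketch, `fetch_batch` is a caller-supplied function that performs the GET request and returns the parsed JSON; it is hypothetical, as is the assumption that the batch object exposes a top-level `status` field. Which statuses count as "stop polling" is left configurable: the defaults below treat only active and failed as final, which is an assumption rather than documented behavior.

```python
import time


def poll_batch(fetch_batch, batch_id, done_statuses=("active", "failed"),
               interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_batch(batch_id) until its status is in done_statuses.

    Raises TimeoutError if max_attempts polls pass without reaching one.
    """
    for attempt in range(max_attempts):
        batch = fetch_batch(batch_id)
        if batch["status"] in done_statuses:
            return batch
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"batch {batch_id} not done after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.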
Query readiness
202 Accepted from finalize means the batch was durably accepted. It does not mean the files are query-ready.
Wait for items to become active or indexed.
For bucket-scoped retrieval, also wait for bucket sync to become active.
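The readiness rules can be expressed as a predicate over the items returned by the batch status endpoint. This sketch assumes each item exposes its state in a `status` field and its bucket sync state in `bucketSyncStatus`; the exact item shape may differ.

```python
READY_ITEM_STATUSES = {"active", "indexed"}


def item_is_query_ready(item, bucket_scoped=False):
    """True once the item is indexed and, if required, bucket sync is active."""
    if item.get("status") not in READY_ITEM_STATUSES:
        return False
    if bucket_scoped and item.get("bucketSyncStatus") != "active":
        return False
    return True


def batch_is_query_ready(items, bucket_scoped=False):
    """True only when there is at least one item and every item is ready."""
    return bool(items) and all(item_is_query_ready(i, bucket_scoped) for i in items)
```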
Reliability rules
- Keep batch size at or below 100 files.
- Respect the 5 create requests per second per team rate limit.
- Use stable batch_idempotency_key values for retries.
- Generate deterministic client_file_id values when retrying the same local files.
- Include a shared bucket destination, or include bucket fields on every item.
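One way to follow the size, idempotency, and id rules is to derive everything from the file paths themselves. The sketch below is one possible scheme, not a required format: the `f_` prefix, the SHA-256 truncation, and the key layout are all assumptions chosen for illustration.

```python
import hashlib

MAX_BATCH_SIZE = 100


def client_file_id_for(relative_path):
    """Derive a stable id from the path, so retries produce the same value."""
    digest = hashlib.sha256(relative_path.encode("utf-8")).hexdigest()[:16]
    # Hex digits plus "f_" stay within the allowed character set and avoid "__".
    return f"f_{digest}"


def chunk_paths(paths, size=MAX_BATCH_SIZE):
    """Sort paths and split them into batches of at most `size` files."""
    ordered = sorted(paths)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def idempotency_key_for(job_name, batch_paths):
    """Build a key that is stable for the same job name and set of paths."""
    h = hashlib.sha256()
    for path in sorted(batch_paths):
        h.update(path.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return f"{job_name}-{h.hexdigest()[:16]}"
```

Sorting before chunking means the same set of local files always lands in the same batches, so a retried job reuses the same keys and ids.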