Batch Upload API

Create upload sessions for 1 to 100 files, upload bytes directly to storage, and finalize the batch into durable Calypso knowledge.

When to use it

Use the Batch Upload API when an integration needs to ingest many files from one job. A batch must include a shared bucket destination, or every item must provide its own destination.

Good fits:

  • content migrations
  • nightly sync jobs
  • benchmark corpora
  • generated documentation bundles
  • bucket seeding scripts

Endpoints

Create the batch upload session:

POST /v1/knowledge/files:batch/upload-session

Finalize after uploading accepted items:

POST /v1/knowledge/files:batch/upload-session/{batch_id}/finalize

Each batch accepts 1 to 100 files.

Create request

The manifest defines the batch and durable item ids. The files array supplies compact per-file upload metadata so Calypso can create one storage upload target per accepted item.

{
  "manifest": {
    "version": 1,
    "batch_idempotency_key": "local-batch-001",
    "bucket": "rag1",
    "create_missing_buckets": true,
    "items": [
      {
        "client_file_id": "contract_pdf",
        "filename": "contract.pdf",
        "title": "Customer Contract"
      }
    ]
  },
  "files": [
    {
      "client_file_id": "contract_pdf",
      "filename": "contract.pdf",
      "content_type": "application/pdf",
      "size_bytes": 184233
    }
  ]
}
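
As an illustration of how the manifest and files arrays stay in step, here is a minimal Python sketch that builds the body above from in-memory file descriptions. LocalFile and build_create_request are hypothetical helper names, not part of the Calypso API, and the sketch assumes every file shares one bucket and that create_missing_buckets should be true, as in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalFile:
    """Hypothetical local description of one file to upload."""
    client_file_id: str
    filename: str
    content_type: str
    size_bytes: int
    title: str


def build_create_request(local_files, bucket, idempotency_key):
    """Build a create-session body whose manifest.items and files share ids."""
    if not 1 <= len(local_files) <= 100:
        raise ValueError("a batch must contain 1 to 100 files")
    return {
        "manifest": {
            "version": 1,
            "batch_idempotency_key": idempotency_key,
            "bucket": bucket,
            # Assumption: mirror the example and let missing buckets be created.
            "create_missing_buckets": True,
            "items": [
                {"client_file_id": f.client_file_id, "filename": f.filename, "title": f.title}
                for f in local_files
            ],
        },
        "files": [
            {
                "client_file_id": f.client_file_id,
                "filename": f.filename,
                "content_type": f.content_type,
                "size_bytes": f.size_bytes,
            }
            for f in local_files
        ],
    }
```

Because both arrays are generated from the same list, each client_file_id appears exactly once in each of them.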

Every client_file_id must:

  • be unique within the batch
  • appear once in manifest.items
  • appear once in files
  • use only letters, numbers, _, ., or -
  • not start with reserved Firestore-style prefixes such as __
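
The rules above can be checked client-side before sending a request. The following Python sketch is one way to do that; client_file_id_problems is a hypothetical helper, the character class assumes ASCII letters and digits, and __ is the only reserved prefix it knows about because it is the only one named here.

```python
import re

# Letters, numbers, "_", ".", or "-" only (ASCII is an assumption).
_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")


def client_file_id_problems(body):
    """Return readable problems with client_file_id values; empty means none found."""
    manifest_ids = [str(item.get("client_file_id", "")) for item in body["manifest"]["items"]]
    file_ids = [str(entry.get("client_file_id", "")) for entry in body["files"]]
    problems = []

    # Each id must appear only once in each array.
    for where, ids in (("manifest.items", manifest_ids), ("files", file_ids)):
        for duplicate in sorted({i for i in ids if ids.count(i) > 1}):
            problems.append(f"{duplicate!r} appears more than once in {where}")

    # Each id must be present in both arrays.
    for unmatched in sorted(set(manifest_ids) ^ set(file_ids)):
        problems.append(f"{unmatched!r} is not present in both manifest.items and files")

    # Character set and reserved-prefix checks.
    for file_id in sorted(set(manifest_ids) | set(file_ids)):
        if not _ID_PATTERN.fullmatch(file_id):
            problems.append(f"{file_id!r} contains characters outside letters, numbers, _, ., -")
        elif file_id.startswith("__"):
            problems.append(f"{file_id!r} starts with the reserved prefix __")
    return problems
```

The server remains the authority on what it accepts; a local check like this only catches obvious mistakes earlier.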

Shared and per-item bucket assignment

Put bucket fields at the top level when every file should share the same destination:

  • bucket_ids
  • bucket_slugs
  • bucket
  • create_missing_buckets

Put those fields on an item when one file needs different routing. If there is no top-level bucket destination, every item must include one of bucket_ids, bucket_slugs, or bucket.

A request that has no top-level bucket destination and contains at least one item without its own destination returns 400 bucket_required.
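
To avoid that error, a client can look for unbucketed items before sending the request. This is a minimal Python sketch; unbucketed_items is a hypothetical helper, and it assumes that an empty value (for example an empty bucket_ids list) counts as no destination.

```python
BUCKET_FIELDS = ("bucket_ids", "bucket_slugs", "bucket")


def has_destination(obj):
    """True when any bucket field is present with a non-empty value."""
    return any(obj.get(field) for field in BUCKET_FIELDS)


def unbucketed_items(manifest):
    """Return client_file_id values that would have no bucket destination."""
    if has_destination(manifest):
        # A shared top-level destination covers every item.
        return []
    return [
        item["client_file_id"]
        for item in manifest["items"]
        if not has_destination(item)
    ]
```

An empty result suggests the manifest satisfies the bucket rule described above; a non-empty result lists the items that need routing fields.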

Create response

Calypso returns a batch id and one upload session per accepted item:

{
  "batch_id": "batch_123",
  "upload_strategy": "gcs_resumable",
  "accepted": [
    {
      "client_file_id": "contract_pdf",
      "session_id": "sess_123",
      "upload_url": "https://storage.googleapis.com/...",
      "expires_at": "2026-06-08T21:00:00Z"
    }
  ],
  "rejected": [],
  "request_id": "req_current"
}

Upload every accepted file directly to its upload_url. Treat each URL as a short-lived bearer capability.
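
The upload step might look like the Python sketch below. It rests on an assumption this page does not confirm: that each upload_url accepts the whole file in a single PUT request. If the URL instead requires a separate initiation step or chunked uploads, the request construction would differ. upload_accepted, is_expired, and the payloads mapping are hypothetical names.

```python
import urllib.request
from datetime import datetime, timezone


def is_expired(expires_at, now=None):
    """Compare an ISO 8601 timestamp such as 2026-06-08T21:00:00Z with the clock."""
    deadline = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= deadline


def _send(request):
    # Performs the real HTTP call; tests can pass a fake instead.
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status


def upload_accepted(accepted, payloads, send=_send):
    """Upload each accepted item; payloads maps client_file_id to (bytes, content_type)."""
    results = {}
    for item in accepted:
        file_id = item["client_file_id"]
        if is_expired(item["expires_at"]):
            results[file_id] = "expired"
            continue
        data, content_type = payloads[file_id]
        # Assumption: one PUT carrying the full body is accepted by upload_url.
        request = urllib.request.Request(
            item["upload_url"],
            data=data,
            method="PUT",
            headers={"Content-Type": content_type},
        )
        results[file_id] = send(request)
    return results
```

Since each URL acts as a bearer capability, the sketch never prints or logs upload_url; the returned mapping holds only ids and outcomes.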

Finalize

curl -X POST "https://api.calypso.so/v1/knowledge/files:batch/upload-session/$BATCH_ID/finalize" \
  -H "Authorization: Bearer sk_..." \
  -H "Content-Type: application/json" \
  -d '{"mode":"finalize_uploaded"}'

Finalize validates uploaded blobs item by item. Missing uploads stay pending. Already-finalized items are replayed, so retrying finalize is safe.

Typical finalize response:

{
  "batch_id": "batch_123",
  "status": "finalized",
  "finalized": [
    { "client_file_id": "contract_pdf", "session_id": "sess_123", "knowledge_id": "file_123", "task_id": "task_123" }
  ],
  "pending": [],
  "failed": [],
  "replayed": [],
  "request_id": "req_current"
}
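
Because retrying finalize is safe, a client can loop until nothing is pending. The Python sketch below shows one possible shape. finalize_until_settled is a hypothetical helper; finalize_once stands for whatever function performs the POST and returns the parsed response, and handle_pending is an optional hook for re-uploading missing files. The shape of entries in pending is not shown on this page, so the sketch passes them through untouched.

```python
import time


def finalize_until_settled(finalize_once, handle_pending=None, max_attempts=5,
                           delay_seconds=2.0, sleep=time.sleep):
    """Call finalize until no items are pending or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = finalize_once()
        pending = response.get("pending") or []
        if not pending or attempt == max_attempts:
            # Either everything settled or we stop and let the caller decide.
            return response
        if handle_pending is not None:
            # Give the caller a chance to upload the missing files first.
            handle_pending(pending)
        sleep(delay_seconds)
```

The attempt count and delay are arbitrary defaults for illustration, not documented recommendations.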

Batch status

Poll:

GET /v1/knowledge/batches/{batch_id}?include_items=true

Batch statuses:

Status            Meaning
accepted          Files were stored and queued, but not all accepted items are terminal.
indexing          At least one item is actively indexing.
active            All accepted items are indexed.
partially_active  Some files are active while others are queued, indexing, or failed.
partially_failed  At least one item failed and not all accepted files failed.
failed            All accepted files failed, or no files were accepted.

Item statuses give the most useful debugging signal. Inspect item-level errors, knowledgeId, taskId, bucketSyncStatus, and bucketSync.
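
A polling loop over the batch status could be sketched in Python as follows. Several things here are assumptions: that the response carries the batch status in a top-level status field, and that active, partially_failed, and failed are the statuses worth stopping on (partially_active is treated as still in progress). wait_for_batch is a hypothetical helper, and fetch_batch stands for whatever function performs the GET and returns the parsed body.

```python
import time

# Assumption: these statuses mean no further progress is expected.
TERMINAL_STATUSES = {"active", "partially_failed", "failed"}


def wait_for_batch(fetch_batch, timeout_seconds=600.0, interval_seconds=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the batch reaches a terminal status or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        batch = fetch_batch()
        if batch.get("status") in TERMINAL_STATUSES:
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch still {batch.get('status')!r} after {timeout_seconds}s")
        sleep(interval_seconds)
```

Once the loop returns, the item-level fields listed above are the place to look for the cause of any failure.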

Query readiness

202 Accepted from finalize means the batch was durably accepted. It does not mean the files are query-ready.

Before querying, wait until each item reports a status of active or indexed.

For bucket-scoped retrieval, also wait for bucket sync to become active.
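
The readiness conditions above can be expressed as a small predicate. This Python sketch assumes each item in the status response exposes its state under a status key and its bucket sync state under bucketSyncStatus; the first key name is a guess, and is_query_ready is a hypothetical helper.

```python
READY_ITEM_STATUSES = {"active", "indexed"}


def is_query_ready(item, bucket_scoped=False):
    """True when an item can be queried, optionally requiring active bucket sync."""
    if item.get("status") not in READY_ITEM_STATUSES:
        return False
    if bucket_scoped and item.get("bucketSyncStatus") != "active":
        return False
    return True


def all_query_ready(items, bucket_scoped=False):
    """True when every item in a non-empty list is query-ready."""
    return bool(items) and all(is_query_ready(i, bucket_scoped) for i in items)
```

Passing bucket_scoped=True applies the stricter rule for bucket-scoped retrieval.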

Reliability rules

  • Keep batch size at or below 100 files.
  • Respect the 5 create requests per second per team rate limit.
  • Use stable batch_idempotency_key values for retries.
  • Generate deterministic client_file_id values when retrying the same local files.
  • Include a shared bucket destination, or include bucket fields on every item.
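
Several of these rules can be handled by small helpers. The Python sketch below derives deterministic client_file_id values from relative paths, builds an idempotency key that stays the same when the same files are retried, and splits a job into batches of at most 100. All three function names are hypothetical, and the id and key formats are one possible scheme rather than anything Calypso prescribes.

```python
import hashlib
import re


def deterministic_client_file_id(relative_path):
    """Derive a stable id from a relative path: short hash plus readable slug."""
    digest = hashlib.sha256(relative_path.encode("utf-8")).hexdigest()[:12]
    # Replace anything outside the allowed character set with "_".
    slug = re.sub(r"[^A-Za-z0-9_.-]+", "_", relative_path)[-40:]
    # The leading "f" guarantees the id never starts with "__".
    return f"f{digest}_{slug}"


def batch_idempotency_key(job_name, client_file_ids):
    """Build a key that is identical for the same job name and set of ids."""
    joined = "\n".join(sorted(client_file_ids))
    return f"{job_name}-{hashlib.sha256(joined.encode('utf-8')).hexdigest()[:16]}"


def chunk(items, size=100):
    """Split a list into consecutive batches of at most size entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For the rate limit, a single process that pauses at least 0.2 seconds between create requests stays at or below 5 per second; jobs that run in parallel for the same team would need to share that budget.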
