Knowledge Buckets

Understand Calypso knowledge buckets, bucket-backed file uploads, readiness, and the endpoints used to add files to buckets.

What Knowledge Buckets Are

Knowledge buckets are the durable retrieval groups behind Calypso RAG. A bucket holds related files and other knowledge sources so an agent can search the right collection instead of a loose pile of uploads.

Use buckets for collections like:

  • support-handbook
  • pricing
  • legal
  • customer-onboarding
  • rag1

Every durable file upload is bucket-backed. That means upload flows must choose at least one destination bucket before Calypso accepts the file for indexing.

Main concepts

| Concept | What it means | User-facing behavior |
| --- | --- | --- |
| Bucket | A named group of retrievable knowledge. | Agents and profiles can search a focused collection such as support, legal, pricing, or onboarding. |
| Bucket destination | The bucket id or slug selected during upload. | Required for single-file uploads, batch uploads, MCP uploads, and manual UI uploads. |
| Knowledge source | A stored item such as a PDF, uploaded document, website, Q&A entry, or legacy intent. | Appears in Knowledge after creation and moves through readiness states. |
| Indexing task | The background process that turns stored content into retrievable content. | Upload responses may be queued before answers can use the file. |
| Bucket sync | The post-indexing step that makes a source searchable through its bucket. | Bucket pages only show active members after sync completes. |
| Agent policy | The saved retrieval and presentation configuration for calypso-rag-agent or a named profile. | Determines which buckets the agent searches. |

List Buckets With The REST API

Use the public bucket listing endpoint when an integration needs to discover valid bucket ids or verify bucket slugs before upload:

GET /v1/knowledge/buckets
Authorization: Bearer sk_...

The endpoint lists buckets for the team tied to the project API key. It does not accept team_id; Calypso derives team scope from the bearer key.

Optional query:

GET /v1/knowledge/buckets?include_archived=true

Typical response:

{
  "team_id": "team_123",
  "buckets": [
    {
      "id": "bucket_abc",
      "teamId": "team_123",
      "slug": "support-handbook",
      "name": "Support Handbook",
      "status": "active",
      "knowledgeIds": ["file_123"],
      "memberCount": 1,
      "counts": {
        "total": 1,
        "file": 1,
        "intents": 0,
        "website": 0,
        "qa": 0,
        "retrievable": 1
      },
      "bucketStore": {
        "status": "active",
        "member_count": 1,
        "indexed_member_count": 1,
        "pending_member_count": 0
      }
    }
  ],
  "request_id": "req_current"
}

Pass returned id values in the upload field bucket_ids, or pass returned slug values in bucket_slugs or bucket. Keep include_archived=false for normal upload flows so users do not route new files into archived destinations.
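As a minimal sketch of this discovery step, the Python below builds the listing request and extracts active destinations from a response shaped like the sample above. The host `https://api.example.com` is a placeholder because this page does not state the API base URL, the helper names are illustrative rather than part of any Calypso SDK, and the `"archived"` status value in the sample data is an assumption.

```python
import urllib.parse
import urllib.request

# Placeholder host: the real Calypso API base URL is not stated on this page.
BASE_URL = "https://api.example.com"


def build_list_request(api_key, include_archived=False):
    """Build (but do not send) the GET /v1/knowledge/buckets request."""
    url = BASE_URL + "/v1/knowledge/buckets"
    if include_archived:
        url += "?" + urllib.parse.urlencode({"include_archived": "true"})
    # Team scope comes from the bearer key, so no team_id is sent.
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + api_key}, method="GET"
    )


def active_destinations(payload):
    """Map slug -> id for every bucket whose status is "active"."""
    return {
        bucket["slug"]: bucket["id"]
        for bucket in payload.get("buckets", [])
        if bucket.get("status") == "active"
    }


sample = {
    "team_id": "team_123",
    "buckets": [
        {"id": "bucket_abc", "slug": "support-handbook", "status": "active"},
        # The "archived" status value is assumed for illustration.
        {"id": "bucket_old", "slug": "legacy-faq", "status": "archived"},
    ],
}
print(active_destinations(sample))  # {'support-handbook': 'bucket_abc'}
```

To send the request, pass the object to `urllib.request.urlopen` and decode the JSON body before calling `active_destinations`.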

How buckets connect to agents

calypso-rag-agent is the default hosted RAG agent. It does not need file contents passed inline on every request. Instead, it resolves the saved retrieval policy at runtime and searches the buckets attached to that policy.

That policy can point at:

  • Selected buckets for grouped source sets.
  • Named profiles such as calypso-rag-agent:{profile_id}, where each profile can have its own bucket scope.
  • Uploaded agent files, which are now also backed by one selected bucket.

flowchart LR
  uploadPath["Upload path"] --> bucketDestination["Required bucket destination"]
  bucketDestination --> indexing["Indexing task"]
  indexing --> indexedSource["Indexed source"]
  indexedSource --> bucketStore["Bucket store"]
  bucketStore --> agentPolicy["Agent policy"]
  agentPolicy --> ragAgent["calypso-rag-agent"]

Bucket API endpoints

| Endpoint | Best for | Bucket fields | Readiness signal | Details |
| --- | --- | --- | --- | --- |
| GET /v1/knowledge/buckets | Discovering valid bucket ids and slugs before upload. | None; team scope comes from the API key. | Returns bucket status, counts, and bucket-store state. | This page |
| POST /v1/knowledge/files/upload-session | Create a direct-to-storage session for one file. | bucket_ids, bucket_slugs, or bucket; create_missing_buckets for slug provisioning. | Finalize, then poll GET /v1/knowledge/files/{file_id} or task status. | Single-file Upload API |
| POST /v1/knowledge/files:batch/upload-session | Create direct-to-storage sessions for 1 to 100 files. | Shared bucket fields at manifest root, or bucket fields on every item. | Finalize, then poll GET /v1/knowledge/batches/{batch_id}?include_items=true. | Batch Upload API |

If a public API upload does not include a bucket destination, Calypso rejects it with bucket_required. MCP upload details live in the MCP integration page.
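An integration can catch a missing destination before the request leaves the client. The sketch below checks a request body for any of the three bucket fields named above, and applies the batch rule (shared fields at the manifest root, or fields on every item). The `items` key and the function names are assumptions for illustration; the actual manifest schema is documented on the Batch Upload API page.

```python
# Field names taken from the endpoint table on this page.
BUCKET_FIELDS = ("bucket_ids", "bucket_slugs", "bucket")


def has_bucket_destination(body):
    """True when at least one bucket field carries a non-empty value."""
    return any(body.get(field) for field in BUCKET_FIELDS)


def require_bucket_destination(body):
    """Fail locally instead of waiting for a bucket_required rejection."""
    if not has_bucket_destination(body):
        raise ValueError("bucket_required: no bucket_ids, bucket_slugs, or bucket")
    return body


def batch_has_destinations(manifest, items_key="items"):
    """Accept shared root fields, or bucket fields on every item.

    items_key is a hypothetical name for the per-file list in the manifest.
    """
    if has_bucket_destination(manifest):
        return True
    items = manifest.get(items_key, [])
    return bool(items) and all(has_bucket_destination(item) for item in items)
```

A local check like this only mirrors the server rule; Calypso remains the authority on whether a destination is valid.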

Readiness model

Uploads have multiple readiness layers:

  1. Accepted: Calypso received a valid request and stored durable upload state.
  2. Queued: indexing work exists but has not completed.
  3. Indexing / processing: the provider is ingesting the source.
  4. Active / indexed: the source has canonical indexed content.
  5. Bucket active: the source can be retrieved through its selected bucket.

For reliable tests, do not evaluate answers immediately after upload. Poll the file, task, or batch status until bucketSyncStatus is active for the relevant bucket.
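The polling advice above can be sketched as a small wait loop. Because this page does not show where bucketSyncStatus sits in the file, task, or batch response, the sketch takes a caller-supplied `get_sync_status` callable that returns the status string for the relevant bucket. The function name, the timeout, and the interval defaults are illustrative assumptions, and failure states are not handled because none are documented here.

```python
import time


def wait_for_bucket_sync(get_sync_status, timeout_s=300.0, interval_s=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until get_sync_status() returns "active"; return the poll count.

    get_sync_status is a caller-supplied function that fetches the file,
    task, or batch status and extracts bucketSyncStatus for one bucket.
    """
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        if get_sync_status() == "active":
            return polls
        if clock() >= deadline:
            raise TimeoutError("bucket sync not active after %d polls" % polls)
        sleep(interval_s)


# Usage with a stand-in status source instead of a live API call.
states = iter(["queued", "indexing", "active"])
print(wait_for_bucket_sync(lambda: next(states), sleep=lambda _: None))  # 3
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; a production caller would leave the defaults in place.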

Choosing the right upload path

Use the UI when a person is curating source quality. Use the single-file Upload API when an integration emits one document at a time. Use the Batch Upload API when a job emits many files and needs idempotency and per-item status. Use the MCP integration when an AI client or local agent should trigger the ingestion flow directly.

Choose bucket ids when the integration already knows the stable destination. Choose bucket slugs when the integration is managed by humans or deployment scripts. Use create_missing_buckets=true only when the upload job is allowed to provision missing slug-based buckets.
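One way to encode this choice is a small helper that produces the bucket portion of an upload-session body. This is a sketch under assumptions: the function name and the `allow_provisioning` flag are hypothetical, and only the field names bucket_ids, bucket_slugs, and create_missing_buckets come from this page.

```python
def bucket_fields(bucket_ids=(), bucket_slugs=(), allow_provisioning=False):
    """Return the bucket fields to merge into an upload-session body."""
    fields = {}
    if bucket_ids:
        # Stable ids: the integration already knows the destination.
        fields["bucket_ids"] = list(bucket_ids)
    if bucket_slugs:
        # Human-managed slugs, e.g. set by deployment scripts.
        fields["bucket_slugs"] = list(bucket_slugs)
        if allow_provisioning:
            # Only set when this job may create missing slug-based buckets.
            fields["create_missing_buckets"] = True
    if not fields:
        raise ValueError("choose at least one bucket id or slug")
    return fields


print(bucket_fields(bucket_slugs=["pricing"], allow_provisioning=True))
# {'bucket_slugs': ['pricing'], 'create_missing_buckets': True}
```

The helper never sets create_missing_buckets for id-based destinations, since provisioning applies to slugs only.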
