Knowledge Buckets
What Knowledge Buckets Are
Knowledge buckets are the durable retrieval groups behind Calypso RAG. A bucket holds related files and other knowledge sources so an agent can search the right collection instead of a loose pile of uploads.
Use buckets for collections like:
- support-handbook
- pricing
- legal
- customer-onboarding
- rag1
Every durable file upload is bucket-backed. That means upload flows must choose at least one destination bucket before Calypso accepts the file for indexing.
Main concepts
| Concept | What it means | User-facing behavior |
|---|---|---|
| Bucket | A named group of retrievable knowledge. | Agents and profiles can search a focused collection such as support, legal, pricing, or onboarding. |
| Bucket destination | The bucket id or slug selected during upload. | Required for single-file uploads, batch uploads, MCP uploads, and manual UI uploads. |
| Knowledge source | A stored item such as a PDF, uploaded document, website, Q&A entry, or legacy intent. | Appears in Knowledge after creation and moves through readiness states. |
| Indexing task | The background process that turns stored content into retrievable content. | An upload response may report the file as queued; answers cannot use the file until indexing completes. |
| Bucket sync | The post-indexing step that makes a source searchable through its bucket. | Bucket pages only show active members after sync completes. |
| Agent policy | The saved retrieval and presentation configuration for calypso-rag-agent or a named profile. | Determines which buckets the agent searches. |
List Buckets With The REST API
Use the public bucket listing endpoint when an integration needs to discover valid bucket ids or verify bucket slugs before upload:
GET /v1/knowledge/buckets
Authorization: Bearer sk_...
The endpoint lists buckets for the team tied to the project API key. It does not accept team_id; Calypso derives team scope from the bearer key.
Optional query:
GET /v1/knowledge/buckets?include_archived=true
Typical response:
{
  "team_id": "team_123",
  "buckets": [
    {
      "id": "bucket_abc",
      "teamId": "team_123",
      "slug": "support-handbook",
      "name": "Support Handbook",
      "status": "active",
      "knowledgeIds": ["file_123"],
      "memberCount": 1,
      "counts": {
        "total": 1,
        "file": 1,
        "intents": 0,
        "website": 0,
        "qa": 0,
        "retrievable": 1
      },
      "bucketStore": {
        "status": "active",
        "member_count": 1,
        "indexed_member_count": 1,
        "pending_member_count": 0
      }
    }
  ],
  "request_id": "req_current"
}
Use returned id values in bucket_ids, or returned slug values in bucket_slugs or bucket. Keep include_archived=false for normal upload flows so users do not route new files into archived destinations.
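As a rough illustration of the listing call, the sketch below builds the request and extracts active destinations from a response shaped like the one above. The base URL `https://api.example.com` is a placeholder, and the helper names are hypothetical; only the path, the `include_archived` query, the bearer header, and the response fields come from this page.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host (assumption); replace with the real Calypso API base URL.
BASE_URL = "https://api.example.com"


def build_list_buckets_request(
    api_key: str, include_archived: bool = False
) -> urllib.request.Request:
    """Build the GET /v1/knowledge/buckets request without sending it."""
    url = f"{BASE_URL}/v1/knowledge/buckets"
    if include_archived:
        url += "?" + urllib.parse.urlencode({"include_archived": "true"})
    # No team_id is sent: team scope is derived from the bearer key.
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def active_bucket_destinations(response_body: dict) -> dict:
    """Map slug -> id for every bucket whose status is "active"."""
    return {
        bucket["slug"]: bucket["id"]
        for bucket in response_body.get("buckets", [])
        if bucket.get("status") == "active"
    }


def list_active_buckets(api_key: str) -> dict:
    """Send the request and return active destinations (needs network access)."""
    request = build_list_buckets_request(api_key)
    with urllib.request.urlopen(request) as response:
        return active_bucket_destinations(json.load(response))
```

Filtering on status client-side is a defensive choice for integrations that sometimes pass include_archived=true; with the default query it may be redundant.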
How buckets connect to agents
calypso-rag-agent is the default hosted RAG agent. It does not need file contents passed inline on every request. Instead, it resolves the saved retrieval policy at runtime and searches the buckets attached to that policy.
That policy can point at:
- Selected buckets for grouped source sets.
- Named profiles such as calypso-rag-agent:{profile_id}, where each profile can have its own bucket scope.
- Uploaded agent files, which are now also backed by one selected bucket.
flowchart LR
uploadPath["Upload path"] --> bucketDestination["Required bucket destination"]
bucketDestination --> indexing["Indexing task"]
indexing --> indexedSource["Indexed source"]
indexedSource --> bucketStore["Bucket store"]
bucketStore --> agentPolicy["Agent policy"]
agentPolicy --> ragAgent["calypso-rag-agent"]
Bucket API endpoints
| Endpoint | Best for | Bucket fields | Readiness signal | Details |
|---|---|---|---|---|
| GET /v1/knowledge/buckets | Discovering valid bucket ids and slugs before upload. | None; team scope comes from the API key. | Returns bucket status, counts, and bucket-store state. | This page |
| POST /v1/knowledge/files/upload-session | Create a direct-to-storage session for one file. | bucket_ids, bucket_slugs, or bucket; create_missing_buckets for slug provisioning. | Finalize, then poll GET /v1/knowledge/files/{file_id} or task status. | Single-file Upload API |
| POST /v1/knowledge/files:batch/upload-session | Create direct-to-storage sessions for 1 to 100 files. | Shared bucket fields at manifest root, or bucket fields on every item. | Finalize, then poll GET /v1/knowledge/batches/{batch_id}?include_items=true. | Batch Upload API |
If a public API upload does not include a bucket destination, Calypso rejects it with bucket_required. MCP upload details live in the MCP integration page.
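A client can catch a missing destination before the request leaves the integration. The following is a minimal sketch, assuming the bucket fields sit at the top level of a single-file request payload; the helper names and the `filename` key are hypothetical, and the authoritative check remains the server-side bucket_required rejection.

```python
# Field names taken from the endpoint table above.
BUCKET_FIELDS = ("bucket_ids", "bucket_slugs", "bucket")


def has_bucket_destination(payload: dict) -> bool:
    """Return True when at least one bucket field carries a non-empty value."""
    return any(payload.get(field) for field in BUCKET_FIELDS)


def check_upload_payload(payload: dict) -> None:
    """Raise locally if the payload would likely be rejected with bucket_required."""
    if not has_bucket_destination(payload):
        raise ValueError(
            "bucket_required: set bucket_ids, bucket_slugs, or bucket"
        )


# Usage: this payload passes because bucket_slugs is non-empty.
check_upload_payload(
    {"filename": "handbook.pdf", "bucket_slugs": ["support-handbook"]}
)
```

For batch manifests, where bucket fields may appear on every item instead of the root, the same check would need to run per item.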
Readiness model
Uploads have multiple readiness layers:
- Accepted: Calypso received a valid request and stored durable upload state.
- Queued: indexing work exists but has not completed.
- Indexing / processing: the provider is ingesting the source.
- Active / indexed: the source has canonical indexed content.
- Bucket active: the source can be retrieved through its selected bucket.
For reliable tests, do not evaluate answers immediately after upload. Poll the file, task, or batch status until bucketSyncStatus is active for the relevant bucket.
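The polling advice above can be sketched as a small loop. Because this page does not show the shape of the file, task, or batch status payload, the sketch takes a caller-supplied `fetch_sync_status` callable (hypothetical) that returns the bucketSyncStatus string for the relevant bucket; the timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_bucket_active(
    fetch_sync_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the bucket sync status is "active".

    Returns True once the status is active, or False if the next poll
    would fall past the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_sync_status() == "active":
            return True
        # Stop if waiting another interval would exceed the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        sleep(interval_s)
```

A test harness would call this after finalizing the upload and only run answer evaluations when it returns True. Injecting `sleep` keeps the loop easy to exercise without real delays.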
Choosing the right upload path
Use the UI when a person is curating source quality. Use the single-file Upload API when an integration emits one document at a time. Use the Batch Upload API when a job emits many files and needs idempotency and per-item status. Use the MCP integration when an AI client or local agent should trigger the ingestion flow directly.
Choose bucket ids when the integration already knows the stable destination. Choose bucket slugs when the integration is managed by humans or deployment scripts. Use create_missing_buckets=true only when the upload job is allowed to provision missing slug-based buckets.
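One way to encode the id-versus-slug guidance is a helper that emits exactly one style of bucket field. This is an illustrative sketch only: the function name and the `allow_provisioning` flag are hypothetical, while bucket_ids, bucket_slugs, and create_missing_buckets are the request fields named on this page.

```python
from typing import Optional, Sequence


def bucket_fields(
    bucket_ids: Optional[Sequence[str]] = None,
    bucket_slugs: Optional[Sequence[str]] = None,
    allow_provisioning: bool = False,
) -> dict:
    """Build the bucket portion of an upload request.

    Ids win when both are given, since they identify a stable destination.
    create_missing_buckets is only set for slug-based uploads that are
    explicitly allowed to provision buckets.
    """
    if bucket_ids:
        return {"bucket_ids": list(bucket_ids)}
    if bucket_slugs:
        fields: dict = {"bucket_slugs": list(bucket_slugs)}
        if allow_provisioning:
            fields["create_missing_buckets"] = True
        return fields
    raise ValueError("choose at least one bucket id or slug")
```

Keeping provisioning behind an explicit flag means a typo in a slug surfaces as an error instead of silently creating a new bucket.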