Managed RAG in 15 minutes

By the end you'll have created a vector store, uploaded a PDF, waited for the ingest pipeline to parse and embed it, asked a question via the OpenAI file_search tool and pulled the citations out of the response.

Prerequisites

• A Ringside account (sign up at ringside.fightclub.pro/register)
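• Python 3.9+ with openai >= 1.66 installed (the code below calls client.vector_stores, which earlier 1.x releases expose only under client.beta)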
• A PDF you don't mind uploading. A product handbook, a research paper, anything.
• 15 minutes

Step 1

Get an API key + an assistant

// one-time setup

Mint an API key at ringside.fightclub.pro/app/api-keys and export it as FC_API_KEY. While you're in the dashboard, create an Assistant under /app/assistants with the instructions 'Answer using the supplied files. Cite the file_id and chunk index for every claim.' Copy its asst_ ID; we'll use it in Step 5.

export FC_API_KEY=fc_sk_live_...
export FC_ASSISTANT_ID=asst_...
pip install --upgrade openai
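
Before moving on, it can help to confirm that both variables are actually visible to Python, since a missing export otherwise surfaces later as a bare KeyError. A minimal sketch; the helper name require_env is ours and not part of any SDK:

```python
import os


def require_env(name: str, env=os.environ) -> str:
    """Return the value of an environment variable, or raise a readable error."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before continuing")
    return value


# Usage in the tutorial script:
#   api_key = require_env("FC_API_KEY")
#   assistant_id = require_env("FC_ASSISTANT_ID")
```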

Step 2

Create a vector store

// one tenant per customer

One vector store per customer in your app is the standard pattern. The embedding_model is fixed when the store is created. To change it later, use the dashboard's migrate flow: the re-embed runs in the background from cached parses, so you pay for embedding tokens only.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fightclub.pro/v1",
    api_key=os.environ["FC_API_KEY"],
)

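store = client.vector_stores.create(
    name="acme-handbook",
    # embedding_model is not a keyword of the OpenAI SDK method, so send it in the request body
    extra_body={"embedding_model": "text-embedding-3-small"},
)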
print("store id:", store.id)
# => store id: vs_a1b2c3d4...
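
The one-store-per-customer pattern usually needs a lookup from your customer ID to the store ID. A minimal sketch, assuming you keep that mapping yourself: the registry dict stands in for a table in your own database, and the function name and the "customer-" naming scheme are hypothetical. It omits embedding_model for brevity.

```python
def get_or_create_store(client, customer_id: str, registry: dict) -> str:
    """Return the vector store id for a customer, creating the store on first use.

    `registry` is a stand-in for your own customer -> store id table.
    """
    if customer_id in registry:
        return registry[customer_id]
    store = client.vector_stores.create(name=f"customer-{customer_id}")
    registry[customer_id] = store.id
    return store.id


# Usage with the client from this step:
#   registry = {}
#   store_id = get_or_create_store(client, "cus_42", registry)
```

Passing the client in as an argument keeps the helper easy to test with a stub and avoids a hidden global.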

Step 3

Upload a file + attach it to the store

// async ingest starts here

Upload returns a file ID synchronously. Attaching the file to the vector store kicks off the async ingest pipeline (parse + chunk + embed + index). The attach call returns immediately with status='pending'.

with open("handbook.pdf", "rb") as fp:
    file = client.files.create(file=fp, purpose="attachments")
print("file id:", file.id)
# => file id: file_xyz789...

vsf = client.vector_stores.files.create(
    vector_store_id=store.id,
    file_id=file.id,
)
print("vsf status:", vsf.status)
# => vsf status: pending

Step 4

Wait for ingest to finish

// poll, or subscribe to a webhook

For a tutorial we poll. In production, register a vector_store.file.completed webhook so your worker fires when the file is searchable. Ingest for a 30-page PDF lands in seconds. A 300-page corpus runs in a couple of minutes.

import time

while True:
    f = client.vector_stores.files.retrieve(
        vector_store_id=store.id,
        file_id=file.id,
    )
    print(f"  {f.status}", "" if not f.last_error else f.last_error)
    if f.status in ("completed", "failed", "cancelled"):
        break
    time.sleep(2)
# Expected progression: pending -> in_progress -> completed
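
The loop above never gives up if a file stays stuck. A sketch of the same poll with a deadline and a gently growing interval; wait_for_ingest and its parameters are our own names, and the terminal statuses are the ones checked in the loop above:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_ingest(fetch_status, timeout_s=300.0, interval_s=2.0,
                    max_interval_s=15.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"ingest still {status!r} after {timeout_s}s")
        sleep(interval_s)
        # Back off gradually so long ingests do not hammer the API.
        interval_s = min(interval_s * 1.5, max_interval_s)


# Usage with the objects from Steps 2 and 3:
#   status = wait_for_ingest(
#       lambda: client.vector_stores.files.retrieve(
#           vector_store_id=store.id, file_id=file.id
#       ).status
#   )
```

The sleep and clock arguments exist only so the function can be exercised without real waiting.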

Step 5

Ask a question via file_search

// Assistants run with the tool config

The retrieval call is an Assistants run with the file_search tool config pointing at your store. The assistant's instructions tell the model what to do with the retrieved chunks; the run does the embed-the-query, retrieve, stuff-into-context dance for you.

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What's the company-wide expense reporting cut-off?",
)

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=os.environ["FC_ASSISTANT_ID"],
    tools=[{
        "type": "file_search",
        "file_search": {"vector_store_ids": [store.id]},
    }],
    # Optional but recommended: attribute the call to the end-customer who triggered it
    extra_headers={"FC-Customer": "cus_42"},
)

messages = client.beta.threads.messages.list(thread_id=thread.id, order="desc", limit=1)
answer = messages.data[0]
print(answer.content[0].text.value)
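
The snippet above assumes the run succeeded and that the first content block is text. A slightly more defensive sketch, assuming Ringside mirrors the OpenAI run and message shapes (run.status, run.last_error, typed content blocks); answer_text is a hypothetical helper name:

```python
def answer_text(run, message) -> str:
    """Join the text blocks of a message, or raise if the run did not complete."""
    if run.status != "completed":
        detail = getattr(run, "last_error", None)
        raise RuntimeError(f"run ended with status {run.status!r}: {detail}")
    # Skip non-text blocks (for example images) instead of indexing content[0].
    parts = [block.text.value for block in message.content if block.type == "text"]
    return "\n".join(parts)


# Usage with the objects from this step:
#   print(answer_text(run, answer))
```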

Step 6

Read the citations out of the response

// annotations carry file_id + chunk_index

The assistant response contains an annotations array on each text content block. Each annotation has the file_id of the source file and the chunk_index the retrieval came from. You can render these as inline citations in your UI or use them server-side for audit.

for block in answer.content:
    if block.type != "text":
        continue
    for ann in block.text.annotations:
        if ann.type == "file_citation":
            fc = ann.file_citation
            print(f"  cited file_id={fc.file_id} chunk_index={fc.chunk_index}")
            # Pull the source file's filename for a human-readable label:
            src = client.files.retrieve(fc.file_id)
            print(f"    -> {src.filename}")
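
To render inline citations, one option is to swap each citation marker in the answer for a numbered footnote. The sketch below assumes the annotations follow the OpenAI shape, where ann.text holds the marker substring that appears in the message text. render_citations is our own name, and chunk_index is read with getattr in case a response omits it.

```python
def render_citations(text_value, annotations):
    """Replace each file_citation marker with [n]; return (text, footnotes).

    footnotes is a list of (file_id, chunk_index) tuples; footnote n is footnotes[n - 1].
    """
    footnotes = []
    rendered = text_value
    for ann in annotations:
        if ann.type != "file_citation":
            continue
        fc = ann.file_citation
        key = (fc.file_id, getattr(fc, "chunk_index", None))
        if key not in footnotes:
            footnotes.append(key)  # the same chunk cited twice shares one number
        n = footnotes.index(key) + 1
        rendered = rendered.replace(ann.text, f"[{n}]")
    return rendered, footnotes


# Usage with a text block from the loop above:
#   body, notes = render_citations(block.text.value, block.text.annotations)
```

Deduplicating on (file_id, chunk_index) keeps the footnote list short when the model cites the same chunk more than once.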

What you just shipped

A customer uploads a file, your app attaches it to that customer's vector store, and it then answers questions about the file with citations. The retrieval log, per-customer cost attribution, embedding model migration and the rest of the RAG plumbing live on our side; your code is the six steps above.

Next steps

· Citation-parsing recipe for the production-quality version of Step 6.
· Vector stores API reference for the full endpoint list (list/get/patch/delete, file batches, cancel/retry, queries, stats, migrate, rollback).
· RAG product page for the broader pitch and pricing.
· RAG pricing if you want to model your monthly cost before you scale.