Docs & Examples

Instrument any RAG pipeline.

TraceroAI is instrumentation, not a chat app. Add the SDK and every answer becomes a debuggable trace — retrieval, prompt, answer, evaluations, and an automatic diagnosis — plus an optional agent that retries the stage that failed.

Install

bash
pip install traceroai

The only base dependency is httpx. Telemetry is best-effort — if the API is unreachable, the SDK warns and continues; it never breaks your app.

Try it live

Ask the built-in demo knowledge base a question. It runs real retrieval + evaluation on the live API and shows the trace below — no signup, no key.

Try:

Send traces

Three styles, from most explicit to most automatic — they all produce the same kind of trace.

from traceroai import TraceroClient

client = TraceroClient(
    base_url="https://traceroai.onrender.com",
    api_key="your_project_key",   # optional
)

with client.trace("How long does a refund take?") as t:
    chunks = retrieve(query)                 # your retriever
    t.log_retrieval(chunks, strategy="hybrid", config={"top_k": 3})
    t.log_prompt(prompt_text, version="grounded_v1")   # optional
    answer = generate(prompt, chunks)        # your LLM
    t.log_generation(answer, model="gpt-4o-mini", provider="openai")

print(t.trace_id)   # the sent trace id (None if sending failed)

SDK reference

The full public surface of TraceroClient.

client.trace(query, *, rewritten=, project=, metadata=)

Open a tracing context manager. Auto-times the block and sends on exit.

t.log_retrieval(chunks, *, strategy=, config=)

Record retrieved chunks (list of dicts, each with at least a `text` field).

t.log_prompt(content, *, version=, template_name=)

Optional — record the prompt sent to the model.

t.log_generation(answer, *, model, provider=, temperature=)

Record the generated answer. `model` is required.

client.traced(*, model, strategy='vector')

Decorator for a function returning (answer, chunks).

client.log_trace(*, query, retrieval, generation, ...)

Low-level: assemble and send a whole trace in one call.

client.get_trace(trace_id)

Fetch a stored trace back — diagnosis + evaluations included.

Diagnosis labels

The server evaluates every trace (embedding-cosine relevance + groundedness, plus an async LLM judge) and reduces it to one label:

healthy_answer

Retrieval, grounding, and relevance all pass.

correct_refusal

The model rightly declined — the context didn't support an answer.

retrieval_miss

Retrieved context doesn't match the query.

unsupported_claim

The answer asserts things the context doesn't support.

wrong_answer

The answer doesn't address the query.

needs_review

Mixed signals — flagged for a human.

Self-healing recovery

An optional LangGraph agent that fixes its own bad answers: it evaluates each attempt and retries the stage that failed — re-retrieving with more context on a retrieval_miss, re-generating with a stricter prompt on an unsupported_claim — until the answer is healthy or it escalates to review (bounded by max_attempts).

bash
pip install "traceroai[recovery]"
from traceroai import TraceroClient
from traceroai.recovery import RecoveryAgent

agent = RecoveryAgent(
    client,
    retrieve=my_retrieve,    # (query, top_k) -> list[chunk dict]
    generate=my_generate,    # (query, context) -> answer str
    max_attempts=3,
)

result = agent.run("How long does a refund take?")
result["answer"]      # final answer
result["diagnosis"]   # final diagnosis label
result["attempts"]    # how many tries it took
result["succeeded"]   # reached a healthy answer?
result["trace_ids"]   # the retry chain (one trace per attempt)
result["deep_eval"]   # final LLM-judge verdict (or None if pending)

Experiment runs

Compare pipeline configs (top_k, prompt, model) against a labeled dataset. The harness replays every case across each variant, grades the answers with the LLM judge, and recommends a winner — A/B testing for RAG. Results show up under Eval Runs.

bash
# from services/api — runs the experiment and posts the result
python -m app.eval_runner \
    --dataset support_faq_v1 \
    --project my-app \
    --post \
    --api-url https://traceroai.onrender.com

Each run records, per variant: accuracy (correct vs. expected answers, judged by the LLM) and average latency. The recommended winner is the highest-accuracy variant.

Auth & projects

Pass a project API key and the server stamps every trace with that project, so the dashboard can filter to it. The key is optional — without one, traces use whatever project_id you pass. Reads are open, so anyone can browse the demo.

client = TraceroClient(
    base_url="https://traceroai.onrender.com",
    api_key="your_project_key",
)
Open the DashboardView on PyPI