Docs & Examples
Instrument any RAG pipeline.
TraceroAI is instrumentation, not a chat app. Add the SDK and every answer becomes a debuggable trace — retrieval, prompt, answer, evaluations, and an automatic diagnosis — plus an optional agent that retries the stage that failed.
Install
pip install traceroaiThe only base dependency is httpx. Telemetry is best-effort — if the API is unreachable, the SDK warns and continues; it never breaks your app.
Try it live
Ask the built-in demo knowledge base a question. It runs real retrieval + evaluation on the live API and shows the trace below — no signup, no key.
Send traces
Three styles, from most explicit to most automatic — they all produce the same kind of trace.
from traceroai import TraceroClient
client = TraceroClient(
base_url="https://traceroai.onrender.com",
api_key="your_project_key", # optional
)
with client.trace("How long does a refund take?") as t:
chunks = retrieve(query) # your retriever
t.log_retrieval(chunks, strategy="hybrid", config={"top_k": 3})
t.log_prompt(prompt_text, version="grounded_v1") # optional
answer = generate(prompt, chunks) # your LLM
t.log_generation(answer, model="gpt-4o-mini", provider="openai")
print(t.trace_id) # the sent trace id (None if sending failed)SDK reference
The full public surface of TraceroClient.
client.trace(query, *, rewritten=, project=, metadata=)Open a tracing context manager. Auto-times the block and sends on exit.
t.log_retrieval(chunks, *, strategy=, config=)Record retrieved chunks (list of dicts, each with at least a `text` field).
t.log_prompt(content, *, version=, template_name=)Optional — record the prompt sent to the model.
t.log_generation(answer, *, model, provider=, temperature=)Record the generated answer. `model` is required.
client.traced(*, model, strategy='vector')Decorator for a function returning (answer, chunks).
client.log_trace(*, query, retrieval, generation, ...)Low-level: assemble and send a whole trace in one call.
client.get_trace(trace_id)Fetch a stored trace back — diagnosis + evaluations included.
Diagnosis labels
The server evaluates every trace (embedding-cosine relevance + groundedness, plus an async LLM judge) and reduces it to one label:
healthy_answerRetrieval, grounding, and relevance all pass.
correct_refusalThe model rightly declined — the context didn't support an answer.
retrieval_missRetrieved context doesn't match the query.
unsupported_claimThe answer asserts things the context doesn't support.
wrong_answerThe answer doesn't address the query.
needs_reviewMixed signals — flagged for a human.
Self-healing recovery
An optional LangGraph agent that fixes its own bad answers: it evaluates each attempt and retries the stage that failed — re-retrieving with more context on a retrieval_miss, re-generating with a stricter prompt on an unsupported_claim — until the answer is healthy or it escalates to review (bounded by max_attempts).
pip install "traceroai[recovery]"from traceroai import TraceroClient
from traceroai.recovery import RecoveryAgent
agent = RecoveryAgent(
client,
retrieve=my_retrieve, # (query, top_k) -> list[chunk dict]
generate=my_generate, # (query, context) -> answer str
max_attempts=3,
)
result = agent.run("How long does a refund take?")
result["answer"] # final answer
result["diagnosis"] # final diagnosis label
result["attempts"] # how many tries it took
result["succeeded"] # reached a healthy answer?
result["trace_ids"] # the retry chain (one trace per attempt)
result["deep_eval"] # final LLM-judge verdict (or None if pending)Experiment runs
Compare pipeline configs (top_k, prompt, model) against a labeled dataset. The harness replays every case across each variant, grades the answers with the LLM judge, and recommends a winner — A/B testing for RAG. Results show up under Eval Runs.
# from services/api — runs the experiment and posts the result
python -m app.eval_runner \
--dataset support_faq_v1 \
--project my-app \
--post \
--api-url https://traceroai.onrender.comEach run records, per variant: accuracy (correct vs. expected answers, judged by the LLM) and average latency. The recommended winner is the highest-accuracy variant.
Auth & projects
Pass a project API key and the server stamps every trace with that project, so the dashboard can filter to it. The key is optional — without one, traces use whatever project_id you pass. Reads are open, so anyone can browse the demo.
client = TraceroClient(
base_url="https://traceroai.onrender.com",
api_key="your_project_key",
)