We gave every agent answer a receipt: the why-trace
An engineering deep-dive on how AgenticMind makes retrieval answers auditable. Companion to the AgenticMind repo.
The problem: agent memory is a black box
Most "memory for agents" is a vector store with two verbs: save() and search().
That buys you fuzzy recall and nothing else. When an answer comes back, you can't tell:
- Why this passage was chosen and not another.
- Whether a source actually supports the claim, or the model filled the gap.
- Whether the answer is current, or a stale belief that should have been superseded.
For a demo that's fine. In production — and especially anywhere a regulator might ask "how did the system arrive at this?" — it's a liability. You're shipping a component whose decisions you can't reconstruct.
What we built
AgenticMind treats every answer as something you can replay. The retrieval/synthesis path is two enforced rules plus a recorded decision log:
- No source, no claim. Synthesis is citation-enforced: every statement in an answer is keyed to a numbered source, and the unsupported parts are refused rather than fabricated.
- A receipt for every answer. Each call emits a structured why-trace: the retrieval and synthesis steps, the model used, the timings, and the citations, all addressable after the fact by a telemetryId.
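To make "no source, no claim" concrete, here is a minimal sketch of a post-hoc check over an answer string. It is not AgenticMind's enforcement code: `checkCitations`, `CitedSource`, and `CitationCheck` are hypothetical names, the sentence splitter is deliberately naive (it will break on decimals like 0.46), and refusal sentences land in the "unmarked" bucket alongside unsupported claims, so a real enforcer would need to tell the two apart.

```typescript
// Hypothetical shapes for illustration; only `number` mirrors the citation field
// that appears in AgenticMind responses.
interface CitedSource {
  number: number;
}

interface CitationCheck {
  unknownMarkers: number[];    // markers such as [3] with no matching source
  unmarkedSentences: string[]; // sentences that carry no [n] marker at all
}

function checkCitations(answer: string, sources: CitedSource[]): CitationCheck {
  const unknownMarkers: number[] = [];
  const unmarkedSentences: string[] = [];

  // Naive split: a run of non-terminators followed by any terminators.
  const rawSentences = answer.match(/[^.!?]+[.!?]*/g) || [];

  for (const raw of rawSentences) {
    const sentence = raw.trim();
    if (sentence.length === 0) continue;

    // Fresh regex per sentence so lastIndex always starts at 0.
    const markerPattern = /\[(\d+)\]/g;
    let match: RegExpExecArray | null = markerPattern.exec(sentence);
    let markerCount = 0;

    while (match !== null) {
      markerCount += 1;
      const n = Number(match[1]);
      const isKnown = sources.some((s) => s.number === n);
      if (!isKnown && unknownMarkers.indexOf(n) === -1) {
        unknownMarkers.push(n);
      }
      match = markerPattern.exec(sentence);
    }

    if (markerCount === 0) {
      unmarkedSentences.push(sentence);
    }
  }

  return { unknownMarkers, unmarkedSentences };
}

const result = checkCitations(
  "Every agent must ship with a written Agent Contract [1]. It is reviewed weekly [2].",
  [{ number: 1 }],
);
console.log(result.unknownMarkers); // marker 2 has no source behind it
```

A check like this only proves that markers resolve to sources. Whether the source actually supports the sentence is a separate question that needs the source text itself.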
Here's a real kl_ask_global call. The question has two halves on purpose — one the
corpus can answer, one it can't:
// → kl_ask_global
{ "question": "When should I use a multi-agent architecture instead of a single agent,
               and what must every agent ship with according to the standard?" }
// ← response (trimmed)
{
  "answer": "The provided sources do not specify when to use a multi-agent architecture
             versus a single agent. … According to the Agentic Product Standard, every
             agent must ship with a written Agent Contract [1]. This contract must cover
             ownership, forbidden actions, acceptance criteria, failure modes, escalation
             rules, and logging requirements [1].",
  "citations": [
    { "number": 1, "title": "Agent Contract requirement",
      "materialId": "ba44971b-…", "score": 0.46, "origin": "chunk" }
  ],
  "model": "…",
  "retrievalMs": 606, "generationMs": 890,
  "phases": [ {"phase":"embed","ms":552}, {"phase":"retrieve","ms":37},
              {"phase":"synth","ms":890}, {"phase":"output_filter","ms":2} ],
  "telemetryId": "cc942e54-…"
}
Read what didn't happen: for the half the corpus couldn't support, the model refused ("the provided sources do not specify…") instead of inventing an answer. The half it could support is keyed to a citation you can open. And the whole thing carries a trace you can replay.
What's in the trace
The trace is not a log line — it's a structured record designed to answer "why":
- phases: the ordered steps with per-phase timings: embed → retrieve → synth → output_filter. When something is slow or wrong, you see which stage.
- citations: for each cited source, its materialId, the retrieval score, and the origin (e.g. a raw chunk vs. a distilled fact card vs. the graph). You can see not just that a source was used, but how it was found.
- model + retrievalMs / generationMs: what answered, and where the time went.
- telemetryId: the handle that ties the answer to its trace so you can pull it up later, attach it to a support ticket, or feed it into an eval.
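If you consume traces from TypeScript, the fields above map onto a small set of types. The sketch below is inferred only from the trimmed response shown earlier, so any field that response omits is not modeled. `WhyTrace`, `slowestPhase`, and `weakCitations` are hypothetical names, not exports of the AgenticMind packages, and the 0.5 score threshold is an arbitrary value chosen for illustration.

```typescript
// Shapes inferred from the trimmed example response; not an official schema.
interface Phase {
  phase: string;
  ms: number;
}

interface TraceCitation {
  number: number;
  title: string;
  materialId: string;
  score: number;
  origin: string; // "chunk" in the example; other origins are described only in prose
}

interface WhyTrace {
  citations: TraceCitation[];
  model: string;
  retrievalMs: number;
  generationMs: number;
  phases: Phase[];
  telemetryId: string;
}

// Returns the phase that took the longest, or undefined for an empty trace.
function slowestPhase(phases: Phase[]): Phase | undefined {
  let slowest: Phase | undefined;
  for (const p of phases) {
    if (slowest === undefined || p.ms > slowest.ms) {
      slowest = p;
    }
  }
  return slowest;
}

// Returns citations whose retrieval score falls below a caller-chosen threshold.
function weakCitations(citations: TraceCitation[], minScore: number): TraceCitation[] {
  return citations.filter((c) => c.score < minScore);
}

// Values copied from the example response, elisions included.
const exampleTrace: WhyTrace = {
  citations: [
    { number: 1, title: "Agent Contract requirement", materialId: "ba44971b-…", score: 0.46, origin: "chunk" },
  ],
  model: "…",
  retrievalMs: 606,
  generationMs: 890,
  phases: [
    { phase: "embed", ms: 552 },
    { phase: "retrieve", ms: 37 },
    { phase: "synth", ms: 890 },
    { phase: "output_filter", ms: 2 },
  ],
  telemetryId: "cc942e54-…",
};

console.log(slowestPhase(exampleTrace.phases)); // the synth phase, 890 ms
console.log(weakCitations(exampleTrace.citations, 0.5)); // citation 1, scored 0.46
```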
Because the trace is captured at decision time (in packages/shared/src/lib/observability),
it reflects the real path the engine took — not a reconstruction after the fact.
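Capturing at decision time can be pictured as wrapping each step in a timer that appends to the trace as the step finishes. The sketch below illustrates that pattern only; it is not the code in packages/shared/src/lib/observability. `TraceRecorder` and its methods are hypothetical, the steps are synchronous for brevity (a real pipeline would await async calls), and the injectable clock exists purely to make the recorder testable.

```typescript
interface PhaseTiming {
  phase: string;
  ms: number;
}

// Hypothetical recorder: times each step as it runs and keeps the ordered result.
class TraceRecorder {
  private readonly timings: PhaseTiming[] = [];
  private readonly now: () => number;

  // `now` defaults to wall-clock time; tests can pass a fake clock.
  constructor(now?: () => number) {
    this.now = now !== undefined ? now : () => Date.now();
  }

  // Runs the step, records its duration even if it throws, and returns its result.
  time<T>(phase: string, step: () => T): T {
    const start = this.now();
    try {
      return step();
    } finally {
      this.timings.push({ phase, ms: this.now() - start });
    }
  }

  // Returns a copy so callers cannot mutate the recorded history.
  phases(): PhaseTiming[] {
    return this.timings.slice();
  }
}

const recorder = new TraceRecorder();
const vector = recorder.time("embed", () => [0.1, 0.2, 0.3]); // stand-in for an embedding call
const hitCount = recorder.time("retrieve", () => vector.length); // stand-in for a vector search
console.log(hitCount, recorder.phases());
```

Recording in a `finally` block means a failed phase still shows up in the trace with its duration, which is usually the phase you most want to see.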
Why it matters
- Debugging. "The agent gave a weird answer" becomes a concrete investigation: open the trace, see which chunk scored 0.46 and got cited, fix the corpus or the ranking. No guessing.
- Trust. A citation you can open is the difference between "the model said so" and "here's the source." Refusal on unsupported questions is a feature, not a failure.
- Audit & compliance. Regulated buyers increasingly need explainable retrieval — the EU AI Act pushes toward systems whose outputs can be traced to their inputs. A replayable why-trace is most of that artifact already, by construction rather than bolted on.
The honest tradeoffs
- Citation-enforced synthesis refuses more. If your corpus is thin, you'll see "the sources don't support this" more often than a chatty vector-store wrapper would. We think that's the correct default for anything you'll be accountable for — but it's a real behavioral difference, not free.
- Recording a trace per answer has a cost. It's small relative to embedding + generation, and it's the kind of cost you only resent until the first production incident.
Try it
AgenticMind is Apache-2.0 and self-hostable on Postgres alone. One command, no clone, no token minting:
OPENAI_API_KEY=sk-... sh -c "$(curl -fsSL https://raw.githubusercontent.com/Moai-Team-LLC/AgenticMind/main/quickstart.sh)"
Then point any MCP client at it, ingest something, ask a question — and open the trace.
AgenticMind is the reference implementation of the Agentic Product Standard.