
Building DocSentinel: six agents instead of one big prompt

DocSentinel is an open-source platform that automates security assessment across the software lifecycle. The interesting decision wasn’t which model to use; it was refusing to build the assessment as a single model call. Instead, it’s six specialised agents coordinated by a graph. This post is about why.

[Figure: DocSentinel system overview. Documents and queries enter a LangGraph orchestrator of six phase-specialised agents, which draws on a hybrid retrieval layer (vector + graph) and a multi-LLM gateway and is callable over MCP/A2A.]

Documents and queries flow into a six-agent orchestrator, which leans on a hybrid retrieval layer and a multi-LLM gateway, and exposes itself over MCP/A2A. (Agent names here are illustrative — see the note at the end.)

The problem with one big prompt

The tempting first version of any LLM tool is a single prompt: dump the context in, ask for the answer, ship it. For a security review that falls apart quickly:

A security assessment is not one task. It’s intake, threat modelling, control review, evidence checking, and reporting — each a different job with different inputs and different failure modes. So each becomes an agent.

Six agents on a graph

The agents are orchestrated with LangGraph, which lets the workflow be an explicit state machine rather than an implicit chain of hope. Each node owns one phase, has its own prompt and tools, and hands typed state to the next:

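from langgraph.graph import StateGraph, START, END

# AssessmentState (the typed state) and report_agent are assumed names, defined elsewhere
graph = StateGraph(AssessmentState)
graph.add_node("intake", intake_agent)
graph.add_node("threat_model", threat_agent)
graph.add_node("control_review", control_agent)
graph.add_node("report", report_agent)   # target of the "done" branch below
graph.add_edge(START, "intake")
graph.add_edge("intake", "threat_model")
graph.add_conditional_edges(
    "threat_model",
    route_by_risk,          # returns "deep" for high-risk findings, otherwise "done"
    {"deep": "control_review", "done": "report"},
)
graph.add_edge("control_review", "report")
graph.add_edge("report", END)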

The payoff: the parts you can make deterministic (routing, gating, retries) stay deterministic, and only the genuinely fuzzy work is left to the model.
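To make "deterministic routing" concrete, here is a minimal sketch of what a router like `route_by_risk` could look like. Everything in it is an assumption for illustration: the state fields, the 0 to 10 severity scale, the threshold and the pass limit are hypothetical, not DocSentinel's actual schema.

```python
from typing import List, Literal, TypedDict


class Finding(TypedDict):
    title: str
    severity: float  # assumed scale: 0.0 (informational) to 10.0 (critical)


class AssessmentState(TypedDict):
    findings: List[Finding]
    deep_passes: int  # how many deeper passes have already run


HIGH_RISK_THRESHOLD = 7.0  # hypothetical cut-off for "high risk"
MAX_DEEP_PASSES = 2        # hypothetical gate so the graph cannot loop forever


def route_by_risk(state: AssessmentState) -> Literal["deep", "done"]:
    """Pure function of the state: the same input always yields the same route."""
    if state["deep_passes"] >= MAX_DEEP_PASSES:
        return "done"
    worst = max((f["severity"] for f in state["findings"]), default=0.0)
    return "deep" if worst >= HIGH_RISK_THRESHOLD else "done"
```

No model call happens here, so the routing decision can be unit-tested like any other function and behaves identically on every run.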

[Figure: the agent graph. Intake feeds threat modelling; a risk router sends high-risk work through code review, controls and evidence before reporting, while low-risk work reports directly; new findings loop back to re-model.]

The graph makes control flow explicit: a risk router decides how deep to go, and new findings loop back to re-model rather than pushing forward blindly. (Illustrative phase names.)

Retrieval: vector and graph

Security knowledge is relational — a control mitigates a threat against an asset. Pure semantic search flattens that structure, so DocSentinel runs hybrid retrieval:

Vector search finds the paragraph. The graph tells you which control it came from, what it mitigates, and what else that decision touches.
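The relational side can be illustrated with a toy example. This is a plain-Python sketch of the idea, not how LightRAG or DocSentinel stores its graph; the triples, relation names and identifiers are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subject, relation, object) triples; every name here is hypothetical.
TRIPLES: List[Tuple[str, str, str]] = [
    ("mfa_for_admins", "mitigates", "credential_stuffing"),
    ("rate_limiting", "mitigates", "credential_stuffing"),
    ("credential_stuffing", "targets", "admin_console"),
    ("mfa_for_admins", "documented_in", "access_policy.md#s3"),
]


def build_index(triples: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Index outgoing edges by (subject, relation) for direct lookups."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def assets_protected_by(control: str, index: Dict[Tuple[str, str], List[str]]) -> List[str]:
    """Follow control -> mitigates -> threat -> targets -> asset."""
    assets: List[str] = []
    for threat in index.get((control, "mitigates"), []):
        for asset in index.get((threat, "targets"), []):
            if asset not in assets:
                assets.append(asset)
    return assets


INDEX = build_index(TRIPLES)
protected = assets_protected_by("mfa_for_admins", INDEX)
```

A two-hop question like "which assets does this control protect?" is a short traversal here, whereas a similarity search over paragraphs has no edge to follow.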

[Figure: hybrid retrieval. A query fans out to vector search over ChromaDB and graph retrieval over LightRAG; the results are fused and reranked into the context handed to the agent.]

A query fans out to both retrievers — semantic similarity and graph traversal — and the results are fused and reranked into the context the agent actually sees.

For a reviewer, the second question is the one that matters — and it’s the one a flat index can’t answer.
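The post doesn’t say which fusion method DocSentinel uses. As one common option, here is a sketch of reciprocal rank fusion (RRF), which merges ranked lists using only rank positions, so the two retrievers’ scores never need to be comparable. The document ids are hypothetical, `k = 60` is the conventional default rather than a tuned value, and the reranking step that would follow is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 5
) -> List[str]:
    """Score each id by the sum of 1 / (k + rank) over every list it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


vector_hits = ["para_12", "para_07", "para_31"]  # hypothetical vector-search results
graph_hits = ["para_07", "para_44", "para_12"]   # hypothetical graph-retrieval results
fused = reciprocal_rank_fusion([vector_hits, graph_hits])
# fused == ["para_07", "para_12", "para_44", "para_31"]
```

Items that both retrievers return rise to the top, which is usually the behaviour you want before spending a reranker on the shortlist.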

Making it callable: MCP and A2A

DocSentinel exposes itself over the Model Context Protocol (MCP) so other agents and IDEs can call it as a tool, and speaks A2A for agent-to-agent hand-off. The lesson that generalises: build the capability as a protocol endpoint, not a UI feature, and it composes into workflows you didn’t anticipate.
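To show what "callable as a tool" means on the wire: MCP is built on JSON-RPC 2.0, and a client invokes a tool with a `tools/call` request. The sketch below is a deliberately simplified, stdlib-only handler for that one message. It is not DocSentinel’s server and not an MCP SDK; the tool name `assess_document` and its argument are hypothetical, and a real server would also handle initialisation, `tools/list` and the transport.

```python
import json
from typing import Any, Callable, Dict


def assess_document(text: str) -> str:
    """Hypothetical tool body; a real one would invoke the agent graph."""
    return f"received {len(text)} characters for assessment"


TOOLS: Dict[str, Callable[..., str]] = {"assess_document": assess_document}


def handle_request(raw: str) -> Dict[str, Any]:
    """Dispatch a single JSON-RPC 2.0 request of type tools/call."""
    request = json.loads(raw)
    req_id = request.get("id")
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32602, "message": "Unknown tool"}}
    text = tool(**params.get("arguments", {}))
    return {"jsonrpc": "2.0", "id": req_id,
            "result": {"content": [{"type": "text", "text": text}], "isError": False}}


request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "assess_document", "arguments": {"text": "TLS 1.0 is enabled"}},
})
response = handle_request(request)
```

Because the interface is a message shape rather than a screen, any client that can send this request can put the assessment inside its own workflow.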

What I’d tell past me


DocSentinel is MIT-licensed and on GitHub. The diagrams here are explanatory, and the individual agent/phase names are representative rather than exact. More write-ups on agentic AI and LLM security to come.