Zac Stryker

Roll for Insight: What Dungeon Masters Can Teach Us About RAG

I recently built a RAG chatbot that answers questions about Dungeons & Dragons 5th Edition rules. Ask it anything covered by the rulebooks and it retrieves the relevant passages from the Dungeon Master’s Guide, Player’s Handbook, and Monster Manual, and Claude synthesizes a cited answer grounded in what it found. While building it, I noticed the program flow mapped almost exactly onto what a well-prepared Dungeon Master does at the table.

The Improv DM Problem

There are two kinds of DMs. The first runs everything from memory and intuition: they’ve absorbed the general feel of the rules, fill in the gaps on the fly, and never break flow to look anything up. Ask this DM what happens when two concentration spells interact and you’ll get a confident answer that may or may not match the rulebook. Think of it as rolling a natural 1 on a Knowledge check: full confidence, zero accuracy. Repeat that enough times across a campaign and the rules become a moving target.

This is a base language model without retrieval. Trained on enough D&D content to sound authoritative, but generating from internalized patterns rather than the actual text. It will occasionally produce rules that don’t exist, numbers that are wrong, mechanics that blend two editions. Confident, fluent, and sometimes wrong.

The second kind of DM preps. When a player asks a rules question, they reach for the book, find the passage, and read it aloud. “PHB page 198 says…” The answer is grounded in source text the player can verify. RAG is the second DM.

Retrieve First, Then Generate

The core RAG loop: encode the question into a vector, search the knowledge base for the most similar passages, load those passages into the prompt as context, then ask the model to synthesize an answer. The model isn’t asked to recall — it’s given the relevant text.
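
Stripped to its skeleton, that loop fits in a few functions. This is a sketch only: the bag-of-words “embedding” stands in for a real embedding model, and the knowledge base here is two paraphrased placeholder passages, not my actual index.

```python
import math
from collections import Counter

# Toy "embedding": bag-of-words counts. A real pipeline would use a
# learned embedding model; this stand-in just makes the loop runnable.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The binder: (source, passage) pairs. Texts are paraphrased placeholders.
knowledge_base = [
    ("PHB p. 85", "Divine Smite deals extra radiant damage to undead and fiends"),
    ("PHB p. 203", "Concentration ends early if you cast another concentration spell"),
]

def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    q = embed(question)
    ranked = sorted(knowledge_base, key=lambda p: cosine(q, embed(p[1])), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    # Load the retrieved passages into the prompt as grounding context.
    context = "\n".join(f"[{src}] {text}" for src, text in retrieve(question))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}"

print(build_prompt("Does Divine Smite do extra damage against undead?"))
```

The prompt that comes out the other end is what actually gets sent to the model: the question plus the passages, with the model never asked to recall anything on its own.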

This is exactly what the well-prepared DM is doing. A player asks whether their paladin’s Divine Smite works against undead. The DM doesn’t generate from scratch — they retrieve the rule and synthesize: “Divine Smite does extra radiant damage against undead and fiends, so yes, you’d get the bonus.” Think of the vector database as the DM’s binder — that beat-up three-ring binder every veteran DM has, stuffed with printed rules and tabbed for fast lookup. It doesn’t generate anything. It just holds what you might need.

Chunking Is How You’ve Internalized the Rules

In a RAG pipeline, chunking strategy matters. Chunks too large bury the specific sentence you need in irrelevant text. Too small and you lose the surrounding context that gives a rule its meaning. Good chunking carves the material at its natural joints — the units in which information is actually used.

An experienced DM doesn’t remember the PHB as a continuous 300-page document. They remember it in functional units: running combat, spellcasting mechanics, character class abilities. A DM who has internalized the rules at the chapter level answers slowly and approximately; one who has internalized them at the subsystem level retrieves quickly and precisely. The quality of retrieval depends on how the knowledge was stored — in both the DM and the vector database.
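
Carving at the joints can be sketched in a few lines. This assumes blank lines mark the natural boundaries; a real chunker for the rulebook PDFs would key on headings and stat blocks instead, and the budget of 120 words is an arbitrary illustration.

```python
def chunk_by_section(text: str, max_words: int = 120) -> list[str]:
    """Split on blank lines (the text's natural joints), then merge
    adjacent sections until a chunk approaches max_words."""
    sections = [s.strip() for s in text.split("\n\n") if s.strip()]
    chunks, current = [], []
    for sec in sections:
        # Close the current chunk before it grows past the budget.
        if current and len(" ".join(current + [sec]).split()) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.append(sec)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The key property: section boundaries are never split down the middle, so a rule and the sentence that qualifies it stay in the same chunk.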

Session Prep as Context Loading

A DM preparing for a dungeon heist session doesn’t re-read the entire DMG. They re-read the sections on traps, locks, skill checks, and chase mechanics — loading the context most likely to come up. This is what RAG does at query time. The retrieval step surfaces the passages most relevant to this specific question and loads them into the model’s context window. The context window is the DM’s working memory during the session; prep is retrieval done in advance.
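
Because working memory is finite, loading it is a budgeting problem. A minimal sketch, with word counts standing in for tokens and a made-up budget; the passages are assumed to arrive already ranked by relevance:

```python
def load_context(passages: list[tuple[str, str]], budget_words: int = 200) -> list[tuple[str, str]]:
    """Greedily pack top-ranked (source, text) passages into a fixed
    budget, most relevant first. Word counts stand in for tokens."""
    loaded, used = [], 0
    for src, text in passages:
        cost = len(text.split())
        if used + cost > budget_words:
            break  # working memory is full; stop loading
        loaded.append((src, text))
        used += cost
    return loaded
```

Most relevant first matters here: when the budget runs out, it’s the marginal passages that get cut, not the one that actually answers the question.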

Citations as Accountability

When an answer is generated from retrieved chunks, you know exactly which passages were used, what pages they came from, and which book they belong to. Every answer is verifiable. If the system says “[PHB p. 198]”, you can go check.
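
Keeping page metadata attached to every chunk is what makes that citation line possible. A sketch with a hypothetical Chunk record, not my actual schema:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    book: str   # e.g. "PHB"
    page: int
    text: str

def answer_with_citations(synthesized: str, sources: list[Chunk]) -> str:
    """Append a citation line listing every chunk that was in the
    prompt. The synthesized answer itself would come from the model."""
    cites = ", ".join(f"{c.book} p. {c.page}" for c in sources)
    return f"{synthesized} [{cites}]"
```

The citation isn’t generated by the model at all; it falls out of the bookkeeping, which is exactly why it can be trusted.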

The rules-lawyer player at the D&D table is asking for the same thing. “Where does it say that?” is a citation request. Every rules-lawyer is essentially implementing RAG manually: retrieve first, verify second, then argue about it for fifteen minutes. The DM who can answer with a page number is running a grounded game. The DM who can’t is running on vibes — and vibes make bad AI systems.

What the Parallel Reveals

“Retrieval-augmented generation” sounds like an engineering optimization. “Look it up before you answer” sounds like common sense. They’re the same instruction. Both the DM and RAG separate the retrieval problem from the generation problem — retrieval is a search problem, generation is a language problem — and by separating them, each component can do what it’s actually good at. The best DMs are good at both: fast, accurate retrieval from years of internalized rules, plus the improvisational skill to synthesize a ruling that fits the moment. RAG makes the same decomposition explicit, and by making it explicit, it becomes something you can measure, improve, and trust.