The Reflection Pattern: How AI Agents Self-Correct
The Reflection pattern lets an AI agent review its own output, find errors, and fix them — like TDD for LLMs.
The Reflection pattern has an AI agent critique and revise its own output in a loop (generate, evaluate, improve) until the result meets defined quality criteria. In software engineering terms, it is the TDD loop applied to generation.
Why single-shot generation fails
When you ask an LLM to write code, an email, or an analysis in one shot, the output is typically a first draft: plausible on the surface, but often carrying subtle bugs, missed edge cases, or unclear reasoning. Humans don't ship first drafts — we review, revise, and iterate. The Reflection pattern gives AI agents the same capability: the ability to critique their own work and improve it.
The generate-evaluate-revise loop
Reflection works in three steps on a loop. First, the agent generates an output. Second, a critic (often the same LLM with a different prompt) evaluates the output against specific criteria — 'Is this code correct? Does it handle edge cases? Is the explanation clear?' Third, the agent revises based on the critique. This loop repeats until the output passes all checks or a maximum iteration count is reached.
MAX_ITERATIONS = 3

code = llm.call(f"Write a Python function that {task}")
for i in range(MAX_ITERATIONS):
    # Critic step: the same LLM reviews the draft under a different prompt.
    critique = llm.call(
        f"Review this code for bugs and edge cases:\n{code}"
    )
    if "no issues found" in critique.lower():
        break
    # Revise step: feed the draft and its critique back to the model.
    code = llm.call(
        f"Fix this code based on the review:\n{code}\n\nReview:\n{critique}"
    )

The SWE parallel: TDD
If you practice Test-Driven Development, you already understand Reflection. TDD: write a test → run it (fails) → write code → run test (passes). Reflection: define criteria → generate output → evaluate (fails criteria) → revise → evaluate (passes). Same loop. Same philosophy: define what 'good' looks like before you generate, then iterate until you get there.
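The analogy can be made literal: when the quality criteria are executable checks, the reflection loop is a test suite driving generation. Here is a minimal sketch; the names (`reflect_until_passing`, `toy_generate`) are illustrative, and a trivial stub stands in for a real LLM call:

```python
def reflect_until_passing(generate, criteria, max_iterations=3):
    """Generate-evaluate-revise loop, TDD-style: the criteria are
    defined up front, like tests written before the code."""
    output, failures = generate(None), []
    for _ in range(max_iterations):
        # Evaluate: collect the name of every criterion the output fails.
        failures = [name for name, check in criteria if not check(output)]
        if not failures:              # all checks pass -> done
            break
        output = generate(failures)   # revise using the failure list
    return output, failures

# Toy generator standing in for an LLM: it "fixes" whatever failed.
def toy_generate(failures):
    if failures is None:
        return "draft"                  # first attempt
    return "draft with null-check"      # revised attempt

criteria = [
    ("handles null input", lambda out: "null-check" in out),
]
result, failures = reflect_until_passing(toy_generate, criteria)
# failures is empty once the revised draft passes every check
```

The key design choice mirrors TDD exactly: the checks exist before the first draft does, so "done" is defined by the criteria, not by the generator's own judgment.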
Practical tips
Cap your iterations (3–5 is typical) to avoid infinite loops and runaway costs. Make your evaluation criteria specific and measurable — not 'is this good?' but 'does this handle null input, return the correct type, and run in O(n) time?' Consider using a different model for the critic than for the generator, so the critic is less likely to share the generator's blind spots.
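Putting those tips together, here is a sketch of a capped loop with explicit written criteria and separate generator and critic callables. All names are hypothetical, and stub functions stand in for real model calls so the control flow is visible:

```python
CRITERIA = (
    "1. Handles None input without raising.\n"
    "2. Returns the documented type.\n"
    "3. Runs in O(n) time."
)

def reflect(generator_call, critic_call, task, max_iterations=3):
    """Capped generate-evaluate-revise loop with measurable criteria."""
    output = generator_call(f"Write a Python function that {task}")
    for _ in range(max_iterations):  # hard cap: no runaway cost
        critique = critic_call(
            f"Check this code against each criterion:\n{CRITERIA}\n\n"
            f"{output}\nReply PASS if every criterion is met."
        )
        if critique.strip().upper().startswith("PASS"):
            break
        output = generator_call(
            f"Fix this code based on the review:\n{output}\n\n"
            f"Review:\n{critique}"
        )
    return output

# Stub "models" to demonstrate the flow: the critic fails the first
# draft, then passes the revision.
reviews = iter(["FAIL: criterion 1, no None check", "PASS"])
critic = lambda prompt: next(reviews)
generator = lambda prompt: "draft v2" if "Fix" in prompt else "draft v1"

final = reflect(generator, critic, "reverses a list")
```

Because the criteria are written into the critic's prompt verbatim, the critique comes back tied to numbered checks rather than vague impressions, which makes the revision step far more targeted.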
Reflection is TDD for LLMs: define quality criteria, generate, evaluate, revise, repeat. It's the single most effective pattern for improving AI output quality.
This post covers the basics. The full curriculum page for Reflection includes the SWE mapping, code examples, production notes, and an interactive building exercise.