The Reflection Pattern: How AI Agents Self-Correct
The Reflection pattern lets an AI agent review its own output, find errors, and fix them — like TDD for LLMs.
The Reflection pattern has an AI agent critique and revise its own output in a loop (generate, evaluate, improve) until the result meets defined quality criteria. In software engineering terms, it is the TDD loop applied to generation.
Why single-shot generation fails
When you ask an LLM to write code, an email, or an analysis in one shot, the output is typically a first draft: plausible on the surface, but often carrying subtle bugs, missed edge cases, or unclear reasoning. Humans don't ship first drafts — we review, revise, and iterate. The Reflection pattern gives AI agents the same capability: the ability to critique their own work and improve it.
The generate-evaluate-revise loop
Reflection works in three steps on a loop. First, the agent generates an output. Second, a critic (often the same LLM with a different prompt) evaluates the output against specific criteria — 'Is this code correct? Does it handle edge cases? Is the explanation clear?' Third, the agent revises based on the critique. This loop repeats until the output passes all checks or a maximum iteration count is reached.
MAX_ITERATIONS = 3

code = llm.call(f"Write a Python function that {task}")
for i in range(MAX_ITERATIONS):
    # Critic step: the same LLM reviews the draft under a different prompt.
    critique = llm.call(
        f"Review this code for bugs and edge cases:\n{code}"
    )
    if "no issues found" in critique.lower():
        break
    # Revise step: feed the draft and its critique back to the model.
    code = llm.call(
        f"Fix this code based on the review:\n{code}\n\nReview:\n{critique}"
    )

The SWE parallel: TDD
If you practice Test-Driven Development, you already understand Reflection. TDD: write a test → run it (fails) → write code → run test (passes). Reflection: define criteria → generate output → evaluate (fails criteria) → revise → evaluate (passes). Same loop. Same philosophy: define what 'good' looks like before you generate, then iterate until you get there.
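The analogy can be made literal: when the quality criteria are executable checks, the reflection loop is a test suite driving generation. Here is a minimal sketch; the names (`reflect_until_passing`, `toy_generate`) are illustrative, and a trivial stub stands in for a real LLM call:

```python
def reflect_until_passing(generate, criteria, max_iterations=3):
    """Generate-evaluate-revise loop, TDD-style: the criteria are
    defined up front, like tests written before the code."""
    output, failures = generate(None), []
    for _ in range(max_iterations):
        # Evaluate: collect the name of every criterion the output fails.
        failures = [name for name, check in criteria if not check(output)]
        if not failures:              # all checks pass -> done
            break
        output = generate(failures)   # revise using the failure list
    return output, failures

# Toy generator standing in for an LLM: it "fixes" whatever failed.
def toy_generate(failures):
    if failures is None:
        return "draft"                  # first attempt
    return "draft with null-check"      # revised attempt

criteria = [
    ("handles null input", lambda out: "null-check" in out),
]
result, failures = reflect_until_passing(toy_generate, criteria)
# failures is empty once the revised draft passes every check
```

The key design choice mirrors TDD exactly: the checks exist before the first draft does, so "done" is defined by the criteria, not by the generator's own judgment.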
Practical tips
Cap your iterations (3–5 is typical) to avoid infinite loops and runaway costs. Make your evaluation criteria specific and measurable — not 'is this good?' but 'does this handle null input, return the correct type, and run in O(n) time?' Consider using a different model for the critic than for the generator, so the critic is less likely to share the generator's blind spots.
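Putting those tips together, here is a sketch of a capped loop with explicit written criteria and separate generator and critic callables. All names are hypothetical, and stub functions stand in for real model calls so the control flow is visible:

```python
CRITERIA = (
    "1. Handles None input without raising.\n"
    "2. Returns the documented type.\n"
    "3. Runs in O(n) time."
)

def reflect(generator_call, critic_call, task, max_iterations=3):
    """Capped generate-evaluate-revise loop with measurable criteria."""
    output = generator_call(f"Write a Python function that {task}")
    for _ in range(max_iterations):  # hard cap: no runaway cost
        critique = critic_call(
            f"Check this code against each criterion:\n{CRITERIA}\n\n"
            f"{output}\nReply PASS if every criterion is met."
        )
        if critique.strip().upper().startswith("PASS"):
            break
        output = generator_call(
            f"Fix this code based on the review:\n{output}\n\n"
            f"Review:\n{critique}"
        )
    return output

# Stub "models" to demonstrate the flow: the critic fails the first
# draft, then passes the revision.
reviews = iter(["FAIL: criterion 1, no None check", "PASS"])
critic = lambda prompt: next(reviews)
generator = lambda prompt: "draft v2" if "Fix" in prompt else "draft v1"

final = reflect(generator, critic, "reverses a list")
```

Because the criteria are written into the critic's prompt verbatim, the critique comes back tied to numbered checks rather than vague impressions, which makes the revision step far more targeted.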
Reflection is TDD for LLMs: define quality criteria, generate, evaluate, revise, repeat. It's the single most effective pattern for improving AI output quality.
This post covers the basics. The full curriculum page for Reflection includes the SWE mapping, code examples, production notes, and an interactive building exercise.