Reflection
≈ Unit Testing / Code Review / Test-Driven Development (TDD) Loop
TL;DR
Reflection enables an agent to critique its own output (or the output of another agent) to identify errors, hallucinations, or areas for improvement, and then iteratively refine the result. It acts as an internal feedback loop.
> Agentic Definition
Reflection (or Self-Correction) enables an agent to critique its own output (or the output of another agent) to identify errors, hallucinations, or areas for improvement, and then iteratively refine the result. It acts as an internal feedback loop, significantly increasing the quality and robustness of the final output.
Practice Reflection
Build, debug, prompt, and optimize — 3 difficulty tiers
Before: Single Pass Generation
```python
def generate_code(spec):
    return codegen_module.generate(spec)
    # If it's buggy, the user finds out later in production.
```

After: Reflective Loop Architecture

```python
def generate_robust_code(spec):
    # Draft Phase
    code = coder_agent.run(spec)

    # Reflection Loop (The "Code Review")
    for _ in range(MAX_RETRIES):
        # The Critic Agent acts as the Unit Test Suite
        critique = reviewer_agent.run(code)

        if critique.is_approved():
            return code
        else:
            # Feedback injection: The loop enables self-correction
            code = coder_agent.run(spec, feedback=critique.comments)

    raise MaxRetriesExceededError("Unable to generate valid code")
```

Traditional SWE: Unit Testing / TDD Loop
Agentic Pattern: Reflection
≈ How They're Similar
Both exist to catch errors before final delivery. The "Red-Green-Refactor" loop in TDD is structurally analogous to the "Draft-Critique-Revise" loop in Reflection. It is a quality assurance gate embedded in the development/execution process.
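The structural analogy can be made concrete: both loops draft, get a verdict, and revise until the verdict passes. A minimal sketch of the TDD side, mirroring the reflection loop above (`write_code` and `run_tests` are hypothetical callables, not a real library):

```python
# Red-Green-Refactor (TDD): the test suite plays the critic's role.
def tdd_loop(spec, write_code, run_tests, max_iterations=5):
    code = write_code(spec)                         # Red: first draft
    for _ in range(max_iterations):
        failures = run_tests(code)                  # deterministic verdict
        if not failures:
            return code                             # Green: all tests pass
        code = write_code(spec, feedback=failures)  # Refactor from failures
    raise RuntimeError("Tests still failing after max iterations")
```

Swap `run_tests` for a reviewer agent and `write_code` for a coder agent, and this is the Draft-Critique-Revise loop verbatim.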
≠ Key Divergence
Unit tests check against known, deterministic assertions (e.g., assert x == 5). Reflection checks against qualitative, ambiguous criteria (e.g., "Is this tone professional?", "Is this code secure?", "Does this argument make sense?") using an LLM as the judge. The "test" itself is probabilistic.
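The divergence shows up directly in code: a deterministic test asserts on an exact value, while a reflective "test" asks a judge model for a verdict and must parse a probabilistic answer. A sketch, assuming a hypothetical `llm` client with a `complete(prompt)` method:

```python
# Deterministic check: binary, repeatable, same verdict every run.
def deterministic_test(result):
    assert result == 5

# Probabilistic check: an LLM judges qualitative criteria.
JUDGE_PROMPT = """You are a strict reviewer.
Criteria: Is the tone professional? Is the code secure?
Answer APPROVED or REJECTED on the first line, then a reason.

Output to review:
{output}"""

def reflective_test(llm, output):
    verdict = llm.complete(JUDGE_PROMPT.format(output=output))
    # The judge itself can be wrong or inconsistent, so parse defensively.
    return verdict.strip().upper().startswith("APPROVED")
```

The deterministic test never changes its mind; the reflective test can, which is why reflection loops need retry budgets and defensive parsing.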
> Production Considerations
Reflection is expensive in time: it can double or triple latency, depending on how many reflection cycles are permitted. The trade-off is higher latency for higher accuracy.
It is highly effective at reducing hallucinations. However, there is a risk of "degeneracy," where the agent critiques itself into a worse state or gets stuck in a loop of minor, inconsequential edits.
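One practical guard against degeneracy is to stop the loop when revisions stop improving, e.g. when a new draft draws at least as many critique comments as the best draft so far. A sketch (the `coder`/`reviewer` agent interfaces are hypothetical):

```python
def reflect_with_guard(spec, coder, reviewer, max_retries=3):
    code = coder.run(spec)
    best_code, best_issue_count = code, float("inf")
    for _ in range(max_retries):
        issues = reviewer.run(code)          # list of critique comments
        if not issues:
            return code                      # approved: exit early
        if len(issues) >= best_issue_count:
            # No measurable improvement: further revision risks
            # thrashing on minor, inconsequential edits. Bail out.
            return best_code
        best_code, best_issue_count = code, len(issues)
        code = coder.run(spec, feedback=issues)
    return best_code  # best draft seen within the retry budget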
It increases token usage significantly. To optimize costs, use cheaper models for the drafting phase and stronger, more reasoning-capable models for the critique phase.
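The tiering advice amounts to routing the two phases to different models. A sketch, where the model names and the `call_model` client are placeholders rather than a real API:

```python
# Tiered model routing: cheap, high-volume drafts; expensive, rarer critiques.
DRAFT_MODEL = "small-fast-model"        # low cost per token, called often
CRITIC_MODEL = "large-reasoning-model"  # fewer calls, stronger judgment

def draft(call_model, spec, feedback=None):
    prompt = spec if feedback is None else f"{spec}\n\nFix these issues:\n{feedback}"
    return call_model(DRAFT_MODEL, prompt)

def critique(call_model, output):
    return call_model(CRITIC_MODEL, f"Review this output for errors:\n{output}")
```

Because the critic reads a whole draft but emits only a short review, its per-cycle cost is dominated by input tokens, which keeps the strong model affordable.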
Key Takeaway
Adapt: You are embedding the "Code Review" process directly into the runtime application. The system self-corrects before the human ever sees the output. You must design the "Critique Prompts" as carefully as you write unit tests.
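Designing a critique prompt "as carefully as a unit test" means enumerating explicit, checkable criteria with a constrained output format, rather than asking vaguely for feedback. A sketch of the difference (prompt wording is illustrative):

```python
# Vague critique prompt: invites rambling, inconsistent feedback.
WEAK_CRITIQUE = "Review this code and give feedback:\n{code}"

# Test-style critique prompt: each criterion is a named, checkable
# "assertion", and the verdict format is constrained so the loop can
# parse it reliably.
STRONG_CRITIQUE = """Review the code below against each criterion.
For each, answer PASS or FAIL with a one-line reason.

1. correctness: does it implement the spec exactly?
2. security: no injection risks, no secrets in code?
3. error_handling: are failure paths handled explicitly?

End with a single line: VERDICT: APPROVED or VERDICT: REJECTED.

Code:
{code}"""

def parse_verdict(review_text):
    # Machine-checkable exit condition, mirroring a test suite's pass/fail.
    return review_text.strip().splitlines()[-1] == "VERDICT: APPROVED"
```

The constrained final line is what makes `critique.is_approved()` in the loop above implementable at all.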
Frequently Asked Questions
When should I use the Reflection pattern?
Use Reflection when output quality and robustness matter more than latency and cost, i.e. when shipping errors or hallucinations is expensive. The pattern has an agent critique its own output (or another agent's) and iteratively refine it, so the system self-corrects before the user ever sees the result. For low-stakes, latency-sensitive paths, a single-pass generation is often the better trade-off.
How does Reflection relate to Unit Testing / Code Review / Test-Driven Development (TDD) Loop?
Both exist to catch errors before final delivery. The "Red-Green-Refactor" loop in TDD is structurally analogous to the "Draft-Critique-Revise" loop in Reflection. It is a quality assurance gate embedded in the development/execution process. However, there is a key divergence: Unit tests check against known, deterministic assertions (e.g., assert x == 5). Reflection checks against qualitative, ambiguous criteria (e.g., "Is this tone professional?", "Is this code secure?", "Does this argument make sense?") using an LLM as the judge. The "test" itself is probabilistic.
What are the production trade-offs of Reflection?
Reflection is expensive in time: it can double or triple latency depending on the number of reflection cycles permitted, trading higher latency for higher accuracy. It is highly effective at reducing hallucinations, but there is a risk of "degeneracy," where the agent critiques itself into a worse state or gets stuck in a loop of minor, inconsequential edits. It also increases token usage significantly; to optimize costs, use cheaper models for the drafting phase and stronger, more reasoning-capable models for the critique phase.