Reflection
≈ Unit Testing / Code Review / Test-Driven Development (TDD) Loop
TL;DR
Reflection enables an agent to critique its own output (or the output of another agent) to identify errors, hallucinations, or areas for improvement, and then iteratively refine the result. It acts as an internal feedback loop.
> Agentic Definition
Reflection (or Self-Correction) enables an agent to critique its own output (or the output of another agent) to identify errors, hallucinations, or areas for improvement, and then iteratively refine the result. It acts as an internal feedback loop, significantly increasing the quality and robustness of the final output.
Practice Reflection
Build, debug, prompt, and optimize — 3 difficulty tiers
Before: Single Pass Generation
```python
def generate_code(spec):
    return codegen_module.generate(spec)
    # If it's buggy, the user finds out later in production.
```

After: Reflective Loop Architecture

```python
def generate_robust_code(spec):
    # Draft Phase
    code = coder_agent.run(spec)

    # Reflection Loop (The "Code Review")
    for _ in range(MAX_RETRIES):
        # The Critic Agent acts as the Unit Test Suite
        critique = reviewer_agent.run(code)

        if critique.is_approved():
            return code
        else:
            # Feedback injection: The loop enables self-correction
            code = coder_agent.run(spec, feedback=critique.comments)

    raise MaxRetriesExceededError("Unable to generate valid code")
```

Traditional SWE: Unit Testing / TDD Loop
Agentic Pattern: Reflection
≈ How They're Similar
Both exist to catch errors before final delivery. The "Red-Green-Refactor" loop in TDD is structurally analogous to the "Draft-Critique-Revise" loop in Reflection. It is a quality assurance gate embedded in the development/execution process.
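The structural analogy can be made concrete: both loops draft, get a verdict, and revise until the verdict passes. A minimal sketch of the TDD side, mirroring the reflection loop above (`write_code` and `run_tests` are hypothetical callables, not a real library):

```python
# Red-Green-Refactor (TDD): the test suite plays the critic's role.
def tdd_loop(spec, write_code, run_tests, max_iterations=5):
    code = write_code(spec)                         # Red: first draft
    for _ in range(max_iterations):
        failures = run_tests(code)                  # deterministic verdict
        if not failures:
            return code                             # Green: all tests pass
        code = write_code(spec, feedback=failures)  # Refactor from failures
    raise RuntimeError("Tests still failing after max iterations")
```

Swap `run_tests` for a reviewer agent and `write_code` for a coder agent, and this is the Draft-Critique-Revise loop verbatim.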
≠ Key Divergence
Unit tests check against known, deterministic assertions (e.g., assert x == 5). Reflection checks against qualitative, ambiguous criteria (e.g., "Is this tone professional?", "Is this code secure?", "Does this argument make sense?") using an LLM as the judge. The "test" itself is probabilistic.
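The divergence shows up directly in code: a deterministic test asserts on an exact value, while a reflective "test" asks a judge model for a verdict and must parse a probabilistic answer. A sketch, assuming a hypothetical `llm` client with a `complete(prompt)` method:

```python
# Deterministic check: binary, repeatable, same verdict every run.
def deterministic_test(result):
    assert result == 5

# Probabilistic check: an LLM judges qualitative criteria.
JUDGE_PROMPT = """You are a strict reviewer.
Criteria: Is the tone professional? Is the code secure?
Answer APPROVED or REJECTED on the first line, then a reason.

Output to review:
{output}"""

def reflective_test(llm, output):
    verdict = llm.complete(JUDGE_PROMPT.format(output=output))
    # The judge itself can be wrong or inconsistent, so parse defensively.
    return verdict.strip().upper().startswith("APPROVED")
```

The deterministic test never changes its mind; the reflective test can, which is why reflection loops need retry budgets and defensive parsing.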
> Production Considerations
Reflection is expensive in time: it can double or triple latency, depending on how many reflection cycles are permitted. The trade-off is higher latency for higher accuracy.
It is highly effective at reducing hallucinations. However, there is a risk of "degeneracy," where the agent critiques itself into a worse state or gets stuck in a loop of minor, inconsequential edits.
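One practical guard against degeneracy is to stop the loop when revisions stop improving, e.g. when a new draft draws at least as many critique comments as the best draft so far. A sketch (the `coder`/`reviewer` agent interfaces are hypothetical):

```python
def reflect_with_guard(spec, coder, reviewer, max_retries=3):
    code = coder.run(spec)
    best_code, best_issue_count = code, float("inf")
    for _ in range(max_retries):
        issues = reviewer.run(code)          # list of critique comments
        if not issues:
            return code                      # approved: exit early
        if len(issues) >= best_issue_count:
            # No measurable improvement: further revision risks
            # thrashing on minor, inconsequential edits. Bail out.
            return best_code
        best_code, best_issue_count = code, len(issues)
        code = coder.run(spec, feedback=issues)
    return best_code  # best draft seen within the retry budget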
It increases token usage significantly. To optimize costs, use cheaper models for the drafting phase and stronger, more reasoning-capable models for the critique phase.
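The tiering advice amounts to routing the two phases to different models. A sketch, where the model names and the `call_model` client are placeholders rather than a real API:

```python
# Tiered model routing: cheap, high-volume drafts; expensive, rarer critiques.
DRAFT_MODEL = "small-fast-model"        # low cost per token, called often
CRITIC_MODEL = "large-reasoning-model"  # fewer calls, stronger judgment

def draft(call_model, spec, feedback=None):
    prompt = spec if feedback is None else f"{spec}\n\nFix these issues:\n{feedback}"
    return call_model(DRAFT_MODEL, prompt)

def critique(call_model, output):
    return call_model(CRITIC_MODEL, f"Review this output for errors:\n{output}")
```

Because the critic reads a whole draft but emits only a short review, its per-cycle cost is dominated by input tokens, which keeps the strong model affordable.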
Key Takeaway
Adapt: You are embedding the "Code Review" process directly into the runtime application. The system self-corrects before the human ever sees the output. You must design the "Critique Prompts" as carefully as you write unit tests.
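Designing a critique prompt "as carefully as a unit test" means enumerating explicit, checkable criteria with a constrained output format, rather than asking vaguely for feedback. A sketch of the difference (prompt wording is illustrative):

```python
# Vague critique prompt: invites rambling, inconsistent feedback.
WEAK_CRITIQUE = "Review this code and give feedback:\n{code}"

# Test-style critique prompt: each criterion is a named, checkable
# "assertion", and the verdict format is constrained so the loop can
# parse it reliably.
STRONG_CRITIQUE = """Review the code below against each criterion.
For each, answer PASS or FAIL with a one-line reason.

1. correctness: does it implement the spec exactly?
2. security: no injection risks, no secrets in code?
3. error_handling: are failure paths handled explicitly?

End with a single line: VERDICT: APPROVED or VERDICT: REJECTED.

Code:
{code}"""

def parse_verdict(review_text):
    # Machine-checkable exit condition, mirroring a test suite's pass/fail.
    return review_text.strip().splitlines()[-1] == "VERDICT: APPROVED"
```

The constrained final line is what makes `critique.is_approved()` in the loop above implementable at all.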
Frequently Asked Questions
When should I use the Reflection pattern?
Use Reflection when output quality and robustness matter more than latency and cost, i.e. when shipping errors or hallucinations is expensive. The pattern has an agent critique its own output (or another agent's) and iteratively refine it, so the system self-corrects before the user ever sees the result. For low-stakes, latency-sensitive paths, a single-pass generation is often the better trade-off.
How does Reflection relate to Unit Testing / Code Review / Test-Driven Development (TDD) Loop?
Both exist to catch errors before final delivery. The "Red-Green-Refactor" loop in TDD is structurally analogous to the "Draft-Critique-Revise" loop in Reflection. It is a quality assurance gate embedded in the development/execution process. However, there is a key divergence: Unit tests check against known, deterministic assertions (e.g., assert x == 5). Reflection checks against qualitative, ambiguous criteria (e.g., "Is this tone professional?", "Is this code secure?", "Does this argument make sense?") using an LLM as the judge. The "test" itself is probabilistic.
What are the production trade-offs of Reflection?
Reflection is expensive in time: it can double or triple latency depending on the number of reflection cycles permitted, trading higher latency for higher accuracy. It is highly effective at reducing hallucinations, but there is a risk of "degeneracy," where the agent critiques itself into a worse state or gets stuck in a loop of minor, inconsequential edits. It also increases token usage significantly; to optimize costs, use cheaper models for the drafting phase and stronger, more reasoning-capable models for the critique phase.