Reflection
≈ Unit Testing / Code Review / Test-Driven Development (TDD) Loop
> Agentic Definition
Reflection (or Self-Correction) enables an agent to critique its own output (or the output of another agent) to identify errors, hallucinations, or areas for improvement, and then iteratively refine the result. It acts as an internal feedback loop, significantly increasing the quality and robustness of the final output.
> Description
Enables an agent to critique its own output (or the output of another agent) to identify errors, hallucinations, or areas for improvement, and then iteratively refine the result. It acts as an internal feedback loop.
Before: Single Pass Generation
```python
def generate_code(spec):
    # If it's buggy, the user finds out later in production.
    return codegen_module.generate(spec)
```

After: Reflective Loop Architecture

```python
def generate_robust_code(spec):
    # Draft Phase
    code = coder_agent.run(spec)

    # Reflection Loop (The "Code Review")
    for _ in range(MAX_RETRIES):
        # The Critic Agent acts as the Unit Test Suite
        critique = reviewer_agent.run(code)

        if critique.is_approved():
            return code
        else:
            # Feedback injection: The loop enables self-correction
            code = coder_agent.run(spec, feedback=critique.comments)

    raise MaxRetriesExceededError("Unable to generate valid code")
```

≈ Similarity
Both exist to catch errors before final delivery. The "Red-Green-Refactor" loop in TDD is structurally analogous to the "Draft-Critique-Revise" loop in Reflection. It is a quality assurance gate embedded in the development/execution process.
≠ Divergence
Unit tests check against known, deterministic assertions (e.g., assert x == 5). Reflection checks against qualitative, ambiguous criteria (e.g., "Is this tone professional?", "Is this code secure?", "Does this argument make sense?") using an LLM as the judge. The "test" itself is probabilistic.
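The difference can be sketched side by side. This is a minimal illustration, not a real judge: `fake_llm` is a deterministic stand-in for a model call, and the JSON critique schema is an assumption.

```python
import json

def add(a, b):
    return a + b

# Deterministic test: same input, same verdict, every run.
def deterministic_check():
    return add(2, 3) == 5

# Probabilistic "test": an LLM judges qualitative criteria. `fake_llm`
# stands in for a real model call; a real judge may answer differently
# from run to run, so the same output can pass today and fail tomorrow.
def fake_llm(prompt: str) -> str:
    return json.dumps({
        "approved": "TODO" not in prompt,
        "comments": ["Remove TODO markers."],
    })

def llm_critique(output: str) -> dict:
    prompt = (
        "You are a strict code reviewer.\n"
        "Criteria: Is the code correct? Is it secure? Is the style clean?\n"
        'Reply as JSON: {"approved": bool, "comments": [str]}\n\n'
        f"CODE:\n{output}"
    )
    return json.loads(fake_llm(prompt))

print(deterministic_check())                              # True
print(llm_critique("def f(): pass  # TODO")["approved"])  # False
```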
> Production Considerations
- Reflection is expensive in terms of time. It can double or triple latency depending on the number of reflection cycles permitted. It is a trade-off: higher latency for higher accuracy.
- Reflection is highly effective at reducing hallucinations. However, there is a risk of "degeneracy," where the agent critiques itself into a worse state or gets stuck in a loop of minor, inconsequential edits.
- Reflection increases token usage significantly. Use cheaper models for the drafting phase and stronger, more reasoning-capable models for the critique phase to optimize costs.
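The latency arithmetic in the first point can be made concrete. All numbers below are illustrative assumptions, not benchmarks:

```python
# Illustrative latency model for a reflection loop. The timings are made-up
# assumptions; measure your own models before tuning the cycle cap.
DRAFT_LATENCY_S = 2.0     # cheap, fast drafting model
CRITIQUE_LATENCY_S = 4.0  # stronger, slower reasoning model used as judge

def single_pass_latency() -> float:
    return DRAFT_LATENCY_S

def reflective_latency(redrafts: int) -> float:
    # Every cycle pays one draft plus one critique; `redrafts` counts the
    # extra cycles that run when the critic rejects the draft.
    return (DRAFT_LATENCY_S + CRITIQUE_LATENCY_S) * (1 + redrafts)

print(single_pass_latency())  # 2.0
print(reflective_latency(0))  # 6.0  -> 3x latency even when approved first try
print(reflective_latency(2))  # 18.0 -> the cap on cycles bounds the worst case
```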
> Key Takeaway
Adapt: You are embedding the "Code Review" process directly into the runtime application. The system self-corrects before the human ever sees the output. You must design the "Critique Prompts" as carefully as you write unit tests.
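One way to treat critique prompts like unit tests is to enumerate explicit, checkable criteria rather than asking vaguely for "feedback". The checklist below is a hypothetical template, not a prescribed format:

```python
# A critique prompt written like a test suite: each numbered item is a
# discrete check the critic must pass or fail. Tailor the list to your domain.
CRITIQUE_PROMPT = """\
You are reviewing generated code against the spec. Check EVERY item:
1. Correctness: does the code satisfy the spec?
2. Security: any injection, unsafe eval, or leaked secrets?
3. Robustness: are errors and edge cases handled?
Reply APPROVED if all checks pass; otherwise list each failing check.

SPEC:
{spec}

CODE:
{code}
"""

def build_critique_prompt(spec: str, code: str) -> str:
    return CRITIQUE_PROMPT.format(spec=spec, code=code)
```

The prompt is sent to the reviewer model on each cycle, just as a test suite runs on every commit.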
Mission: Build a Self-Improving Code Generator
Create a system where an agent generates code, a critic reviews it, and the generator improves based on feedback — like an automated code review loop.
- Coder Agent: Generates code from a specification
- Critic Agent: Reviews code for bugs, style, and correctness
- Feedback Loop: Passes critique back to the coder for revision
- Max Retries Gate: Stops the loop after N iterations to prevent infinite loops
- Router: Routes requests to different handlers
- Memory Store: Stores long-term context
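In text form, one possible arrangement of the four relevant blocks looks like this. The two agent classes are stubs standing in for real model calls:

```python
class CoderAgent:
    """Stub: a real version would call a code-generation model."""
    def run(self, spec, feedback=None):
        return f"code for {spec!r}" + (" (revised)" if feedback else "")

class CriticAgent:
    """Stub: a real version would call a reviewer model."""
    def run(self, code):
        return None if "revised" in code else "handle empty input"

def pipeline(spec, max_retries=3):
    coder, critic = CoderAgent(), CriticAgent()
    code = coder.run(spec)                         # 1. Coder Agent drafts
    for _ in range(max_retries):                   # 4. Max Retries Gate
        comments = critic.run(code)                # 2. Critic Agent reviews
        if comments is None:
            return code
        code = coder.run(spec, feedback=comments)  # 3. Feedback Loop revises
    raise RuntimeError("Unable to generate valid code")
```

Router and Memory Store are distractors here; neither belongs in the minimal reflective loop.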
Frequently Asked Questions
When should I use the Reflection pattern?
Use Reflection when output quality and robustness matter more than latency and token cost, i.e. when a single pass is likely to contain errors or hallucinations. Reflection (or Self-Correction) enables an agent to critique its own output (or the output of another agent) to identify errors, hallucinations, or areas for improvement, and then iteratively refine the result. It acts as an internal feedback loop, significantly increasing the quality and robustness of the final output.
How does Reflection relate to Unit Testing / Code Review / Test-Driven Development (TDD) Loop?
Both exist to catch errors before final delivery. The "Red-Green-Refactor" loop in TDD is structurally analogous to the "Draft-Critique-Revise" loop in Reflection. It is a quality assurance gate embedded in the development/execution process. However, there is a key divergence: Unit tests check against known, deterministic assertions (e.g., assert x == 5). Reflection checks against qualitative, ambiguous criteria (e.g., "Is this tone professional?", "Is this code secure?", "Does this argument make sense?") using an LLM as the judge. The "test" itself is probabilistic.
What are the production trade-offs of Reflection?
Reflection is expensive in terms of time: it can double or triple latency depending on the number of reflection cycles permitted. It is a trade-off: higher latency for higher accuracy. Reflection is highly effective at reducing hallucinations, but there is a risk of "degeneracy," where the agent critiques itself into a worse state or gets stuck in a loop of minor, inconsequential edits. It also increases token usage significantly; use cheaper models for the drafting phase and stronger, more reasoning-capable models for the critique phase to optimize costs.