Module 07

Quality & Self-Correction

How agents check and improve their own work

> Overview

Agents can review their own output, spot errors, and try again. This is called Reflection. You can also have separate evaluator agents that score quality. This module teaches PMs how to define quality bars, design eval-driven development workflows, and balance reflection cycles with latency budgets.

> Why This Matters for Your Product

Without self-correction, agents produce first-draft quality every time. With reflection, they catch hallucinations, fix formatting, and improve accuracy. But each reflection cycle adds roughly one more full pass, so a single cycle roughly doubles latency for that step. The PM defines "good enough" for each feature and decides how many correction cycles are worth the wait. This module also introduces eval-driven development: write your evaluation criteria before you build the feature, just as you would write acceptance criteria before development.
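To make the latency trade-off concrete, here is a minimal sketch of the arithmetic. It assumes each reflection cycle costs about as much as the first draft (the "roughly doubles" rule of thumb); the function name and parameters are hypothetical, and real per-cycle costs should come from your engineering team's measurements.

```python
from typing import Optional


def max_reflection_cycles(base_latency_s: float,
                          budget_s: float,
                          cycle_cost_s: Optional[float] = None) -> int:
    """Return how many reflection cycles fit inside a latency budget.

    Assumption: unless cycle_cost_s is given, one cycle costs about the
    same as producing the first draft.
    """
    if base_latency_s <= 0:
        raise ValueError("base_latency_s must be positive")
    if cycle_cost_s is None:
        cycle_cost_s = base_latency_s
    if cycle_cost_s <= 0:
        raise ValueError("cycle_cost_s must be positive")

    # Time left after the first draft has been produced.
    remaining = budget_s - base_latency_s
    if remaining < 0:
        return 0
    return int(remaining // cycle_cost_s)


# A 2-second first draft under a 5-second budget leaves room for one cycle.
print(max_reflection_cycles(base_latency_s=2.0, budget_s=5.0))
```

Under these assumptions, a feature with a tight budget (for example, live chat) may afford zero or one cycle, while an offline batch job could afford several.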

> Interactive & tools

Reflection: before vs. after

Without reflection (first draft)

Example: an agent drafts a support reply with a minor tone issue, one factual inaccuracy, and a missing step. The user would need to correct it.

After one reflection cycle

The same agent reviews its output, fixes the tone and the factual error, and adds the missing step. Quality improves ~15–30%; latency roughly doubles for that step.
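The before/after comparison above can be sketched as a small loop: critique the draft, revise it, and stop when nothing is left to fix or the cycle cap is reached. This is an illustrative sketch, not a specific framework's API. `reflect`, `toy_critique`, and `toy_revise` are hypothetical names, and the toy checks (including the made-up "14 days" policy) stand in for what would be model calls in a real agent.

```python
from typing import Callable, List, Tuple


def reflect(draft: str,
            critique: Callable[[str], List[str]],
            revise: Callable[[str, List[str]], str],
            max_cycles: int = 1) -> Tuple[str, int]:
    """Run up to max_cycles of critique-then-revise.

    Stops early when the critique reports no issues. Returns the final
    text and the number of revision cycles actually used.
    """
    cycles_used = 0
    for _ in range(max_cycles):
        issues = critique(draft)
        if not issues:
            break
        draft = revise(draft, issues)
        cycles_used += 1
    return draft, cycles_used


# Toy stand-ins for model calls (assumptions for illustration only).
def toy_critique(text: str) -> List[str]:
    issues = []
    if "30 days" in text:        # pretend the real policy is 14 days
        issues.append("fact")
    if "restart" not in text:    # pretend the reply must mention a restart
        issues.append("missing_step")
    return issues


def toy_revise(text: str, issues: List[str]) -> str:
    if "fact" in issues:
        text = text.replace("30 days", "14 days")
    if "missing_step" in issues:
        text += " Then restart the app."
    return text


final, used = reflect("Refunds take 30 days. Clear the cache.",
                      toy_critique, toy_revise, max_cycles=2)
print(final)
print(used)
```

The `max_cycles` cap is the PM's lever: it bounds the added latency even when the critique keeps finding issues.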

Eval-driven development

Write evaluation criteria (what does good output look like?) and 20–50 test cases before building the feature. Use them to measure quality before and after every change. The PM owns the criteria; engineering owns the eval infrastructure.
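As a rough illustration of what "criteria plus test cases" can look like in practice, the sketch below scores an agent against a small set of cases and reports a pass rate. Everything here is an assumption for illustration: `EvalCase`, `pass_rate`, and the phrase-matching check are hypothetical, and a real eval suite would likely use richer criteria (rubrics, evaluator agents, human review).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    must_include: List[str]  # phrases a good answer has to contain


def pass_rate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose output contains every required phrase."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = 0
    for case in cases:
        output = agent(case.prompt).lower()
        if all(phrase.lower() in output for phrase in case.must_include):
            passed += 1
    return passed / len(cases)


# Two made-up cases; a real suite would hold the 20-50 cases described above.
cases = [
    EvalCase("How do I reset my password?", ["reset link"]),
    EvalCase("How long do refunds take?", ["14 days"]),
]


def toy_agent(prompt: str) -> str:
    # Stand-in for the real agent: always gives the same answer.
    return "We will email you a reset link."


print(pass_rate(toy_agent, cases))
```

Running the same cases before and after each change gives a comparable number, which is the point of writing them first.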

Related Engineering Patterns

These are the technical patterns your engineering team will implement. Understanding them helps you have better conversations.

Reflection, Evaluation & Monitoring, Goal Setting & Monitoring
