Fable 5 Rewrites 50M Lines in a Day. Who Reviews It?

Octopus Team | June 9, 2026 | 4 min read


Stripe just used Claude Fable 5 to migrate a 50-million-line Ruby codebase in a single day, work that would have taken a full team over two months by hand. The generation side of software got faster again. The review side did not.

The bottleneck moved, and it just got worse

For most of software history, writing code was the slow part. Getting a feature from idea to pull request took days. Review was a rounding error on the total cycle time.

That math is gone. AI agents now open roughly 17 million pull requests a month, up from about 4 million six months earlier. More than one in five code reviews already involve an agent on the authoring side. And the changes are not small. When a model can run autonomously for days, plan across stages, and rewrite millions of lines in one session, the unit of change is no longer a tidy 200-line diff. It is a sprawling, multi-file rewrite that touches systems the author never opened.
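The growth figure is easier to feel as a rate. Here is a back-of-the-envelope sketch that assumes smooth compounding between the two data points quoted above; the function name is ours and comes from no library. Going from about 4 million to about 17 million agent PRs a month over six months is a 4.25x jump, or roughly 27% growth month over month.

```typescript
// Implied compound monthly growth rate between two observations.
// Assumes smooth compounding; real adoption curves are rarely this tidy.
function monthlyGrowthRate(start: number, end: number, months: number): number {
  if (start <= 0 || months <= 0) {
    throw new Error("start and months must be positive");
  }
  // end = start * (1 + r)^months  =>  r = (end / start)^(1 / months) - 1
  return Math.pow(end / start, 1 / months) - 1;
}

const startPrs = 4000000; // approx. agent PRs per month six months earlier (from the text)
const endPrs = 17000000;  // approx. agent PRs per month now (from the text)

const overall = endPrs / startPrs;                   // overall multiplier over six months
const rate = monthlyGrowthRate(startPrs, endPrs, 6); // implied monthly growth rate

console.log(`overall: ${overall.toFixed(2)}x, monthly: ${(rate * 100).toFixed(1)}%`);
```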

Fable 5 makes this concrete. It is the first publicly available Mythos-class model, state-of-the-art on nearly every coding benchmark, and the longer a task runs, the bigger its lead. That is genuinely great for shipping. It is also a review problem hiding inside a productivity win.

Here is the uncomfortable part. Generated code looks clean. One security study found that AI-produced code was syntactically correct more than 95% of the time, yet only 55% was secure by default. The other 45% carried known vulnerabilities behind a polished surface. Reviewers report feeling more confident approving agent code, not less. But plausibility is not proof, and no human reviewer has the attention to verify behavior across a 3,000-line change, so they scan for "looks right" and merge.
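The two percentages combine into a sharper claim. Whatever the exact overlap, basic probability (the Fréchet lower bound, P(A and B) >= P(A) + P(B) - 1) says that if at least 95% of outputs are syntactically correct and 45% are insecure, then at least 40% of all outputs are both. This sketch assumes both percentages describe the same pool of samples, which is our reading of the study rather than something stated above.

```typescript
// Fréchet lower bound: for any two events A and B,
// P(A and B) >= P(A) + P(B) - 1, clamped at 0.
function overlapLowerBound(pA: number, pB: number): number {
  return Math.max(0, pA + pB - 1);
}

const pSyntacticallyCorrect = 0.95; // "more than 95%" from the study cited above
const pInsecure = 1 - 0.55;         // only 55% secure by default, so 45% insecure

// At least this share of all generated code both looks clean and is insecure.
const cleanButInsecure = overlapLowerBound(pSyntacticallyCorrect, pInsecure);
console.log(`at least ${(cleanButInsecure * 100).toFixed(0)}% looks clean yet is insecure`);
```

Because the study says "more than 95%", the true floor is slightly above 40%, not below it.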

"But Fable 5 checks its own work"

Fable 5 can write its own tests and use vision to verify its outputs against a goal. So it is fair to ask: if the model checks itself, why add a separate reviewer?

Because self-checking inside a generation session is not independent review. A model verifying its own days-long output is still reasoning from the same assumptions that produced the code. It does not know that the helper it just refactored is imported by 23 other packages, or that the payment module has an invariant documented in a design doc nobody fed into the prompt. Self-verification catches "does this run." It does not catch "does this fit the system it lives in." Those are different jobs, and the second one needs context the generator never had in view.
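Knowing who depends on a changed helper is a graph question, not a diff question. Here is a minimal sketch, assuming you already have a module-level import graph. The `ImportGraph` type, the `dependentsOf` function, and the module names are hypothetical, and this is not how Octopus or any particular tool computes it. The approach is to invert the import edges and walk them breadth-first to find every direct and transitive importer.

```typescript
// Hypothetical import graph: each module maps to the modules it imports.
type ImportGraph = Record<string, string[]>;

// Return every module that directly or transitively imports `changed`.
function dependentsOf(graph: ImportGraph, changed: string): Set<string> {
  // Invert the edges once: module -> modules that import it.
  const importedBy = new Map<string, string[]>();
  for (const [mod, imports] of Object.entries(graph)) {
    for (const dep of imports) {
      const list = importedBy.get(dep) ?? [];
      list.push(mod);
      importedBy.set(dep, list);
    }
  }

  // Breadth-first walk over the reversed edges.
  const seen = new Set<string>();
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const parent of importedBy.get(current) ?? []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  seen.delete(changed); // an import cycle could lead back to the changed module
  return seen;
}

// Usage with made-up module names:
const graph: ImportGraph = {
  "api/invoices": ["billing/helpers"],
  "api/refunds": ["billing/refunds"],
  "billing/refunds": ["billing/helpers"],
  "billing/helpers": [],
  "web/dashboard": ["api/invoices"],
};

console.log(Array.from(dependentsOf(graph, "billing/helpers")).sort());
```

Here a change to `billing/helpers` reaches four modules, including `web/dashboard`, which never imports the helper directly. That transitive reach is exactly what a reviewer reading only the diff cannot see.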

Give the reviewer the same context the generator had

This is where Octopus Review changes the shape of the problem. Most review tooling reads the diff and nothing else. It sees the lines that changed and is blind to everything they depend on. For a hand-written 50-line patch, that is usually fine. For an agent rewrite that ripples across a monorepo, it is the difference between catching a regression and shipping one.

Octopus indexes your entire codebase into a vector store and runs review with full project context, not just the patch. When a PR touches the auth layer, the reviewer already knows which services call it, which contracts it has to honor, and where the existing patterns live. It flags findings at five severity levels, from Critical down to Tip, so a real injection risk does not drown in a sea of style nits. The blast radius that a diff hides becomes visible.
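At its simplest, the retrieval step behind "full project context" is a nearest-neighbor search. You embed each chunk of the codebase once, embed the change under review, and pull the most similar chunks into the reviewer's prompt. The sketch that follows is a generic in-memory illustration under that assumption. It uses made-up 3-dimensional vectors and hypothetical names (`Chunk`, `retrieveContext`). It is not Octopus's implementation, which uses a real vector store and embedding model.

```typescript
// A code chunk with a precomputed embedding vector (hypothetical shape).
interface Chunk {
  path: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k indexed chunks most similar to the diff's embedding.
function retrieveContext(index: Chunk[], diffEmbedding: number[], k: number): Chunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(chunk.embedding, diffEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy 3-dimensional embeddings; real ones come from an embedding model
// and have hundreds or thousands of dimensions.
const index: Chunk[] = [
  { path: "src/api/invoices.ts", embedding: [0.9, 0.1, 0.0] },
  { path: "src/auth/session.ts", embedding: [0.0, 0.2, 0.9] },
  { path: "src/billing/charges.ts", embedding: [0.7, 0.3, 0.1] },
];

const diffEmbedding = [0.8, 0.2, 0.0]; // stands in for the embedding of the changed billing code
console.log(retrieveContext(index, diffEmbedding, 2).map((c) => c.path));
```

A linear scan is fine for a toy index. At monorepo scale, a vector store with approximate nearest-neighbor search does the same job.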

It is also bring-your-own-key, which matters more now than it did last week. The model you point at review should be at least as capable as the model that wrote the code. If Fable 5 generated the change, a context-starved reviewer running a weaker model is bringing a knife to a gunfight. With Octopus you choose the model, plug in Claude, and let codebase-aware retrieval do the part that raw model power alone cannot: ground the review in your system.

Reviewing an agent PR looks like this:

```shell
# Index the repo once so reviews have full-codebase context
octopus repo index

# Review the agent's PR by URL
octopus pr review https://github.com/owner/repo/pull/42
```

And a finding comes back grounded in the rest of the codebase, not just the diff:

🔴 **Critical**: Refactor drops tenant scoping on query
`src/billing/invoices.ts:88`
The new helper removes the `orgId` filter that `getInvoices` enforced.
Three callers in `src/api/` rely on this scoping for tenant isolation.

🔵 **Suggestion**: Reuse existing validator
`src/billing/refunds.ts:51`
This duplicates validation already in `src/billing/charges.ts:40`.
Consolidating reduces drift as both paths evolve.

That first finding is exactly what a diff-only pass and a self-checking generator both miss. The code runs. The tests pass. The tenant isolation is quietly broken, and only full-codebase context surfaces it before production does.
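To make the failure mode concrete, here is a hypothetical reconstruction of the kind of change the Critical finding describes. None of this is the actual code from that example, and the names (`getInvoicesBefore`, `getInvoicesAfter`, the fixtures) are ours. The refactor keeps the signature and passes a single-tenant test. It fails only when a check seeds a second tenant.

```typescript
type Status = "open" | "paid";

interface Invoice {
  id: string;
  orgId: string;
  status: Status;
}

// Before the refactor: the query is scoped to one tenant.
function getInvoicesBefore(rows: Invoice[], orgId: string, status: Status): Invoice[] {
  return rows.filter((inv) => inv.orgId === orgId && inv.status === status);
}

// After the refactor: the signature is unchanged, so every caller still compiles,
// but the org filter is gone (the parameter is now unused) and other tenants' rows leak.
function getInvoicesAfter(rows: Invoice[], _orgId: string, status: Status): Invoice[] {
  return rows.filter((inv) => inv.status === status);
}

// A typical fixture seeds a single tenant, so both versions return the same rows.
const singleTenant: Invoice[] = [
  { id: "inv-1", orgId: "org-a", status: "open" },
  { id: "inv-2", orgId: "org-a", status: "paid" },
];

// An isolation check seeds a second tenant and looks for its rows in the result.
const multiTenant: Invoice[] = [
  ...singleTenant,
  { id: "inv-3", orgId: "org-b", status: "open" },
];

function leaksOtherTenants(
  query: (rows: Invoice[], orgId: string, status: Status) => Invoice[],
): boolean {
  return query(multiTenant, "org-a", "open").some((inv) => inv.orgId !== "org-a");
}

const sameOnFixture =
  getInvoicesBefore(singleTenant, "org-a", "open").length ===
  getInvoicesAfter(singleTenant, "org-a", "open").length;

console.log(sameOnFixture);                        // true: the old test still passes
console.log(leaksOtherTenants(getInvoicesBefore)); // false
console.log(leaksOtherTenants(getInvoicesAfter));  // true: tenant isolation is broken
```

A diff-only reviewer sees a tidier filter. Catching the leak requires knowing that callers depend on the scoping, which is the cross-file context the finding cites.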

The faster the generator, the more the reviewer matters

It is tempting to read every model launch as one more reason to trust the machine and review less. The opposite is true. When one model can compress two months of migration into a day, the scarce resource is no longer typing code. It is the confidence to merge it. That confidence comes from independent, context-aware review, not from the generator grading its own homework.

Fable 5 is a remarkable tool for writing code. Pair it with a reviewer that actually understands your codebase, and you get the speed without inheriting the debt.

Octopus Review is open source and self-hostable, so your source never leaves your infrastructure. Try it at octopus-review.ai, star the repo on GitHub, or come argue about review culture with us on Discord. 🐙

#ai-agents #code-review #llm #rag #context #pull-requests