
The AI Code Review Stack Is Converging. Here's Why.
This week, a major infrastructure company published the results of running their internal AI code review system across 5,169 repositories. In 30 days: 131,246 review runs across 48,095 merge requests, a median completion time of 3 minutes 39 seconds, an average cost of $1.19 per run, and only 0.6% of reviews requiring manual override.
Impressive numbers. But the real signal is not the volume. It is the architecture.
The shape of the new review stack
Read between the lines of every public AI code review release in the last 90 days and a pattern emerges. Independent teams, working without coordination, are converging on the same four design choices:
- Multiple specialized reviewers in parallel instead of one generalist prompt. Security, performance, code quality, documentation, compliance: each handled by a dedicated agent that returns structured findings to a coordinator (see the sketch after this list).
- Risk-tiered orchestration: a Dependabot version bump and an authentication refactor do not deserve the same review pipeline. Trivial changes get a 10-second pass. High-risk changes get the full multi-agent treatment.
- Repository context as a first-class input: not just the diff, not just the changed file, but the architectural conventions, prior review decisions, and project-specific rules baked into something like an AGENTS.md or a Knowledge Base.
- Local and CI parity: the same review can run on a developer's laptop before they push, or in CI after the PR opens. Same agents, same rules, same output format.
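To make the first bullet concrete, here is a minimal sketch of the coordinator pattern in Python. Every name in it (the Finding shape, the specialist prompts, the run_specialist stub) is illustrative rather than any particular product's API; a real implementation would put an LLM call behind the stub.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Finding:
    agent: str      # which specialist produced it
    severity: str   # "critical" | "major" | "minor"
    message: str

# Hypothetical specialist: in a real system this wraps an LLM call with a
# deliberately narrow system prompt. Stubbed here so the sketch runs as-is.
async def run_specialist(name: str, prompt: str, diff: str) -> list[Finding]:
    return [Finding(agent=name, severity="minor", message=f"[{name}] example finding")]

async def review(diff: str) -> list[Finding]:
    specialists = {
        "security":    "Review ONLY auth boundaries and data exposure.",
        "performance": "Review ONLY algorithmic cost and hot-path work.",
        "quality":     "Review ONLY readability and convention drift.",
    }
    # Fan out in parallel; the coordinator only merges and ranks.
    batches = await asyncio.gather(
        *(run_specialist(name, prompt, diff) for name, prompt in specialists.items())
    )
    rank = {"critical": 0, "major": 1, "minor": 2}
    return sorted(
        (f for batch in batches for f in batch),
        key=lambda f: rank.get(f.severity, 99),
    )

print(asyncio.run(review("--- a/auth.py\n+++ b/auth.py")))
```

The load-bearing property is that each prompt stays narrow while merging and ranking live in one place.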
When four independent design choices show up in different teams in the same quarter, that is not a trend. That is the architecture solving a real problem.
Why this stack is winning
The same underlying tension forced everyone toward this shape: AI now writes about 41% of merged code, but a recent survey found AI-generated code takes 91% more reviewer time per PR than human-written code, with three times more readability problems and 75% more logic errors. Volume up, signal-to-noise down. The naive answer (run a bigger model on the diff) makes the noise worse, not better.
The four-part stack solves this differently:
- Parallel specialists keep prompts narrow, so each agent stays accurate within its domain. A security agent that only thinks about authentication boundaries is sharper than a generalist asked to "review this PR."
- Risk tiers let cost scale with stakes. You stop paying $1.50 of compute to review a typo fix (a routing sketch follows this list).
- Repository context kills the false positives that come from reviewers not knowing your conventions. The reviewer that already understands your authentication module does not flag your custom session pattern as a vulnerability.
- Local + CI parity moves friction earlier. A developer who gets the review at git push time fixes issues in the same context window in which they wrote them, not three days later when they have moved on.
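Here is what that routing decision might look like in code. The path patterns and the size threshold are assumptions for illustration, not anyone's shipped rules.

```python
from fnmatch import fnmatch

# Hypothetical heuristics: compute scales with stakes, so a typo fix
# never pays for the full multi-agent pipeline.
HIGH_RISK_PATHS = ["*auth*", "*payment*", "*crypto*", "*migration*"]

def review_tier(changed_files: list[str], lines_changed: int) -> str:
    if any(fnmatch(f, pat) for f in changed_files for pat in HIGH_RISK_PATHS):
        return "full"       # all specialists, repo context, strict gate
    if lines_changed < 10:
        return "fast-pass"  # the 10-second lint-level check
    return "standard"       # a single generalist pass

print(review_tier(["src/auth/session.py"], 120))  # full
print(review_tier(["README.md"], 3))              # fast-pass
```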
This is also why the closed, premium options are starting to feel expensive. One major launch this quarter prices its review at $15 to $25 per PR, GitHub-only, Teams or Enterprise tier required. Scale that across a team running 50 PRs a week (roughly 200 a month) and you are looking at $3,000 to $5,000 a month, five figures a year, for what is, architecturally, a pipeline you could run yourself.
How Octopus Review fits this shape
We did not set out to chase any of this. We set out to build a code reviewer that was open source, self-hostable, and actually understood the codebase it was reviewing. The architecture we ended up with maps cleanly onto the four-part stack the rest of the industry just landed on.
Codebase context as the foundation. Octopus indexes your entire repository into a Qdrant vector store before it reviews a single PR. When a reviewer opens a PR that touches the payment module, the system already knows how that module is structured, what the surrounding authentication looks like, and which conventions the team has been following. Nothing about the diff is reviewed in a vacuum.
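If you want to picture the mechanism, here is a rough index-then-retrieve sketch using the qdrant-client Python package. The collection name, whole-file chunking, and the hash-based stand-in for a real embedding model are simplifications for the sketch, not Octopus's actual pipeline.

```python
import hashlib
from pathlib import Path
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a deterministic 384-dim vector
    # derived from a hash, just to keep the sketch self-contained.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:32]] * 12

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="repo",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Index once: one point per source file, with the path kept as payload.
points = [
    PointStruct(id=i, vector=embed(path.read_text()), payload={"path": str(path)})
    for i, path in enumerate(Path("src").rglob("*.py"))
]
client.upsert(collection_name="repo", points=points)

# At review time: pull the code most related to the diff and hand it to
# the reviewers as context, so the diff is never read in a vacuum.
diff_text = Path("change.diff").read_text()
neighbors = client.search(collection_name="repo", query_vector=embed(diff_text), limit=5)
```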
Knowledge Base as your AGENTS.md equivalent. Drop your architecture docs, your style guide, your "things we learned the hard way" notes into the Knowledge Base. Every review reads from it. New conventions propagate without retraining anyone:
octopus knowledge add ./docs/architecture.md --title "Architecture Guide"
octopus knowledge add ./docs/security-rules.md --title "Security Rules"
Severity-aware output. Findings come back tagged Critical, Major, Minor, Suggestion, or Tip. This is the same shape as the risk-tier conversation: surface the high-stakes findings immediately, let the low-stakes ones be informational. Reviewers triage, not drown.
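The schema below is a guess at the shape rather than Octopus's exact output format, but it shows why the tags matter operationally: you can gate a merge on the top tiers and leave the rest informational.

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    CRITICAL = 0
    MAJOR = 1
    MINOR = 2
    SUGGESTION = 3
    TIP = 4

@dataclass
class Finding:
    severity: Severity
    file: str
    line: int
    message: str

findings = [
    Finding(Severity.CRITICAL, "src/auth.py", 88, "Token compared with =="),
    Finding(Severity.TIP, "src/util.py", 12, "Consider functools.lru_cache"),
]

# Block the merge on high-stakes findings only; everything else informs.
blocking = [f for f in findings if f.severity <= Severity.MAJOR]
if blocking:
    raise SystemExit(f"{len(blocking)} blocking finding(s)")
```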
Local parity through the CLI. The same review that runs in CI when a PR opens can run from your terminal before you push:
# Index once
octopus repo index
# Review locally before pushing
octopus pr review https://github.com/owner/repo/pull/42
Same engine, same Knowledge Base, same severity output. The friction moves from "wait three days for review feedback" to "fix it before anyone sees it."
What community ownership of this stack looks like
The four-part architecture is going to win regardless of who ships it. The question is whether the most capable version of it sits behind a per-seat SaaS bill or sits in a repo you can fork.
Octopus Review is open source under a Modified MIT license. You can self-host it, point it at your own LLM keys (Claude or OpenAI, your choice), and run it on infrastructure you control. The codebase processes source in memory only. Embeddings persist. Source files do not.
When the architecture converges, the differentiator stops being "do you have multi-agent review." It becomes "who owns the stack." We think that should be the team running the code, not the vendor billing them per developer.
Try it
The reviewer that knows your codebase ships better than the reviewer that just reads the diff. Spin up Octopus Review at octopus-review.ai, star the repo at github.com/octopusreview/octopus, or join the conversation on Discord. The convergence is real. The open-source version is here.