
What Adding GitLab Support Taught Us About Code Review

Octopus Team

The week we shipped GitLab support, an interesting thing happened. The first three users who connected GitLab accounts were all running self-hosted instances behind a VPN. They wanted source code to stay inside their network.

That was not the audience we expected. We thought adding GitLab would open up "the other half" of git hosting. What we actually opened up was a different conversation about what code review means when your code never leaves your infrastructure.

This post is about what we learned shipping the integration, and what almost broke our review pipeline along the way.

The assumption that nearly killed us

When we first scoped GitLab support, the planning doc had one line under "complexity": "should be similar to GitHub". Both platforms have repos, pull requests (called merge requests in GitLab), diffs, inline comments, webhooks. Map the concepts, adapt the API calls, done.

That assumption survived about six hours of actual implementation.

The first thing that broke our mental model was the inline comment API. On GitHub, posting a review comment on a specific line is essentially this:

{
  "path": "src/api/users.ts",
  "line": 47,
  "body": "SQL injection risk here"
}

GitLab does not work that way. To post an inline comment on a merge request, you need a position object that looks like this:

{
  "position": {
    "position_type": "text",
    "base_sha": "5e6dffa28...",
    "head_sha": "f9ce7e16e...",
    "start_sha": "5e6dffa28...",
    "old_path": "src/api/users.ts",
    "new_path": "src/api/users.ts",
    "new_line": 47
  },
  "body": "SQL injection risk here"
}

Three different commit SHAs, pinning down exactly which version of the diff you are commenting against. Old path and new path, even if the file was not renamed. And then the rule that broke our first attempts: if the line is an added line, send only new_line. If it is a removed line, send only old_line. If it is an unchanged context line, send both, and they may not match if earlier changes shifted the numbering.
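
Here is the rule in code, as a minimal sketch rather than our actual implementation. The DiffLine and RefInfo types are illustrative names of our own; in practice the three SHAs come from the merge request's diff_refs.

// Minimal sketch of the line-type rule. DiffLine and RefInfo are
// illustrative types of our own, not GitLab's.
type DiffLine =
  | { kind: "added"; newLine: number }
  | { kind: "removed"; oldLine: number }
  | { kind: "context"; oldLine: number; newLine: number };

interface RefInfo {
  baseSha: string;
  headSha: string;
  startSha: string;
}

function buildPosition(refs: RefInfo, oldPath: string, newPath: string, line: DiffLine) {
  const position: Record<string, string | number> = {
    position_type: "text",
    base_sha: refs.baseSha,
    head_sha: refs.headSha,
    start_sha: refs.startSha,
    old_path: oldPath, // required even when the file was not renamed
    new_path: newPath,
  };
  if (line.kind === "added") {
    position.new_line = line.newLine; // added line: new_line only
  } else if (line.kind === "removed") {
    position.old_line = line.oldLine; // removed line: old_line only
  } else {
    position.old_line = line.oldLine; // context line: both, and they can
    position.new_line = line.newLine; // differ if earlier hunks shifted things
  }
  return position;
}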

Get any of those SHAs wrong and you get a vague 400 error that does not tell you which field is the problem.

It took us a day to realise this was not GitLab being awkward. It was GitLab being honest about something GitHub hides: a line number on its own is ambiguous. A comment on "line 47" of a file in a PR is meaningless without knowing which version of the diff you are pointing at. GitHub papers over that with heuristics. GitLab forces you to be explicit. The result is more code, but reviews that stay correctly anchored even when the MR gets force-pushed.

Discussions, not comments

The second surprise was that GitLab does not really have "review comments" as a primary concept. It has discussions: threaded conversations that can be attached to a line, a file, or the MR itself.

This sounds like a small naming difference. It is not. It changes how an AI reviewer should behave.

On GitHub, our reviewer posted flat comments. One finding, one comment. Resolution was a separate concept layered on top.

On GitLab, every finding became a discussion thread that could be resolved or left open. Open threads block the merge by default. That meant our severity calibration had to change: a Tip finding that blocks a merge is not a tip; it is a blocker disguised as a suggestion. We had to decide which severities should auto-create resolved threads versus open ones, and give teams a way to configure that.
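
Here is roughly what that looks like against the API, as a sketch under our own severity scheme. The Severity names and the leavesThreadOpen table are our configuration, not GitLab concepts; the two endpoints are GitLab's create-discussion and resolve-discussion calls.

type Severity = "critical" | "warning" | "tip";

// Which severities should leave an open (merge-blocking) thread.
// The names and defaults here are ours, not GitLab's.
const leavesThreadOpen: Record<Severity, boolean> = {
  critical: true,
  warning: true,
  tip: false, // tips get resolved at creation so they never block the merge
};

async function postFinding(
  baseUrl: string, // e.g. https://gitlab.example.com/api/v4/projects/42/merge_requests/7
  token: string,
  severity: Severity,
  body: string,
  position: object,
): Promise<void> {
  const headers = { "PRIVATE-TOKEN": token, "Content-Type": "application/json" };

  // Every finding becomes a discussion thread anchored to a line.
  const res = await fetch(`${baseUrl}/discussions`, {
    method: "POST",
    headers,
    body: JSON.stringify({ body, position }),
  });
  const discussion: { id: string } = await res.json();

  // There is no way (that we found) to create a thread already resolved,
  // so non-blocking findings are resolved in a second call.
  if (!leavesThreadOpen[severity]) {
    await fetch(`${baseUrl}/discussions/${discussion.id}?resolved=true`, {
      method: "PUT",
      headers: { "PRIVATE-TOKEN": token },
    });
  }
}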

The end result was better than what we had on GitHub. The merge gate became a real signal: zero open threads from the AI reviewer means it has nothing left to say. That matches how a human reviewer thinks far better than a flat list of comments with no state.
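
Checking that gate is a single API call. A sketch, with pagination and filtering to the reviewer's own bot account elided for brevity:

async function openThreadCount(baseUrl: string, token: string): Promise<number> {
  const res = await fetch(`${baseUrl}/discussions?per_page=100`, {
    headers: { "PRIVATE-TOKEN": token },
  });
  const discussions: { notes: { resolvable: boolean; resolved: boolean }[] }[] =
    await res.json();

  // A thread is open if any of its resolvable notes is still unresolved.
  return discussions.filter((d) =>
    d.notes.some((n) => n.resolvable && !n.resolved),
  ).length;
}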

The self-hosted question

Once the integration was working against GitLab.com, we expected the next step to be "test against self-hosted instances". What we did not expect was that almost every early GitLab user wanted self-hosted from day one.

These teams were not picking GitLab over GitHub because of features. They were picking it because they could run it entirely inside their own network. Some had hard requirements that no source code could leave their infrastructure.

For an AI code reviewer, that creates an obvious tension. The whole point of the tool is to send code to a language model and get back analysis. If the code cannot leave the network, what is the AI reviewing?

We had built the foundation for this on the GitHub side. Octopus is open source under a modified MIT license, the entire stack runs from a Docker Compose file, and you can point it at any LLM provider you like, including one you host yourself. But GitLab brought a different intensity of expectation. The first self-hosted GitLab user asked us, in their second message, whether they could run the embedding model on their own GPU instead of calling out to an embeddings API.

The answer is yes. We now have users running the full pipeline (GitLab instance, Octopus, embedding model, LLM) on hardware they own, with zero outbound calls during a review. That was always possible, but GitLab support forced us to make sure the documentation reflected it and the defaults pushed people toward the privacy-preserving setup rather than the convenient one.

A typical self-hosted GitLab workflow now looks like this:

# Index the repo (embeddings stay in your own Qdrant)
octopus repo index

# Review an MR (LLM call goes wherever you configure)
octopus pr review https://gitlab.example.com/team/project/-/merge_requests/42

What we would tell ourselves if we started again

Three things, in order of how painful they were to learn.

First, when you integrate with a new platform, the API surface is not the hard part. The hard part is the model of what a review is. GitHub thinks reviews are comments. GitLab thinks reviews are discussions. Those are different mental models and they propagate through everything else.

Second, "we support platform X" is a different promise depending on the audience. For most GitHub users, it means "we work with github.com". For most GitLab users, it means "we work with whatever GitLab instance our security team approves of, on the network they tell us to run on". Those are different products.

Third, an AI code reviewer is only as good as its weakest assumption about the diff. The platform that forces you to be explicit about which version you are commenting against is doing you a favour, even if it costs you a day of debugging.

If you are running GitLab, self-hosted or not, and you want an AI reviewer that can actually run in your environment, give Octopus a try at octopus-review.ai. Source at github.com/octopusreview/octopus, and there is a Discord if you want to compare notes on self-hosting.