Since 2012, LLVM has relied on its self-hosted Phabricator instance on Google Cloud Platform for code review, but now it's making a transition to GitHub pull requests. In this post, I'll share my perspective on this switch, highlighting GitHub offers significant benefits in some areas while having major drawbacks in the review process.
I may update this article as the process stabilizes further.
Transition to GitHub pull requests
The move to GitHub pull requests has been a topic of discussion over the past few years. Several lengthy threads on the subject have emerged:
- November 2019 Enable Contributions Through Pull-request For LLVM
- January 2020 Phabricator -> GitHub PRs?
- November 2020 Notes from GitHub Pull Requests round table
- August 2021 Phabricator Creator Pulling the Plug
- June 2022 Update on GitHub pull requests
- November 2022 Pull Request Progress Update
This transition could very well be the most contentious infrastructure change in LLVM's history. If you have the patience to delve into the discussions within the mentioned threads, you'll come across numerous remarks critiquing GitHub pull requests as being subpar. I'd like to express my gratitude to Joerg Sonnenberger for sharing a post by Gregory Szorc titled Problems with Pull Requests and How to Fix Them, which made a great analysis comparing several code review tools.
Nevertheless, a decision has been made, and the targeted transition date was set for September 1, 2023. The negotiation during the decision-making process could have been handled more strategically to mitigate some confusion.
On September 1, 2023 (or 2nd on some parts of the world), the pull request lockdown was completely removed.
Accessibility
In general, I believe that the majority of contributors consider GitHub to offer better accessibility. However, I have heard quite a few opinions suggesting that GitHub's code review capability is significantly worse than that of Phabricator. I will delve into this topic further later in this post.
Having contributed patches to more than 200 projects, many of which are one-off and even trivial, I genuinely appreciate it when a project uses GitHub pull requests. This is because I am already familiar with the system. On the other hand, if a project relies on a self-hosted code review website, I might find it less convenient as I'm not particularly keen on registering a username on a website I may never visit again. Even worse, I might need to invest time in getting acquainted with the system if I haven't encountered similar instances before.
The same argument applies to LLVM's self-hosted Phabricator instance. Many contributors have not used Phabricator before and would consider both the website and the command-line tool (https://github.com/phacility/arcanist), to be challenging to use. In general, I believe that GitHub is more contributor-friendly but perhaps not as reviewer-friendly. Within the LLVM community, reviewer resources are extremely limited. Consequently, being reviewer-friendly is likely of critical importance.
Automation
GitHub provides GitHub Apps and GitHub Actions to extend its functionality. With these, we can automate pull request labelling, testing, code analysis, code coverage, and potentially even a merge queue in the future. Some fantastic tooling is available for free and widely used by other open source projects, making it easy to emulate how other projects have set up automation.
Phabricator can also handle automation, but there are far fewer resources available for it. LLVM's self-hosted Phabricator instance, for instance, relies on https://github.com/google/llvm-premerge-checks and Buildkite.
Patch subscription
The llvm-project repository is vast. With a code frequency of 100+ commits every day, it's practically impossible for anyone to monitor every new commit. Nonetheless, many people wish to stay informed about changes to specific components, making patch subscription essential.
One way to achieve this is through mailing lists, such as llvm-commits. This list contains emails about new pull requests, edits, GitHub actions, labelling, resolved issues, and more, making it quite noisy.
The other method is to utilize the code review tool, formerly Phabricator and now GitHub. With Phabricator, users can set up fairly complex subscription rules known as Herald. When a patch title, description, affected files, or the acting user matches certain criteria, you can take actions like adding yourself as a reviewer/subscriber or sending a one-off email.
GitHub, however, is less flexible in this regard. Individual users can choose to watch all pull requests, but they can't do so selectively. Interestingly, users can watch issues with a specific label but not pull requests with a specific label.
To enable component-based subscription, the llvm organization on
GitHub has created multiple pr-subscribers-*
teams, which
users can freely join them. ( Note:
A maintainer is supposed to accept every join request. Unfortunately,
GitHub only displays the "pending request" information on the
https://github.com/orgs/llvm/teams/pr-subscribers-*
page
and not on any other page. It's not reasonable to expect a maintainer to
routinely check the
https://github.com/orgs/llvm/teams/pr-subscribers-*
pages
for pending join requests. So if they miss the email notification that
says would like to join "LLVM"
, the request may remain
pending indefinitely. )
Then we use label-based
subscription. A GitHub action is set up to label every pull
request, and these labels are used to notify
pr-subscribers-*
teams. For example, a pull request
affecting clang/lib/Driver/XX
will receive the labels
clang
and clang:driver
, and the
pr-subscribers-clang
and
pr-subscribers-clang:driver
teams will be notified.
On https://github.com/notifications, the "Team mentioned" tab lists these patches.
.github/CODEOWNERS
Previously, we
added pr-subscribers-*
teams to
.github/CODEOWNERS
. Due to GitHub's CODEOWNERS
mechanism, the pr-subscribers-clang
team would be added as
a reviewer when a pull request affecting clang/xx
was
created. pr-subscribers-clang
members would receive an
email notification about the pull request with a full diff.
However, a complication arose when a member of the
pr-subscribers-*
team approved a change. It resulted in a
message saying
$user approved these changes on behalf of llvm/pr-subscribers-xx
,
which could be misleading if the user did not wish to assume such
authority. In addition, the team was automatically removoed as a team
reviewer, adding to the confusion. This use case wasn't in line with
GitHub's intended functionality, and there was a risk that GitHub's
future changes might disrupt our workflow.
Email filtering
Filtering is another crucial aspect of managing notifications. GitHub supports numerous notification reasons.
I have joined several pr-subscribers-*
teams that I am
interested in as replacements for my Herald rules on Phabricator. I am
still in the process to make a curated list of Gmail filters. Here are
some filters I currently use:
- Apply label "Mention":
from:(notifications@github.com) to:(mention@noreply.github.com)
- Apply label "Team mention":
from:(notifications@github.com) to:(team_mention@noreply.github.com)
- Apply label "Review request":
to:(review_requested@noreply.github.com)
- Mark as read:
to:(push@noreply.github.com)
- Skip Inbox, Apply label "llvm/issues":
from:(notifications@github.com) subject:(llvm-project "Issue #")
- Skip Inbox:
from:(notifications@github.com) (-cc:(review_requested@noreply.github.com OR mention@noreply.github.com OR team_mention@noreply.github.com) "this pull request" KEYWORDS_I_WANT_TO_IGNORE)
However, it's worth noting that these notifications also appear on https://github.com/notifications. I haven't thought hard about the issue of receiving double notifications. Perhaps https://github.com/notifications notifications can be used as a reminders and mute them in batch.
Another option is to switch to a pull-based workflow if LLVM provides a public-inbox instance. https://github.com/pulls provides a dashboard where one can list review requests.
Code review
Code review is the top reason we pick a code review tool. Let's assess how GitHub pull requests fare in addressing this challenge.
Patch evolution
In Phabricator, the body a differential (Phabricator's term for a
patch) contains is a patch file. The patch file is based on a specific
commit, but Phabricator is not required to know the base commit. A
stable identifier,
Differential Revision: https://reviews.llvm.org/Dxxxxx
, in
the commit message connects a patch file to a differential. When you
amend a patch, Phabricator recognizes that the differential has evolved
from patch X to patch Y. The user interface allows for comparisons
between any two revisions associated with a differential. Additionally,
review comments are confidently associated with the source line.
On the other hand, GitHub structures the concept of pull requests around branches and enforces a branch-centric workflow. A pull request centers on the difference (commits) between the base branch and the feature branch. GitHub does not employ a stable identifier for commit tracking. If commits are rebased, reordered, or combined, GitHub can easily become confused.
When you force-push a branch after a rebase, the user interface displays
a line such as "force-pushed the BB branch from X to Y". Clicking
the "compare" button in GitHub presents something like
git diff X..Y
, which includes unrelated commits. Ideally,
GitHub would show the difference between the two patch files, as
Phabricator does, but it only displays the difference between the two
head commits. These unrelated in-between commits might be acceptable for
projects with lower commit frequency but can be challenging for a
project with a code frequency of 100+ commits every day.
The fidelity of preserving inline comments after a force push has always been a weakness. The comments may be presented as "outdated". In the past, there was a notorious "lost inline comment" problem. Nowadays, the situation has improved, but some users still report that inline comments may occasionally become misplaced.
Due to the difficulties in comparing revisions and the lack of confidence in preserving inline comments, some recommendations suggest adopting less flexible and less preferred workflows, which involve only appending new commits and discouraging rebases. This approach sometimes results in a cluttered commit history, with commit messages like "fix typo," "add test," and "fix ci".
In a large repository, avoiding pulling upstream commits may not be realistic due to other commits frequently modifying nearby lines. Some people use a remote branch to save their work. Having to worry about whether a rebase could cause spam makes the branch more difficult to use. When working with both the latest main branch and the pull request branch, switching between branches results in numerous rebuilds. Rebasing follow-up commits could lead to merge conflicts and more pain. In addition, a popular convention among many LLVM contributors is to commit tests before landing the functional change, which also mandates force-push. (The GitHub UI will show a merge conflict.) Sometimes, only through rebasing can one notice that the patch needs adjustments to more code or tests due to its interaction with another landed patch:
- Another landed patch added some tests that would be changed by the current patch.
- Another landed patch added a new use of the function that is renamed by the current patch.
There are two workflows you can do. One is the merge-based workflow recommended by https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork:
1 | git switch pr1 |
You can invoke git log --first-parent
to list the fixup
commits.
Personally I prefer the rebase-based alternative: perform a rebase and force push, then make a functional change and either append a new commit or amend the old commit.
When the functional change is made, leave a comment so that reviewers can locate the useful "compare" button and disregard the "compare" button associated with the rebase push.
Commit message
GitHub's repository setting allows three options for pull requests: "Allow merge commits", "Allow squash merging", and "Allow rebase merging".
The default effect is quite impactful.
- Many projects utilize merge commits when a linear history is preferred, even for just a single commit.
- When "Squash and merge" is selected, the default commit message becomes a concatenation of all commits, often resulting in a message filled with "fix typo," "add test," and "fix ci".
In 2022, GitHub finally introduced an option called "Pull request title and description" for the "Allow squash merging" option. This new option mitigates some problems.
If you land a patch in a manual way, it is easy for the pull request
to end up the "Closed" status (red), even if the commit has the
#xxxxx
identifier. I am not sure whether this status might
discourage people from manually landing patches, which I believe should
be fully supported and not discouraged. [Feature
Request] Mark PR as Merged from Pull Request API
Patch series
I am not well-versed in reviewing patch series on GitHub, but this is a widely acknowledged pain point. Numerous projects are exploring ways to alleviate this issue.
In Phabricator, since a differential consists of a patch file and doesn't have a fixed base branch, we can freely reorder two differentials.
The issue Write a guide for doing "Stacked Reviews" with GitHub Pull Requests tracks the documentation requirement.
User branches to the llvm-project
repository are allowed
since 2023-11-01 to enable stacked pull requests. After a month it was
reported
that some user branches were misused.
Miscellaneous
Pros
- Fetching a pull request is more straightforward.
git fetch origin pull/xx/head:local_branch
. Phabricator allowsarc patch Dxxxxx
and the less-known method:curl -L 'https://reviews.llvm.org/Dxxxxx?download=1' | patch -p1
@mention
applies to every user on GitHub, which is convenient when you need to seek for a user's opinions. You cannot expect every user to have an account on a self-hosted instance.gh
is more powerful thanarcanist
.- There is a per-issue/PR "unsubscribe" button.
Cons
- Fetching a pull request at an old revision can be challenging.
- The narrow display can be somewhat inconvenient.
- Viewing diff and conversations requires switching between tabs.
- Collapsed comments are challenging to locate and often require multiple clicks, makeing it difficult for reviewers to confirm whether a comment has been resolved.
- One cannot comment on a source line a few lines out of the affected lines.
- The inability to search for lines in nearby context when they are not displayed.
- A contributor without write access cannot add reviewers for a pull request.
- A merged pull request cannot be reopened. Let's say the patch has been reverted. In contrast, a closed pull request can be reopened.
- The commit message template does not contain an URI pointing to the pull request. Feature request
Note: the button below the "Files changed" tab allows switching between unified and split diff views. Users who are used to Phabricator's side-by-side diff view may want to adjust this setting.
I've noticed that my review productivity has decreased, a sentiment shared by many others. It's disheartening that alternative code review solutions haven't been thoroughly considered. However, we must move forward and adapt to the new workflow while striving to mitigate the loss in productivity.
I hope that future iterations of GitHub will incorporate some ideas from Pull Request feature requests for GitHub #56635.
I've voiced numerous concerns regarding GitHub pull requests, and for that, I apologize. It's essential to acknowledge that GitHub contributes significantly to open source projects in many positive ways. My intention in sharing these concerns is to express a genuine hope that GitHub pull requests can be enhanced to better support large projects.
I would also like to express my gratitude to Phorge, a community-driven fork of Phabricator, for their dedication and contributions, even as LLVM decided to migrate to the free but proprietary solution provided by GitHub. Phorge's commitment to providing alternatives and nurturing an open-source community for organizations that favor self-hosted solutions is truly commendable.
The thread "How’s it going with pull requests?" has some nice analysis.
I have enabled a browser extension Refined GitHub and adopted octo.nvim.
I am now trying spr: 1
2
3
4
5githubRemoteName = origin
githubRepository = llvm/llvm-project
githubMasterBranch = main
branchPrefix = users/MaskRay/spr/
requireTestPlan = false
Future of reviews.llvm.org
On September 5, 2023, I added a red banner to reviews.llvm.org to discourage new patches.
Transitioning existing differentials to GitHub pull requests could potentially cause disruption. Resurrecting old patches is a practice that people regularly engage in. As per the current schedule, new differentials will be disallowed on October 1, 2023.
It is anticipated that at some point next year reviews.llvm.org
will become a read-only website. To preserve /Dxxxxx
pages (referenced by many commits), we can utilize phab-archive
from Mercurial.
As activities on Phabricator wind down, maintenance should become more lightweight.
In the past two weeks, there have been different IP addresses
crawling /source/llvm-github/
pages. It looks very like a
botnet as adjacent files are visited by IP addresses from very different
autonomous systems. Such a visit will cause Phabricator to spawn a
process like
git log --skip=0 -n 30 --pretty=format:%H:%P 988a16af929ece9453622ea256911cdfdf079d47 -- llvm/lib/Demangle/ItaniumDemangle.cpp
that takes a few seconds (llvm-project is huge). I have redirected some
pages to https://github.com/llvm/llvm-project/.
On September 8, 2023, I resized the database disk to 850GB to fix a full disk issue. Hopefully, we won't need to resize the disk again!
On September 18, 2023, I noticed an aggressive spider that made the website slow. I blocked the user agents for some aggresstive spiders.
On October 19, 2023, we received an attack with malicious using fake
but strange user agents. I blocked two related IP ranges and updated
/etc/php/fpm/pool.d/www.conf
(seemed un-configured before)
to reserve more processes.
On November 23, 2023, SendGrid used by the website stopped sending email notifications. This was related to unidentified spammers.
I am adapting Mercurial's phab-archive project to mirror
reviews.llvm.org/Dxxxx
pages. I just need to make a few
adjustments (python,js,css,html).
The "History" button functions properly, although there isn't a "Compare" button.