Git users often have to make a choice: to merge or rebase. I’m going to describe a third way that has the characteristics of both and is very well suited for tracking an open-source project or any other upstream branch.
Merge or Rebase?
Let’s assume that you have forked an upstream open-source repository and keep the fork in your own repo. The default branch of the upstream repository is called main, and your fork uses the same name. You have made a few changes to the source code and committed them to the main branch of your fork. In the meantime, new changes have been committed to the upstream main branch of the project. How do you import the upstream changes into your fork?
Let’s assume that your local fork also contains a branch called upstream/main, which reflects the state of the upstream’s main branch. So the main branch contains your own changes and the upstream/main branch contains the community’s changes:
time -->
o---o---o---o---o  upstream/main
     \
      o---o---o  main
So a different way to ask the question is: how do you bring the changes from upstream/main into main?
One solution is to merge upstream/main into main:
o---o---o---o---o  upstream/main
     \           \
      o---o---o---M  main
The merge above would certainly work, but it becomes problematic as time passes and you get a lot of these merges in your main branch. You then no longer have visibility into the differences between upstream/main and main, because your commits get lost deep in the history of the branch, as illustrated below:
o---o---o---o---o---o---o---o---o---o---o  upstream/main
     \           \       \       \       \
      o---o---o---M---o---M---o---M---o---M  main
So the alternative solution is to rebase your main branch on top of upstream/main:
o---o---o---o---o  upstream/main
                 \
                  o'---o'---o'  main
You now have greater visibility into the differences between upstream/main and main. However, a rebase comes with a different problem: if any user of your fork has the main branch checked out in their local repository and runs git pull, they are going to get an error stating that the local and upstream branches have diverged. They will have to take special steps to recover from the rebase of the main branch.
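The cause of that error can be checked directly: after a rebase, the old tip of main is no longer an ancestor of the new tip, so a fast-forward is impossible. Below is a minimal Python sketch that builds a throwaway repository to show this. The git, commit and is_ancestor helpers, the file names, and the commit messages are made up for illustration; the sketch assumes that git is on the PATH and that upstream/main is a plain local branch, as in the scenario above.

```python
import os
import subprocess
import tempfile

# Fixed identity so commits work without any user-level git configuration.
ENV = dict(os.environ,
           GIT_AUTHOR_NAME="Demo", GIT_AUTHOR_EMAIL="demo@example.com",
           GIT_COMMITTER_NAME="Demo", GIT_COMMITTER_EMAIL="demo@example.com")

def git(repo, *args):
    """Run git in `repo` and return its stripped stdout (hypothetical helper)."""
    result = subprocess.run(["git", "-C", repo, *args], env=ENV, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()

def commit(repo, name, text, message):
    """Write `text` to the file `name` and commit it on the current branch."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)

def is_ancestor(repo, older, newer):
    """True if `older` is reachable from `newer`, i.e. a fast-forward is possible."""
    done = subprocess.run(["git", "-C", repo, "merge-base",
                           "--is-ancestor", older, newer])
    return done.returncode == 0

repo = tempfile.mkdtemp()
git(repo, "init", "-q")
git(repo, "symbolic-ref", "HEAD", "refs/heads/main")      # name the first branch main
commit(repo, "base.txt", "base\n", "upstream: base")
git(repo, "branch", "upstream/main")                       # local mirror of upstream
commit(repo, "local.txt", "local\n", "local: my change")   # your change on main
git(repo, "checkout", "-q", "upstream/main")
commit(repo, "up.txt", "up\n", "upstream: new work")       # the community's change
git(repo, "checkout", "-q", "main")

old_main = git(repo, "rev-parse", "main")   # the tip other users already have
git(repo, "rebase", "-q", "upstream/main")
new_main = git(repo, "rev-parse", "main")

print("upstream/main is an ancestor of main:", is_ancestor(repo, "upstream/main", "main"))
print("old main is an ancestor of new main: ", is_ancestor(repo, old_main, new_main))
```

The first check succeeds and the second fails: the rebased main sits on top of upstream/main, but anyone still holding the old tip has commits that the new history does not contain.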
So how do we solve that problem?
The Third Way – Upstream Import
The proposed third way is a special operation that (in the described use case) has the advantages of both a merge and a rebase, without the disadvantages. The approach is illustrated below:
o---o---o---o---o  upstream/main
     \           \
      \           o'---o'---o'
       \                     \
        o---o---o-------------W  main
First, the divergent commits from main are rebased on top of upstream/main, but then they are combined back with main using a special merge commit, which has a custom strategy: it replaces the old content of main with the new rebased content. This last commit is the secret sauce of this solution: the commit has two parents, like an ordinary merge, but has the semantics of a rebase. I call this special merge a welding merge (a reader has also suggested the term cauterizing merge). The entire strategy can be called rebase & weld (or rebase & cauterize).
The structure above has the advantages of both a merge and a rebase. On the one hand, just like with an ordinary merge, a user who runs git pull on their local copy of main is not going to see the error about divergent branches. On the other hand, just like with an ordinary rebase, there is visibility into the last imported commit from upstream/main and the differences between that commit and the tip of main.
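The structure and both of its properties can be approximated with stock git commands. The Python sketch below is one possible way to do it, not necessarily how git-upstream works internally: it rebases a copy of main, then creates the welding commit W by hand with git commit-tree, reusing the tree of the rebased tip and listing the old main and the rebased tip as parents. The helpers, the scratch branch name rebased, the file names, and the commit messages are assumptions made for the example.

```python
import os
import subprocess
import tempfile

# Fixed identity so commits work without any user-level git configuration.
ENV = dict(os.environ,
           GIT_AUTHOR_NAME="Demo", GIT_AUTHOR_EMAIL="demo@example.com",
           GIT_COMMITTER_NAME="Demo", GIT_COMMITTER_EMAIL="demo@example.com")

def git(repo, *args):
    """Run git in `repo` and return its stripped stdout (hypothetical helper)."""
    result = subprocess.run(["git", "-C", repo, *args], env=ENV, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()

def commit(repo, name, text, message):
    """Write `text` to the file `name` and commit it on the current branch."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)

# Scenario: main has two local commits, upstream/main has one new commit.
repo = tempfile.mkdtemp()
git(repo, "init", "-q")
git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
commit(repo, "base.txt", "base\n", "upstream: base")
git(repo, "branch", "upstream/main")
commit(repo, "one.txt", "1\n", "local: first")
commit(repo, "two.txt", "2\n", "local: second")
git(repo, "checkout", "-q", "upstream/main")
commit(repo, "up.txt", "up\n", "upstream: new work")
old_main = git(repo, "rev-parse", "main")

# 1. Rebase a copy of main; main itself is left untouched.
git(repo, "checkout", "-q", "-b", "rebased", "main")
git(repo, "rebase", "-q", "upstream/main")

# 2. Create W: the content (tree) of the rebased tip, with the old main as
#    first parent and the rebased tip as second parent.
tree = git(repo, "rev-parse", "rebased^{tree}")
weld = git(repo, "commit-tree", tree, "-p", "main", "-p", "rebased",
           "-m", "Weld: import upstream/main")

# 3. Fast-forward main to W.
git(repo, "checkout", "-q", "main")
git(repo, "merge", "-q", "--ff-only", weld)

# Merge-like property: the old tip is an ancestor of the new main.
fast_forward = subprocess.run(["git", "-C", repo, "merge-base", "--is-ancestor",
                               old_main, "main"]).returncode == 0
# Rebase-like property: the second parent of W lists only the local patches.
local_patches = git(repo, "log", "--format=%s", "upstream/main..main^2").splitlines()
parents = git(repo, "rev-list", "--parents", "-n", "1", "main").split()[1:]

print("fast-forward possible:", fast_forward)
print("local patches:", local_patches)
```

Because W is a descendant of the old main, a plain git pull can fast-forward to it, while the second parent of W (main^2) keeps the rebased patches in one clean line on top of upstream/main.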
Dropping Patches
What is supposed to happen if one of the commits from main is ported to upstream/main, as illustrated below?
o---o---o---A'---o  upstream/main
     \
      \
       \
        A---B---C  main
In that case, the upstream importing operation should drop that patch, as illustrated below:
o---o---o---A'---o  upstream/main
     \            \
      \            B'---C'
       \                  \
        A---B---C---------W  main
But how would the upstream importing operation know which patches to drop? There are two ways.
Firstly, it can look at Git’s patch-id, which is a hash of the file changes with line numbers ignored. This is the same strategy that rebase uses to drop duplicate commits.
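The patch-id comparison can be tried out with git patch-id and git cherry. In the Python sketch below, commit A changes one line on main and is then cherry-picked upstream as A' after upstream has added a line at the top of the same file, so the two diffs differ in line numbers only. The helpers, the file contents, and the scratch branch name rebased are made up for the example, and git is assumed to be on the PATH.

```python
import os
import subprocess
import tempfile

# Fixed identity so commits work without any user-level git configuration.
ENV = dict(os.environ,
           GIT_AUTHOR_NAME="Demo", GIT_AUTHOR_EMAIL="demo@example.com",
           GIT_COMMITTER_NAME="Demo", GIT_COMMITTER_EMAIL="demo@example.com")

def git(repo, *args, stdin=None):
    """Run git in `repo`, optionally feeding `stdin`, and return stripped stdout."""
    result = subprocess.run(["git", "-C", repo, *args], input=stdin, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

def commit(repo, name, text, message):
    """Write `text` to the file `name` and commit it on the current branch."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)

def patch_id(repo, rev):
    """Pipe `git show <rev>` into `git patch-id` and return the patch-id."""
    diff = git(repo, "show", rev)
    return git(repo, "patch-id", stdin=diff + "\n").split()[0]

lines = [f"line{i}\n" for i in range(1, 11)]
repo = tempfile.mkdtemp()
git(repo, "init", "-q")
git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
commit(repo, "file.txt", "".join(lines), "upstream: base")
git(repo, "branch", "upstream/main")

# main: A patches line 8, B touches an unrelated file.
patched = lines[:7] + ["line8 patched\n"] + lines[8:]
commit(repo, "file.txt", "".join(patched), "A: patch line 8")
commit(repo, "b.txt", "b\n", "B: unrelated change")
a = git(repo, "rev-parse", "main~1")
b = git(repo, "rev-parse", "main")

# upstream/main: a new first line shifts everything down, then A lands as A'.
git(repo, "checkout", "-q", "upstream/main")
commit(repo, "file.txt", "header\n" + "".join(lines), "upstream: add header")
git(repo, "cherry-pick", a)
a_prime = git(repo, "rev-parse", "upstream/main")

# git cherry marks commits that already have an equivalent upstream with "-".
marks = {}
for line in git(repo, "cherry", "upstream/main", "main").splitlines():
    sign, sha = line.split()
    marks[sha] = sign

# A rebase of a copy of main keeps only B.
git(repo, "checkout", "-q", "-b", "rebased", "main")
git(repo, "rebase", "-q", "upstream/main")
kept = git(repo, "log", "--format=%s", "upstream/main..rebased").splitlines()

print("same patch-id:", patch_id(repo, a) == patch_id(repo, a_prime))
print("git cherry marks:", marks[a], marks[b])
print("kept after rebase:", kept)
```

A and A' are different commits with different hashes, yet they share a patch-id, which is why git cherry flags A and the rebase leaves it out.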
Secondly, it can use an arbitrary change-id associated with a commit (for example, for projects that use Gerrit, it can be Gerrit’s Change-Id, which is saved in the commit message). This is useful when a given patch lands upstream in a slightly changed form, but is meant to replace the version in main.
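Matching by change-id needs nothing more than the commit messages. The following Python sketch illustrates the idea with Gerrit-style Change-Id trailers; it is not taken from git-upstream, and the function names, commit labels, and messages are invented for the example.

```python
import re

# Gerrit writes a trailer of the form "Change-Id: I<40 hex digits>".
CHANGE_ID = re.compile(r"^Change-Id: (I[0-9a-f]{40})\s*$", re.MULTILINE)

def change_id(message):
    """Return the last Change-Id trailer in a commit message, or None."""
    found = CHANGE_ID.findall(message)
    return found[-1] if found else None

def patches_to_drop(local, upstream):
    """Return the ids of local commits whose Change-Id also appears upstream.

    Both arguments are lists of (commit id, commit message) pairs.
    """
    upstream_ids = {change_id(message) for _, message in upstream} - {None}
    return [sha for sha, message in local if change_id(message) in upstream_ids]

# Made-up commits: A landed upstream in a reworked form but kept its Change-Id.
ID_A = "I" + "a1" * 20
ID_B = "I" + "b2" * 20
local = [
    ("A", f"Fix timeout handling\n\nChange-Id: {ID_A}\n"),
    ("B", f"Add local branding\n\nChange-Id: {ID_B}\n"),
    ("C", "Tweak build flags\n"),
]
upstream = [
    ("A'", f"Fix timeout handling (v2)\n\nChange-Id: {ID_A}\n"),
    ("o", "Unrelated upstream work\n"),
]

print(patches_to_drop(local, upstream))
```

Unlike the patch-id, this match survives any rewording or rework of the patch, as long as the trailer is preserved when the change goes upstream. Commits without a Change-Id, such as C here, are never dropped by this rule.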
Implementation
The solution above has already been implemented in an open-source Python script called git-upstream, published 10 years ago. It was originally implemented for the OpenStack project, but the solution is generic and applicable to any open-source project. I’ve described how to use git-upstream in another blog post.
It is going to be easier for users to benefit from the ideas behind git-upstream if the functionality is integrated directly into git. Would you like to see the above functionality integrated directly into git?
- git-upstream uses the strategy described above
- quilt uses patch files saved in a source code repository
- StGit is inspired by quilt and uses git commits to store patches
- MQ is also inspired by quilt and implements a patch queue in Mercurial