Beyond Merge and Rebase: The Upstream Import Approach in Git

Version 1.4

Git users often have to make a choice: to merge or rebase. I’m going to describe a third way that has the characteristics of both and is very well suited for tracking an open-source project or any other upstream branch.

Merge or Rebase?

Let’s assume that you have forked an upstream open-source repository and keep the fork in your own repo. The upstream repository’s default branch is called main, and so is the default branch of your fork. You have made a few changes to the source code and committed them to the main branch of your fork. In the meantime, new changes have been committed to the upstream main branch of the project. How do you import the upstream changes into your fork?

Let’s assume that your local fork also contains a branch called upstream/main, which reflects the state of the upstream’s main branch. So the main branch contains your own changes and the upstream/main branch contains the community’s changes:

 time -->

 o---o---o---o---o  upstream/main
      \
       o---o---o  main

So a different way to ask the question is: how do you bring upstream/main’s changes into main?

One solution is to merge upstream/main into main:

 o---o---o---o---o  upstream/main
      \           \
       o---o---o---M  main

The merge above would certainly work, but it becomes problematic as time passes and you get a lot of these merges in your main branch. You then no longer have visibility into the differences between upstream/main and main, because your commits get lost deep in the history of the branch, as illustrated below:

 o---o---o---o---o---o---o---o---o---o---o  upstream/main
      \           \       \       \       \
       o---o---o---M---o---M---o---M---o---M  main

So the alternative solution is to rebase your main branch on top of upstream/main:

 o---o---o---o---o  upstream/main
                  \
                   o'---o'---o'  main

You now have greater visibility into the differences between upstream/main and main. However, a rebase comes with a different problem: if any user of your fork has the main branch checked out in their local repository and runs git pull, they are going to get an error stating that their local branch and its remote counterpart have diverged. They will have to take special steps to recover from the rebase of the main branch.

So how do you solve that problem?

The Third Way – Upstream Import

The proposed third way is a special operation that (in the described use case) has the advantages of both a merge and a rebase, without the disadvantages. The approach is illustrated below:

 o---o---o---o---o  upstream/main
      \           \
       \           o'---o'---o' 
        \                     \
         o---o---o-------------W  main

First, the divergent commits from main are rebased on top of upstream/main, but then they are combined back with main using a special merge commit, which has a custom strategy: it replaces the old content of main with the new rebased content. This last commit is the secret sauce of this solution: the commit has two parents, like an ordinary merge, but has the semantics of a rebase. I call this special merge a welding merge (a reader has also suggested the term cauterizing merge). The entire strategy can be called rebase & weld (or rebase & cauterize).
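The rebase-and-weld step described above can be approximated with stock git commands. The following is a minimal sketch, not the way git-upstream itself is implemented: the helper names git and rebase_and_weld are made up for this illustration, and it assumes that both branches exist locally, that the working tree is clean, and that the rebase applies without conflicts.

```python
import subprocess


def git(repo, *args):
    """Run one git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def rebase_and_weld(repo, local="main", upstream="upstream/main"):
    """Import `upstream` into `local`: rebase a copy, then weld it back.

    Hypothetical helper for illustration. A rebase conflict raises
    CalledProcessError and leaves the repository in the middle of a rebase.
    """
    old_tip = git(repo, "rev-parse", local)

    # Rebase a detached copy of `local`, so the branch itself is never rewritten.
    git(repo, "checkout", "--quiet", "--detach", local)
    git(repo, "rebase", upstream)
    rebased_tip = git(repo, "rev-parse", "HEAD")

    # The welding merge: its tree is taken verbatim from the rebased tip,
    # and its parents are the old tip of `local` and the rebased tip.
    weld = git(
        repo, "commit-tree", rebased_tip + "^{tree}",
        "-p", old_tip, "-p", rebased_tip,
        "-m", f"Weld: import {upstream} into {local}",
    )

    # `local` only ever moves forward, so this is a plain fast-forward.
    git(repo, "checkout", "--quiet", local)
    git(repo, "merge", "--ff-only", weld)
    return weld
```

Calling rebase_and_weld("/path/to/fork") would leave main pointing at W. The sketch covers only a first import: on later imports the history already contains earlier welds, and choosing the range of commits to replay needs more care, which is not attempted here.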

The structure above has the advantages of both a merge and a rebase. On the one hand, just like with an ordinary merge, a user who runs git pull on their local copy of main is not going to see the error about divergent branches. On the other hand, just like with an ordinary rebase, there is visibility into the last imported commit from upstream/main and the differences between that commit and the tip of main.
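Both claims can be checked against a repository with ordinary git commands. The sketch below is only an illustration: run_git, pull_would_fast_forward and carried_patches are hypothetical helper names, and carried_patches assumes that the tip of the local branch is the welding merge itself, so that its second parent is the rebased tip.

```python
import subprocess


def run_git(repo, *args):
    """Run one git command inside `repo` without raising on a non-zero exit."""
    return subprocess.run(
        ["git", *args], cwd=repo, capture_output=True, text=True
    )


def pull_would_fast_forward(repo, old_tip, new_tip):
    """True if `old_tip` is an ancestor of `new_tip`.

    That is the condition under which a user who still has `old_tip`
    checked out can pull `new_tip` without hitting diverged branches.
    Any git error is also reported as False.
    """
    result = run_git(repo, "merge-base", "--is-ancestor", old_tip, new_tip)
    return result.returncode == 0


def carried_patches(repo, local="main", upstream="upstream/main"):
    """Subjects of the rebased commits, oldest first.

    These are the commits reachable from the weld's second parent
    but not from `upstream`. Assumes `local` points at the weld.
    """
    result = run_git(
        repo, "log", "--reverse", "--format=%s", f"{upstream}..{local}^2"
    )
    return result.stdout.splitlines()
```

After a plain rebase, pull_would_fast_forward returns False for the old and new tips of main; after a weld it returns True, while carried_patches still lists exactly the local commits that sit on top of upstream/main.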

Dropping Patches

What is supposed to happen if one of the commits from main is ported to upstream/main, as illustrated below?

 o---o---o---A'---o  upstream/main
      \
       \
        \
         A---B---C  main

In that case, the upstream importing operation should drop that patch, as illustrated below:

 o---o---o---A'---o  upstream/main
      \            \
       \            B'---C' 
        \                 \
         A---B---C---------W  main

But how would the upstream importing operation know which patches to drop? There are two ways.

First, it can look at git’s patch-id, which is a hash of the commit’s file changes with line numbers ignored. This is the same mechanism that rebase uses to drop commits that already exist upstream.

Second, it can use an arbitrary change-id associated with a commit (for example, for projects that use Gerrit, it can be Gerrit’s Change-Id, which is saved in the commit message). This is useful when a given patch lands upstream in a slightly changed form, but is meant to replace the version in main.
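Matching by change-id can be sketched in a few lines. The example below assumes Gerrit-style footers of the form "Change-Id: I" followed by 40 hexadecimal digits; the function names change_id and superseded are hypothetical, and this is not taken from git-upstream's code.

```python
import re

# Gerrit-style footer, e.g. "Change-Id: I" followed by 40 hex digits.
CHANGE_ID = re.compile(r"^Change-Id:\s*(I[0-9a-f]{40})\s*$", re.MULTILINE)


def change_id(message):
    """Return the last Change-Id footer in a commit message, or None."""
    matches = CHANGE_ID.findall(message)
    return matches[-1] if matches else None


def superseded(local_commits, upstream_commits):
    """Local commits whose Change-Id also appears on an upstream commit.

    Both arguments map a commit id to its full commit message.
    Commits without a Change-Id are never reported.
    """
    upstream_ids = {change_id(m) for m in upstream_commits.values()} - {None}
    return [
        sha for sha, message in local_commits.items()
        if change_id(message) in upstream_ids
    ]
```

Unlike the patch-id comparison, this still matches when the upstream version of the patch differs in content, because only the identifier in the commit message is compared.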

Implementation

The solution above has already been implemented in an open-source Python script called git-upstream, published 10 years ago. It was originally implemented for the OpenStack project, but the solution is generic and applicable to any open-source project. I’ve described how to use git-upstream in another blog post.

It is going to be easier for users to benefit from the ideas behind git-upstream if the functionality is integrated directly into git. Would you like to see the above functionality integrated directly into git?

Alternative Solutions

For completeness, I’m providing links to alternative solutions for tracking patches:
  • git-upstream uses the strategy described above
  • quilt uses patch files saved in a source code repository
  • StGit is inspired by quilt and uses git commits to store patches
  • MQ is also inspired by quilt and implements a patch queue in Mercurial
