Monday, March 21, 2016

git-topic: workflow based on topic branches

Topic Branches

The work that I do requires me to contribute to a number of different projects, often across various areas within these projects. Managing patches can become a tedious task. A common workflow for dealing with the patch load is called topic branches.

Topic branches allow you to organize patches into sets that deal with separate... well... topics. A topic is usually a particular new feature you're working on, or it might be a set of patches that cleanup a particular usage pattern. At other times you might want to fix a bug, but doing so requires a set of preparatory patches that need to be applied in a specific sequence. Keeping the set of patches in a separate topic branch makes it easy to ensure the required order. Often a topic branch will result in a patch series that you send to the project's development mailing list.

A bit of background

As an example, as part of my work as a Linux kernel maintainer for Tegra, I work across various subsystems such as graphics, clocks, power management and USB, among others. A lot of the time I don't write patches myself but rather pull in work done by others. Often my own work and that by others isn't quite ready for inclusion into linux-next or mainline, but I keep a development/integration tree that can be used to test what I like to think of as the "bleeding edge".

The idea is that while developers are busy working with subsystem maintainers to get their patches included into mainline or linux-next, users (myself included) can get a peek at the work in progress. This is especially important given the long time it often takes to get patches merged upstream. New hardware can be crippled for the first couple of months after its release by the lack of support in mainline.

A development/integration tree gives people something that they can boot on their hardware and make it work. It also gives people a good indication of what's being worked on and what isn't, so it can help to reduce duplication of work. Furthermore, some developers' work might be blocked by the lack of a specific feature. An integration tree can help them get a head-start.

Topic branches are a good way to model such an integration tree. Individual patch sets can go into separate topic branches. One could of course apply all sets to a single branch, but that would become rather messy very quickly. Topic branches help order patches in a natural way. Integrating all the work into a working tree is often simply a matter of merging together all of the topic branches. Conflicts will result sometimes if two or more of the branches modify the same files (device tree, default configurations, ...), but those are usually trivial to resolve.

Of course one of the dangers of a development/integration tree is that it can devolve into a vendor tree. To prevent that, every contributor needs to make sure that the work will eventually get merged upstream. One additional measure to encourage the flow upstream is to keep rebasing the entire tree. The rebase intervals really depend on the project and the amount of work you're willing to invest. This could be releases, or release candidates. For the Linux kernel I've found linux-next to be a good choice. You'll see later on why it is a good choice.

One difficulty is that if you keep sufficiently many of these topic branches, it can become very tedious to continually merge them together. And if you want to rebase the tree fairly frequently, the amount of time spent running git commands can become significant.

Development/integration tree structure

First, let's take a look at how a development/integration tree can be structured.

The whole tree is built on a common base. That is, each of the topic branches starts with the same commit. This isn't strictly necessary because git is really good when it comes to merging. But having a single base is very convenient for scripting. Starting from the base commit we can create any number of topic branches. A fairly common branch that I happened to create in many projects is called "fixes". This contains cleanups, build fixes and other, mostly trivial, patches. For the Linux kernel I keep other branches, often named after a particular driver or subsystem (pci, xhci, clk, drm/tegra, drm/panel, ...).

All of these branches get merged into a master branch. On top of this master branch, I like to have a work branch that contains various bits that are work in progress, code that's not quite finalized yet, or that I haven't categorized yet.

To keep the repository clean I like to have a common namespace for this tree. All branches get a staging/ prefix. This gives us the following tree-like structure:
  • staging/base
    • staging/clk
    • staging/pci
      • staging/master
        • staging/work
Rebasing the tree onto a new base involves the following steps:
  1. rebase all topic branches onto the new base
  2. make staging/base point at the new base
  3. merge all topic branches into staging/master
  4. rebase staging/work onto staging/master
Publishing the development/integration tree is merely a matter of pushing all the branches to a remote.

Using a frequently updated upstream as a base might be a bit of work, but it comes with a few advantages. On one hand, you get to run your latest code on top of other people's latest code (truly bleeding edge). That is opposed to most vendor trees where the vendor's latest code runs on top of some very old baseline. Running on top of other people's latest code means you'll be able to quickly notice when some API changes and breaks the code in one of your topic branches. You'll be forced to update patches to the latest changes upstream, which prevents the patches from becoming stale. Also, basing on top of bleeding edge code you'll notice any runtime fallout early on, so you can take measures to fix things (in your code or upstream's) before anything even hits mainline or linux-next. Finally a nice side-effect is that once features or patch sets get merged upstream, your topic branches will automatically be flushed during the rebase. Though sometimes automatic here might involve skipping patches manually (git rebase --skip) when git can't figure that out on its own.

git-topic to the rescue

As with many repetitive tasks, scripting is our friend. git is very easy to extend: any script named git-foo and available on the PATH can be run like any git sub-command using git foo. It is also fairly trivial to enable command-line completion using bash (or other shells).

To make it easier for me to maintain my own development trees, I wrote a script that automates most of this work for me: https://github.com/thierryreding/git-topic

git-topic is a custom sub-command that automates the maintenance of a development/integration tree. git-topic keeps a list of topic branches in a file stored in a separate orphan branch (staging/branches). An orphan branch is one which has no common ancestor with the master branch. Keeping the list of branches in a separate branch makes it possible to version the branch list just like any source file. It also becomes trivial to share the topic branch information with others, because the staging/branches branch will be pushed to remotes along with all the other branches. The format of the file is very trivial: it consists of a branch name (not including the staging/ prefix) per line, lines starting with a # are comments and will be ignored.

The following explains how to setup a tree for use with git-topic and goes over the commands used in day-to-day work.

Initializing a tree

The init sub-command initializes the tree using a given base commit:
$ git topic init next/master
This sets up staging/base and staging/master branches that point at next/master (I have next set up as a remote to track linux-next, and master always points at the latest release).

You can use the branches sub-command to edit the list of branches. This uses the EDITOR environment variable to open the branch list file in your favorite editor.
$ git topic branches
Creating branches isn't done automatically, you'll have to do that yourself:
$ git checkout -b staging/foo staging/base

Maintaining the tree 

Once you have all your branches set up, run the rebase sub-command to have everything merged into staging/master and staging/work rebased on top.
$ git topic rebase
If not passed a new base, the rebase sub-command will use staging/base and skip rebasing the individual topic branches.

Whenever you want to rebase the tree, it's as simple as running:
$ git fetch next
$ git topic rebase next/master
This fetches the next remote and then rebases all of the topic branches onto next/master, merges them together and rebases staging/work on top. If next/master hasn't changed, git-topic will notice and skip the rebase of the individual branches.

If git encounters any conflicts during the rebase or merge steps, it will drop you to a shell and let you resolve things. This is one of the areas that could use some work, because it's not entirely trivial to know how to continue after resolving. If rebasing a topic branch fails, you'll be dropped to a shell by git first. You're then supposed to resolve the conflict and git rebase --continue. After the rebase finishes, git-topic will drop you to a shell again to check that all is well. If so, simply exit the sub-shell to continue with the next topic branch. Similarly if the merge fails at some point, you need to resolve the conflict and commit the resolution, just like you would with any other merge. If everything else looks good at that point, exit the sub-shell to continue the merge. The final rebase behaves exactly like the intermediate ones. If an conflict is encountered, you'll need to resolve it and git rebase --continue. In that case you'll still be dropped to a shell by git-topic after staging/work has been rebased, so you need to make sure that you exit that sub-shell, otherwise git-topic won't be able to clean up. You're done if you see this line:
$ git topic rebase next/master
...
git-topic: rebase: all done

Publishing the tree

Making the tree available to others is as simple as running the push sub-command with the target remote:
$ git topic push github

No comments:

Post a Comment