How to fork: Best practices and guide
- Author: Joaquim Rocha
Fork maintenance — keeping your changes in sync with the latest updates from the original project — can quickly become a mess. Trust me. Over the years, my work did sometimes involve maintaining forks of various open-source projects. That’s not the case with my job now, but when a colleague reached out for help with a fork that hadn’t been rebased in ages, it got me thinking that the steps I follow might be useful for other developers too. Hence this article.
Certainly, the topic of fork maintenance is complex enough that what works for me may not be adequate for other developers. At the same time, hopefully the advice in this article is useful to someone out there.
Before we go into the sections, as a kind of obvious step 0, I recommend that everyone get acquainted with git concepts! I know this sounds so basic that perhaps I could have skipped writing it, but I have seen many developers using git without fully understanding some of its basic concepts. This article assumes knowledge of git without expecting an expert level of it, and focuses on the git CLI. I understand there are many other tools for handling git that are maybe more beginner-friendly, and I encourage everyone to use whatever makes their lives easier, but sticking to the git CLI is a good way to keep the article neutral and future-proof.
This article is divided into two main sections: best practices for day-to-day development, and a rebase guide.
Day-to-day Development Tips
This section will mention a few tips that I consider best practices during downstream development that should make the inevitable task of rebasing the fork easier.
Use atomic commits
One of the best things we can do to keep our git history sane is to use atomic commits. Basically, this means that each commit describes only one update (bug fix, feature, config change) and contains only the changes related to that update. Do not confuse “one update” with “one file”. This article does a great job of explaining atomic commits and their advantages.
In the context of fork maintenance, atomic commits are even more important. Imagine you have a commit with the title:
Add new hint label to checkout button and update eslint pkg
This title immediately hints that the commit is not doing just one thing, but the problems it creates go beyond breaking some pedantic-sounding principle of good commit titles. Here is one of them: suppose upstream has also updated its eslint version (or even replaced eslint with something else). The next time we rebase our downstream changes, we will have a conflict, and we will have to edit the commit that adds the hint label (which has nothing to do with the conflict) to fix it. If those two changes (hint label and eslint update) had been separated into their own commits, all we would likely need to do about the eslint conflict is drop that commit altogether.
Mastering atomic commits will thus not only help your day-to-day development (easier code reviews, a cleaner history, and other advantages), but will also save you a lot of time when rebasing your downstream changes.
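As a rough sketch of how the mixed commit above could be split after the fact, the following throwaway repository undoes the commit while keeping its changes, then commits each change on its own. It assumes the mixed commit is the latest one and has not been shared yet; the file names button.js and package.json are made up for illustration.

```shell
set -e
# Throwaway repository; file names and contents are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

echo "button" > button.js
echo '{"eslint": "1"}' > package.json
git add . && git commit -q -m "Initial commit"

# The problematic commit: two unrelated changes in one.
echo "button with hint" > button.js
echo '{"eslint": "2"}' > package.json
git commit -q -am "Add new hint label to checkout button and update eslint pkg"

# Undo the commit but keep its changes in the working tree, unstaged.
git reset -q HEAD^

# Re-commit each change on its own.
git add button.js && git commit -q -m "Add new hint label to checkout button"
git add package.json && git commit -q -m "Update eslint pkg"

git log --oneline
```

When the two changes touch the same file, git add --patch can be used instead to stage them hunk by hunk.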
Identify your fixes and non-fixes
In your downstream changes, you will surely have commits that add new features, others that fix bugs in those features (found later on), as well as fixes to the original project’s code (which you should definitely send upstream, but more on that later), and other changes unrelated to any of these (config, CI, dependency updates, …).
Since we want to squash any downstream fixes into their culprit commits (more on that later), you will find yourself trying to work out which commits represent these fixes. But since there may be other fix commits, not related to downstream features, that you still need to maintain, it can save you time to follow a convention for always identifying such commits. Any convention to your liking will do, e.g. a prefix in the commit title such as “fix:” or “bug:”, or maybe a tag at the bottom of the commit message, like “#upstream-fix”. The same goes for commits that you may want to send upstream: running a search for commits that have “#to-propose-upstream” may come in handy if you don’t send suitable changes upstream right after they are done.
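To illustrate how such a convention pays off, here is a minimal sketch that searches a throwaway history with git log --grep. The commit titles are borrowed from the examples in this article, and the “Tags:” label is a hypothetical addition: when a commit message is written in an editor, git by default treats lines starting with # as comments and drops them, so the tag is placed after some other text on the line.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

# Empty commits are enough here, since only the messages matter.
git commit -q --allow-empty -m "Add super awesome feature A"
git commit -q --allow-empty -m "bug: Fix bug in feature A"
git commit -q --allow-empty -m "Prevent bug in upstream that crashes foo-bar" \
  -m "Tags: #to-propose-upstream"

# Commits with a message line that starts with the "bug:" prefix.
fixes=$(git log --format=%s --grep='^bug:')

# Commits tagged as candidates to send upstream.
to_propose=$(git log --format=%s --grep='#to-propose-upstream')

echo "Fixes:       $fixes"
echo "To propose:  $to_propose"
```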
No evil merges
If you have merge commits, make sure they do not introduce any changes of their own. Merge commits that do (so-called evil merges, which contain changes not present in any of their parents) will be a problem later on if you follow my advice on straightening your downstream tree. They also mean that you have commits that do not apply correctly to your tree, and that breaks the atomic commit idea in a different way (you have a commit that depends on another one, the merge commit, to work).
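One possible way to spot such merges, sketched below under the assumption that git show prints its default combined diff for merge commits: a clean merge prints no diff, while a merge that carries its own changes does. Legitimate conflict resolutions show up as well, so any output is a hint to take a closer look rather than proof of an evil merge. The repository and file names are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "Add feature"

git checkout -q main
echo "main work" > main.txt
git add main.txt && git commit -q -m "Work on main"

# An evil merge: the merge commit sneaks in a change that is in neither parent.
git merge -q --no-ff --no-commit feature
echo "sneaky change" >> app.txt
git add app.txt && git commit -q -m "Merge branch 'feature'"

# Print the combined diff of every merge commit that has one.
for merge in $(git rev-list --merges HEAD); do
  evil_diff=$(git show --format= "$merge")
  if [ -n "$evil_diff" ]; then
    echo "Merge $merge introduces its own changes:"
    echo "$evil_diff"
  fi
done
```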
Rebase early, rebase often
Obviously a pun, but also a good practice. The longer the period between rebases, the higher the chances of having a very different upstream code base since the last time you rebased. This of course increases the probability of having conflicts.
Depending on the nature of your project (and assuming you rebase on tags, i.e. releases, not on arbitrary commits), you could keep a staging branch that gets frequently rebased, “following up” with all the changes that will be part of upstream’s next release. You could even automate this rebase and add it to CI, running on a daily trigger for example, and be on top of things when rebases fail.
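A minimal sketch of what such an automated check could look like, assuming a hypothetical helper called try_rebase and a branch named staging. In a real CI job you would first fetch from the upstream remote; here the local main branch stands in for upstream so that the example is self-contained.

```shell
set -e
# Hypothetical helper: rebase <branch> onto <target>, cleaning up on failure.
try_rebase() {
  rebase_branch=$1
  rebase_target=$2
  if git rebase -q "$rebase_target" "$rebase_branch" >/dev/null 2>&1; then
    echo "rebase of $rebase_branch onto $rebase_target succeeded"
  else
    # Leave the repository clean so the next CI run starts fresh.
    git rebase --abort 2>/dev/null || true
    echo "rebase of $rebase_branch onto $rebase_target failed"
    return 1
  fi
}

repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

echo "v1" > core.txt
git add core.txt && git commit -q -m "Upstream release 1.2.2"

# Downstream staging branch with one change of its own.
git checkout -q -b staging
echo "ours" > downstream.txt
git add downstream.txt && git commit -q -m "Add downstream feature"

# Upstream moves on (main plays the role of upstream in this sketch).
git checkout -q main
echo "v2" > core.txt
git commit -q -am "Upstream work towards the next release"

try_rebase staging main
```

Since the helper returns a non-zero status on failure, a CI job running it would be marked as failed whenever the rebase hits a conflict.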
Contribute changes back
Here is the set of changes that are the easiest to maintain in your fork: the ones that are no longer downstream. Contributing changes back to upstream (be it features, bug fixes, etc.) is not just a good way to give back to a project you are using, but it will also very much make your job easier, since from the moment upstream adopts those changes, they are no longer yours to maintain.
Sure, contributing changes upstream takes time: upstream maintainers may ask for changes several times as PR reviews happen, often over weeks, due to those maintainers’ lack of time. But it’s still the best you can do for both the downstream and upstream projects, and it is the essence of open source at work.
Keep a good relationship with upstream
I could have merged this section into the previous one, but I wanted to emphasize how important it is to have a good working relationship with upstream (namely with its maintainers). By a good relationship I mean the kind of trust and dynamics that arise from working positively with others. Knowing the conventions and guidelines of a project, learning from discussions with maintainers, and asking for their input when you have big changes to make (even if those are just meant to stay downstream) go a long way toward building the trust that will end up in reviews moving faster, or in you giving your input as well when upstream makes changes.
Rebase Guide (sort of)
Now that we have talked about good practices during the day to day development of your downstream project, let’s talk about the actual moment when you take a deep breath and brace for the joy of rebasing your project.
This section’s title says this is sort of a rebase guide because, again, depending on many factors (how many downstream commits you have, whether you followed the atomic commits tip, etc.), this may be the guide for you, or you may need to do a lot more or a lot less. In any case, here it goes.
1. Straighten your git history
I am assuming that your git history has merge commits. As far as I recall, that’s what GitHub does by default, and that’s how many people like it (and that’s fine). Yet, to understand exactly how many commits we have to handle in our rebase, I like to see the downstream git history as a straight line, as this gives me an exact idea of how much work I may have.
We can straighten our git history by using the rebase command. Let’s assume that in our downstream project we add our changes to the main branch. When I start the rebase, I usually create a branch called rebase-1.2.3 (where 1.2.3 is the upstream version we will rebase on) and rebase only the downstream changes.
Assuming we rebased last time on upstream version 1.2.2 and that we have a tag called v1.2.2 corresponding to that previous upstream version, we can get a straight git history of our downstream changes with:
git checkout -b rebase-1.2.3 # creates the new straightened branch
git rebase v1.2.2 # get a linear history
After this command, no merge commits will be there. Sure, you lose the ability to understand when those features/fixes were added to your tree, but hopefully the next section will explain how this stops being important.
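To double-check that the history really is linear, one option is to count the merge commits between the upstream tag and the branch before and after the rebase. The sketch below does that in a throwaway repository; the feature-a branch and file names are made up, and the merge is created with --no-ff to mimic a merged PR.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

echo "upstream" > core.txt
git add core.txt && git commit -q -m "Upstream release 1.2.2"
git tag v1.2.2

# Downstream work merged through a feature branch, as a PR merge would do.
git checkout -q -b feature-a
echo "A" > a.txt
git add a.txt && git commit -q -m "Add super awesome feature A"
git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature-a'" feature-a

merges_before=$(git rev-list --count --merges v1.2.2..HEAD)

# The two commands from the guide above.
git checkout -q -b rebase-1.2.3
git rebase -q v1.2.2

merges_after=$(git rev-list --count --merges v1.2.2..HEAD)
echo "merge commits before: $merges_before, after: $merges_after"

# The remaining downstream commits, newest first.
git log --oneline v1.2.2..HEAD
```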
2. Minimize your downstream changes
At this point, after getting a git history with no merge commits, you have exactly all the changes you have performed downstream, listed in a linear way. Most likely you have some git history that looks analogous to this (though much longer):
11cbc10 bug: Fix another bug in feature A
49943cd Prevent bug in upstream that crashes foo-bar
a1577d0 Add pretty awesome feature B
1495ee6 bug: Fix bug in feature A
5e010b2 ci: Add our own CI
07a4e01 Add super awesome feature A
What this git history (top is newer commits) tells us is that there are essentially only 3 changes you should have downstream: feature A, new CI, feature B. The commit about a bug fix in upstream code should have been sent upstream or already be fixed by upstream at this point (if not, then, granted, we do get this extra commit in our final tree), and the fixes for the bugs in our own downstream features should be squashed onto their culprit commits, i.e. to the commits that introduced them.
Squashing downstream commits
Why should we not keep the fixes for our own bugs? Maybe there is some info that we gain by keeping them? In my opinion, not really. We gain more by reducing the noise and the burden for the next rebase. Otherwise, you may even end up fixing conflicts in commits whose code gets almost entirely changed by a later fix. Having to rebase only a fraction of the commits will certainly be appreciated next time. Any extra info related to those bug fixes is very likely either in code comments or in the fix’s commit message, and you can and should copy any relevant info to the culprit commit’s own message (same with authors, by using the Co-authored-by tag). Also, if you follow the tip about easily identifiable fixes vs non-fixes, you will be able to rearrange the commits more quickly in the interactive rebase editor, moving the fixes to where they should be squashed.
Some developers will frown upon the idea of changing git history so much, and may even say that this step is not needed. It surely isn’t vital, but without it they will eventually carry forward dozens of fix commits, and that’s more and harder work, while we are trying to do less work.
How to squash commits
Now that you have identified some commits that should be squashed into their culprit commits, how do you do that? That’s where the interactive mode of the git rebase command comes in handy. The git rebase --interactive command is one of the most powerful features of git: it opens an editor where we can manipulate the commits we are rebasing. It allows us not only to squash commits together quickly, but also to reorder commits, edit them, or even delete them altogether.
So how do we use the interactive rebase on the tree exemplified above to squash the fix commits? We can basically just squash them together directly:
git rebase --interactive v1.2.2
After the editor opens, you will see something like:
pick 07a4e01 Add super awesome feature A
pick 5e010b2 ci: Add our own CI
pick 1495ee6 bug: Fix bug in feature A
pick a1577d0 Add pretty awesome feature B
pick 49943cd Prevent bug in upstream that crashes foo-bar
pick 11cbc10 bug: Fix another bug in feature A
# Rebasing ....
You will notice that in the editor the oldest commit comes first and that there are some instructions after these lines. Pick means to use that commit as is, whereas squash means we want to meld it into the commit before it (the one above). So we can edit this text to make it look like:
pick 07a4e01 Add super awesome feature A
squash 1495ee6 bug: Fix bug in feature A
squash 11cbc10 bug: Fix another bug in feature A
pick 5e010b2 ci: Add our own CI
pick a1577d0 Add pretty awesome feature B
pick 49943cd Prevent bug in upstream that crashes foo-bar
After we close the editor, git will attempt to squash the commits together and ask you to edit the resulting commit message, which will initially contain the messages of all the squashed commits.
Now here is the deal: this may not work! You may have dozens and dozens of fix commits you want to squash together, so you go and cut and paste them, but either you mess up the order or there’s a conflict. You may still be able to solve it, but many times you will just give up (abort the rebase) and have to start over. There are a couple of strategies we could follow to prevent this (like marking the commits as squash commits for git rebase --autosquash), but the simplest one is to do the squashing bit by bit. This means that you follow the same process but perform only one squash at a time: find the oldest fix, move it to right after the commit it will be squashed into, mark it as “squash”, and finish the rebase. Now that you have a new tree without that fix commit, squash the next one, and so forth.
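For reference, here is a minimal sketch of the autosquash strategy mentioned above. It assumes the fix is recorded with git commit --fixup at the time it is made, which only works going forward and not for fix commits that already exist. Note that a fixup discards the fix’s own commit message; git commit --squash keeps it and lets you edit the combined message instead. Setting GIT_SEQUENCE_EDITOR=true is just a way of accepting the generated todo list without opening an editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

echo "upstream" > core.txt
git add core.txt && git commit -q -m "Upstream release 1.2.2"
git tag v1.2.2

echo "feature A" > a.txt
git add a.txt && git commit -q -m "Add super awesome feature A"
feature_a=$(git rev-parse HEAD)

echo "ci" > ci.yml
git add ci.yml && git commit -q -m "ci: Add our own CI"

# Record the fix as a fixup of its culprit commit.
echo "feature A, fixed" > a.txt
git commit -q -a --fixup "$feature_a"

# --autosquash moves the fixup right after its culprit and marks it accordingly.
GIT_SEQUENCE_EDITOR=true git rebase -q --interactive --autosquash v1.2.2

git log --oneline v1.2.2..HEAD
```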
3. Keep any upstreamable commits at the beginning
If you have downstream changes that will be sent upstream, or that you have already proposed upstream but that haven’t been accepted yet (let’s call these upstreamable commits), then I recommend having those as the first changes in your downstream set. The reason is that you are in fact trying to get rid of those commits by having them eventually merged into the upstream project. This means that at some point you will rebase on an upstream version that includes these changes, so by moving these commits to be the first ones in your downstream list, you preserve the order of the other, purely downstream commits. Moreover, if you happen to have conflicts resulting from moving the upstreamable commits to the beginning, these can be solved once. If you instead keep these commits in their original position, chances are that when you squash fix commits onto their culprit ones, you will have conflicts in that operation too, if the fix commits touch code that the upstreamable commit also touched.
To move the upstreamable commits to the beginning of our downstream changes, we run git rebase --interactive v1.2.2 again and move those upstreamable commits to the top (beginning) of the commit list.
Using our previous example, we are talking about moving the “Prevent bug in upstream that crashes foo-bar” commit to be the first one in the list.
From this:
pick b4b71b9 Add super awesome feature A
pick 5e010b2 ci: Add our own CI
pick a1577d0 Add pretty awesome feature B
pick 49943cd Prevent bug in upstream that crashes foo-bar
To this:
pick 49943cd Prevent bug in upstream that crashes foo-bar
pick b4b71b9 Add super awesome feature A
pick 5e010b2 ci: Add our own CI
pick a1577d0 Add pretty awesome feature B
Now that we have reduced the number of commits we have downstream and re-ordered our commits to have the upstreamable ones first, let’s do the actual rebase!
4. Effectively rebase your project on the new version
Assuming the new upstream version is tagged v1.2.3 and we are on the straightened branch from step 1 (referred to as straight-main from here on), run the rebase command:
git rebase v1.2.3
Now there are 2 possible outcomes here: 1. it goes super neat and well, and you have a successfully rebased tree; or 2. the rebase stops at a conflict, and your job is not yet finished.
There’s no magic wand for solving the conflicts. You always have to check them manually and decide what the solution should be. I have had cases where I had so many complicated conflicts in one commit that I preferred to stop there (abort the rebase) and continue later on, after checking independently of the rebase how we should re-apply the conflicting commit. But the problem with aborting a rebase after you have already solved a few conflicting commits is that you “lose” the work you did on fixing those.
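For the simpler case where a conflict can be solved on the spot, the usual loop is: see which files are conflicted, fix them, stage them, and continue. The sketch below reproduces this in a throwaway repository with a made-up config.txt file; the conflict is “resolved” by simply keeping the downstream value, which is a choice for the sake of the example and not a recommendation.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

echo "timeout=10" > config.txt
git add config.txt && git commit -q -m "Upstream release 1.2.2"
git tag v1.2.2

# Downstream change on top of 1.2.2.
git checkout -q -b straight-main
echo "timeout=30" > config.txt
git commit -q -am "Increase timeout downstream"

# Upstream release 1.2.3 touches the same line.
git checkout -q main
echo "timeout=20" > config.txt
git commit -q -am "Upstream release 1.2.3"
git tag v1.2.3

git checkout -q straight-main
# The rebase exits with a non-zero status when it stops at a conflict.
git rebase v1.2.3 >/dev/null 2>&1 || echo "rebase stopped at a conflict"

# List the files that still have conflicts.
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted files: $conflicted"

# Resolve by hand (here: keep the downstream value), then mark as resolved.
echo "timeout=30" > config.txt
git add config.txt

# GIT_EDITOR=true keeps the original commit message without opening an editor.
GIT_EDITOR=true git rebase --continue
```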
Partially rebasing your tree
With the situation above, where you fix a few conflicting commits, but then hit a big conflict and want to stop the rebase (to test a different branch for example), you may want to “save” your rebase progress. This may sound like requiring something specific or special, but essentially the partial rebase you are at is just a git tree that happens to be your current HEAD. So you can save your rebase progress by creating a branch for it:
git branch partial-rebase
Now you have all the successfully rebased commits in a branch called partial-rebase. However, how do you finish the rebase? As far as I know, git rebase doesn’t allow swapping in some partial trees. Still, git rebase is essentially a sequence of cherry-picks! This means that we can check which commits from our straight-main branch are missing in the partial-rebase branch, and cherry-pick them. You can cherry-pick each of them individually, or cherry-pick a range of commits.
Let’s imagine that our straight-main branch has commits A, B, C, D, E, F (from oldest to newest). We ran git rebase v1.2.3 and had to solve conflicts in commits A and B, but when the rebase stopped at commit C we realized the conflicts were too complicated and that we needed to verify a few things before proceeding, so we saved the current rebase state by running git branch partial-rebase and then aborted the rebase with git rebase --abort. This new branch now has commits A’ and B’ (A and B rebased). Once we understand how to better approach the conflicts in commit C, we want to continue the rebase from that commit, so we switch to the partial-rebase branch and cherry-pick from commit C until the end of the straight-main branch. This can be done with the following command:
git cherry-pick C^..straight-main
In the command above, C stands for the hash of commit C; adding the ^ means its previous (or parent) commit, and the .. makes it a range from there until the last commit of the straight-main branch (which can be referred to by the branch name or by the hash of commit F in our example).
The cherry-pick command will stop whenever a conflict is found, asking you to fix it, like the rebase command does. At the end of this cherry-picking session, your partial-rebase branch is no longer partial: it now contains your downstream changes, fully rebased on v1.2.3 from upstream (pat yourself on the back)!
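Putting the whole partial rebase flow together, here is a self-contained sketch in a throwaway repository. It uses a shorter series (A, B, C, D) than the example above, a hypothetical commit_file helper, and a made-up config.txt conflict in commit C that is resolved by keeping the downstream value.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"

# Hypothetical helper: write <content> to <file> and commit it with <message>.
commit_file() { echo "$2" > "$1" && git add "$1" && git commit -q -m "$3"; }

commit_file config.txt "timeout=10" "Upstream release 1.2.2"
git tag v1.2.2

git checkout -q -b straight-main
commit_file a.txt "A" "A"
commit_file b.txt "B" "B"
commit_file config.txt "timeout=30" "C"
commit_c=$(git rev-parse HEAD)
commit_file d.txt "D" "D"

git checkout -q main
commit_file config.txt "timeout=20" "Upstream release 1.2.3"
git tag v1.2.3

# The rebase applies A and B, then stops at the conflict in C.
git checkout -q straight-main
git rebase v1.2.3 >/dev/null 2>&1 || true

# Save the progress (A' and B'), then abort to restore straight-main.
git branch partial-rebase
git rebase --abort

# Later: resume from C on top of the saved progress.
git checkout -q partial-rebase
git cherry-pick "${commit_c}^..straight-main" >/dev/null 2>&1 || true

# Resolve the conflict in C, then let cherry-pick apply the remaining commits.
echo "timeout=30" > config.txt
git add config.txt
GIT_EDITOR=true git cherry-pick --continue

# Downstream commits now sitting on top of v1.2.3, newest first.
git log --format=%s v1.2.3..partial-rebase
```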
What to do next again depends on how your project is organized, but assuming you want to keep working on main, then, after some mandatory testing, you should force push to main, replacing it with the new code that has a new upstream version and a minimized series of downstream commits. You are now ready to continue developing your project, bugging every colleague to keep atomic commits and the other tips top of mind as they send PRs, so that next time your job is hopefully easier.
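As a cautious sketch of that last step: before replacing main, it can help to keep a reference to the old history, and to push with --force-with-lease, a variant of force push that refuses to overwrite the remote branch if someone else has pushed to it since you last fetched. The backup branch name main-before-rebase and the local bare repository standing in for the real remote are assumptions made for this example.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# A local bare repository stands in for the real remote.
git init -q --bare remote.git
git init -q local && cd local
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo "downstream" > app.txt
git add app.txt && git commit -q -m "Old downstream history"
git push -q origin main

# Stand-in for the fully rebased branch: same content, rewritten history.
git checkout -q -b partial-rebase
git commit -q --amend -m "Downstream changes rebased on v1.2.3"

# Keep the old history reachable under a backup name (hypothetical).
git branch main-before-rebase main
git push -q origin main-before-rebase

# Replace main on the remote with the rebased branch.
git push -q --force-with-lease origin partial-rebase:main

git ls-remote origin
```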
Avoiding (complex) forks as an upstream project
This section is more oriented towards upstream project maintainers/leads than fork maintainers. Forks that keep rebasing on upstream and result in a great relationship between upstream and downstream maintainers are one of my favorite aspects of open source. It may seem like one side is taking advantage of the other, but that’s why I mentioned the “result in a great relationship” bit above, as it can certainly become more symbiotic in the sense of improving the upstream project as a result.
Still, complex forks will always be difficult to maintain, and any help in reducing the issues when rebasing is appreciated. One way to achieve this is to offer a way to extend the upstream project without modifying its code. This is normally done through some sort of plugin system: the upstream project is responsible for keeping the plugin system backwards compatible, allowing downstream developers to keep their changes as part of a plugin, instead of directly changing the source code. This is, of course, the case for very famous projects such as WordPress, VS Code, GNOME Shell, etc., and is one of the main reasons why we developed Headlamp, a UI for Kubernetes with a focus on extensibility.
If your project is likely to be forked frequently, consider introducing a plugin system. It’s certainly not an easy task, but it’s one where you can (and should) include your downstream users, and it will benefit your project and theirs.
Conclusion
Hopefully this article has shared valuable insights into fork maintenance best practices and reinforced the methods you already use. While this guide may not cover every unique project scenario, I believe it offers useful information for many developers. Thanks for reading.