Tuesday, December 7, 2010

Thou Shalt Not Lie: git rebase, amend, squash, and other lies

As I've used git more, and used more advanced features, my opinions about the merit and uses of certain features have changed. At work we're constantly re-evaluating our git workflow in light of the problems we encounter with the way we're doing things. I may yet re-evaluate these opinions, but rest assured they come not from some theoretical nit-pick, but from real experience.

Rewriting History is a Lie

Using git to rewrite history is a sin. It's called lying. Don't do it. With git there are several forms in which lying comes, but usually you know you're lying if you break the functionality of `git branch --contains some-branch`, `git blame`, `git bisect`, or if you have to use `git push -f origin some-branch`.

Squash merging is a lie

A squash merge (`git merge --squash some-branch`) takes all the commits from a topic branch, combines them into a single commit, and applies that commit. The history now looks as if you had a flash of brilliance, made a ton of changes, and did everything right the first time. It makes you look good, but it's a lie.

Since, you have not actually merged any of the work you did on the topic branch, but instead have merged another totally new commit, `git branch --contains some-branch` cannot tell you that you have merged your branch anywhere. When a QA person or your boss says, "Hey, is some-feature {merged into QA, deployed}" you have to resort to `git log` spelunking.

Also, anytime you are creating a new commit with the same changes as another commit, you are destroying `git blame`'s ability to tell you who to flog publicly. And as we all know, public floggings are the lifeblood of software development teams.

Finally, when you squash merge, the totally awesome rerere feature will not help you re-resolve conflicts. Instead you will have to re-resolve conflicts when you cherry pick the squash commit, or when you squash merge your topic branch to the UA branch or to master. Why? Because rerere depends on being able to detect that you are resolving conflicts between the same SHAs as before, but every time you squash merge it creates a totally new commit. It creates a totally new commit even if you squash merge, `git reset --hard HEAD^`, and then immediately squash merge again.

Rebasing is a lie

Rebasing using `git rebase foo` allows you to rebase your topic branch on foo, instead of whatever it was based on before. This makes it look like you were working from foo the whole time. However, each commit to your topic branch was birthed in a context and by a sequence of events that was unique to that time and that topic branch. You are yanking those commits out of their context and putting them into a totally new context.

Rebasing is effectively a retroactive merge. It is pretending that you "merged" foo into your topic branch 3 days ago, but you didn't. You merged it in today, and you are lying to everyone.

In fact, the commits from your topic branch don't exist anymore, because every single rebased commit gets remade into a totally new commit with a totally different SHA. Pick any one of those rebased commits: Does the commit message make sense anymore? Does the code compile? Do the tests pass?

The answer to these last two questions will tell you whether you can use the totally awesome bisect feature of git. This feature does a binary search through your history to help you discover exactly where some bug or problem was introduced into your code. If you cannot compile and run tests on every commit, then you cannot use `git bisect`, which is a shame.

Being able to compile and run tests for each commit is also useful for resolving conflicts. Say I'm trying to decide whether your change will work with my code, or my change with your code, or maybe some totally different code that "resolves" the conflict, how can I know "resolved" means resolved? The most basic thing I can do (though it's not foolproof) is compile the code and run the tests, but if your code doesn't even compile or run the tests by itself, then I can't intelligently resolve the conflict. Your lie has hurt me and the rest of the team.

Finally, if for some reason you wanted to cherry-pick a commit, but it doesn't compile or run the tests, then cherry-picking it into another branch would break that branch. Thanks for that! Your lie has now become destruction of property, which is a misdemeanor.

Amending a commit is a lie

Amending a commit (`git commit --amend`) to add changes, or to change the commit message is a lie. It breaks `git branch --contains some-branch`. It forces you to do a `git push -f origin some-branch`. It's a sin. See above arguments, enough said.

Selective, retroactive commits are lies

In this last form of lying git is not an accomplice, because you are lying to git. What happens here is that you have been working for a while (20 minutes, 30 minutes, an hour, whatever), and you decide to commit your work, but instead of committing all of your work as you have come to it naturally, you decide to break your work up into several small, "logical" commits. This makes you look good, but it's a lie.

You've changed both "foo.clj" and "bar.clj" and you think there is no dependency between them, so you commit "foo.clj" separately from "bar.clj". The problem is that you don't really know this unless you compile the code and run the tests. Since you don't know for sure that these selective commits can compile and run the tests, you've broken `git bisect`, undermined trust in conflict resolution, and break my stuff if I cherry-pick your commit.

You may think that your commits are "logical", but I would say if you are not committing your work the way that you come to it naturally, then it is not logical. Sure you started solving problem A and then discovered problem B, if they are related and you have to solve problem B in order to solve A, then commit the changes together. They are related so its logical to commit them at the same time.

If you want to solve them separately, then make a WIP commit or a stash, create another branch and solve B, then come back to your original branch and rebase it on B (gasp, see below for explanations of this inconsistency!).

Conclusion

There are several awesome features of git: blame, branch --contains, rerere, and bisect. These features are borked if you lie. If you're going to lie, you may as well use SVN, since these awesome features don't exist there anyway. But don't do that! Let's just all agree not to lie.

Epilogue

Some people may get the impression that I do not think you should ever use `git rebase`, `git commit --amend`, or `git merge --squash`, but that would not be true. These are powerful tools that may be necessary sometimes, but should be used sparingly and judiciously. Perhaps its OK to lie sometimes. If someone comes up to you and says, "How does this make me look?" You choose your words carefully.

One case for history rewriting is integration branches. At work we have integration branches for qa, ua, etc. We have a workflow where we move a single feature through the pipeline kanban style, so we have to be able to merge a feature to an integration branch, rebase it out, etc. In this case no one has an expectation that they should be able to see a clean history for these branches, and no one expects to base their work on one of them.

Other than integration branches, a good rule of thumb is that you should not rewrite history for things that are already push out into the world. This would limit these rewriting tools to uses locally to "fix" things up before pushing them. I still cannot whole-heartedly advise use of rewriting in this case, but if you need to, and the rewriting your doing is limited enough that you're sure the new commits will still compile and pass tests, then fine. Just do it carefully, and don't make it habitual. The End.