Isolate commit history to specific branch

Question

Say, I have an empty repository with a single initial commit:

* - Initial commit (master)

Then I create a develop branch locally and make a few commits on it. Before pushing it to the remote (and without merging it into master yet), I update master from the remote, which brings in a few new commits on master, and I get this:

*   - Update something (master)
| * - Make some changes to the new feature (develop)
| |
* | - Make some changes (master)
| * - Add new feature (develop)
|/         
* - Initial commit (master)

So my question is: after I push the develop branch to the remote, how can (and should) I merge develop into master as a single commit, without copying any of develop's history to master, while keeping all of that history on the develop branch?


| git   2017-01-02 05:01 3 Answers

Answers ( 3 )

  1. 2017-01-02 06:01

    You can create a new single commit on master containing all the changes introduced by develop by using git merge --squash and then committing the result.

    git merge --squash will apply all the changes from develop to the working tree and index, and prepare a commit message containing all the concatenated commit messages from develop, but it will not create a commit. It is then up to you to run git commit and edit the prepared commit message.
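
    As a rough sketch of those two steps, the script below rebuilds the question's history in a throwaway repository and squashes develop into master. The temporary directory, the file names, and the make_commit helper are assumptions made only for illustration, and -m is passed to git commit so the script runs without opening an editor (a plain git commit would instead open the prepared message).

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: add a file named "$1.txt" and commit it with message "$2".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$2"
}

make_commit init "Initial commit"
git checkout -q -b develop
make_commit feature "Add new feature"
make_commit feature2 "Make some changes to the new feature"
git checkout -q master
make_commit change "Make some changes"
make_commit update "Update something"

before=$(git rev-parse HEAD)
git merge -q --squash develop              # stages develop's changes, creates no commit
after=$(git rev-parse HEAD)                # same hash as $before: HEAD has not moved
staged=$(git diff --cached --name-only)    # the files brought over from develop
echo "$staged"

git commit -q -m "Add new feature (squashed from develop)"
git log --format=%s master                 # one new commit on master; develop is untouched
```

    After this, develop still holds its two original commits, while master has gained a single commit carrying their combined changes.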

  2. 2017-01-02 06:01

    Note that the proper workflow would be, since you have not pushed develop yet, to rebase develop on top of master after updating master.

    git checkout master
    git pull
    
    git checkout develop
    git rebase master
    

    Now you can push, since develop content is based on (and validated with) the latest master content.
    You resolve any conflict (from replaying develop on top of master) locally first, then you push.

  3. 2017-01-02 06:01

    The short answer is that you can't, quite, but you may be able to do what you want. The main problem here is that Git does (and talks about) a number of things in ways that are weird and different, with fiddly technical definitions that people get wrong too often, which just adds to the confusion.

    Lots of background (sorry, it's kind of long)

    Let me redraw your commits horizontally (which works better for text articles on StackOverflow), and give them one-letter names:

    A--C--E   <-- master
     \
      B--D    <-- develop
    

    That is, A is the initial commit, C and E are the two additional commits that are currently "on" (findable from) branch master, and B and D are the two commits you made that are "on" (findable from) develop.

    Here's a trick question: which branch is commit A on? master, or develop?

    It's a trick question because in Git, it is on both branches.
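
    One way to check this for yourself is to ask Git which branch names contain the root commit. The sketch below assumes a throwaway repository in which each commit just adds a file named after its letter; make_commit is a hypothetical helper, not a Git command.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each adding "<letter>.txt".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b develop
make_commit B
make_commit D
git checkout -q master
make_commit C
make_commit E

a=$(git rev-list --max-parents=0 master)   # the root commit, i.e. A
git branch --contains "$a"                 # lists both develop and master

# The same answer in a script-friendly form:
on_branches=$(git for-each-ref --format='%(refname:short)' --contains "$a" refs/heads | xargs)
echo "$on_branches"                        # develop master
```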

    Branch names and reachability

    In fact, the names master and develop are nearly irrelevant. What matters are the commits. The names, which I've drawn over towards the right, merely point to one specific commit. We call that one specific commit the tip of the branch, and then we—and Git—work backwards, using the parent information stored in each commit, to find earlier commits.

    These parent links, E back to C back to A, or D to B to A, only go one way: from child, to parent. They're all backwards from the way we might expect at first, and they determine which commits are reachable from any given commit. Starting from C, we can reach C itself (of course) and also A; and that's all. Starting from E, we can reach E, C, and A—and that's the complete contents of the branch, master, since master points to commit E. Starting from D, we can reach D, B, and A; and since develop points to D, that's the complete contents of the branch develop.
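
    The reachable sets described above can be listed directly with git log, which starts at the named commit and follows parent links backwards. This is a minimal sketch in a throwaway repository; the make_commit helper and the one-file-per-commit layout are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each adding "<letter>.txt".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b develop
make_commit B
make_commit D
git checkout -q master
make_commit C
make_commit E

git log --format=%s master      # E, C, A: everything reachable from the tip of master
git log --format=%s develop     # D, B, A: everything reachable from the tip of develop
git log --format=%s master~1    # C, A: starting from C, only C and A are reachable
```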

    But this means that commit A is on both branches. That's just the way Git is: a commit is on any number of branches—even none, sometimes—and the set of branches is based on reachability, through the parent links in each commit. The names merely serve to get us started in the graph, and there are names other than branch names, such as tag names. Let's put in two new temporary commits, F and G, just for illustration:

         tag: T
            |
            v
            F--G   <-- tempbranch
           /
    A--C--E   <-- master
     \
      B--D    <-- develop
    

    Now commit G is find-able through tempbranch; commit F is find-able by its tag T and through tempbranch, and commits A-C-E-F-G are all on tempbranch. Let's now delete the name tempbranch entirely:

         tag: T
            |
            v
            F--G
           /
    A--C--E   <-- master
     \
      B--D    <-- develop
    

    Now we have an interesting case: commit G is no longer reachable from any name. Commit F is still reachable, via the tag T. Git will eventually garbage-collect commit G, making it go away for real. Until then, if you have saved its hash ID somewhere, you can still view it, or even "resurrect" it by giving it a branch or tag name. (Git also keeps these IDs saved in things called reflogs. By default there is one for each branch name and one for HEAD; tag names do not get one unless configured. Reflog entries keep otherwise-unreachable commits around for 30 days by default, even after their names are removed.)

    If we delete the tag T too, commit F also becomes eligible for garbage collection, and eventually we go back to the five commits we had before. Meanwhile, since F and G are no longer reachable from any name, you won't see them: Git will act as if they're not there, unless you dig up their hash IDs somehow.
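
    The tempbranch experiment can be replayed in a scratch repository. The sketch below saves G's hash before deleting the branch name, then shows that no ref contains G any more, that the object itself still exists, and that a new name brings it back. The make_commit helper and the branch name resurrected are hypothetical.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each adding "<letter>.txt".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b develop
make_commit B
make_commit D
git checkout -q master
make_commit C
make_commit E

git checkout -q -b tempbranch
make_commit F
git tag T                                  # tag T points at F
f=$(git rev-parse HEAD)
make_commit G
g=$(git rev-parse HEAD)                    # save G's hash ID before removing its name

git checkout -q master
git branch -q -D tempbranch                # delete the only branch name that reached G

refs_for_f=$(git for-each-ref --format='%(refname)' --contains "$f")   # refs/tags/T
refs_for_g=$(git for-each-ref --format='%(refname)' --contains "$g")   # empty: no name reaches G
kind=$(git cat-file -t "$g")               # the object is still stored: "commit"
echo "F is on: $refs_for_f / G is on: ${refs_for_g:-nothing} / G is a $kind"

git branch resurrected "$g"                # give G a name again
git log --format=%s resurrected            # G, F, E, C, A
```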

    Merge commits

    With all that out of the way, let's take a look at regular, ordinary merges. A merge commit, in Git at least, is just like any other commit, with one exception: it has two or more parents, instead of just one. (Most merges just have two, and there's nothing particularly useful about three-or-more-parent merges.)

    Let's look at what happens with a regular merge that merges develop into master, in terms of the commit graph:

    A--C--E--F   <-- master
     \      /
      B--D-´   <-- develop
    

    The name develop continues to point to commit D, but the name master now points to the new merge commit F. Commit F points back to both commits E and D.

    When Git does reachability computations, it follows all the parent links (simultaneously, as it were). So at this point, every commit in the graph is on branch master. Commits A-B-D are (still) on develop, and if you git checkout develop and write a new commit, the new commit will only be on develop, with the name develop automatically moving to point to the new commit:

    A--C--E--F   <-- master
     \      /
      B--D-´--G   <-- develop
    

    This is the normal way to handle these things. The tree (working-tree copy) that goes with commit F incorporates the changes from B and D into master: Git takes commit A vs commit E to find the changes from master, and A vs D to find the changes from develop, and combines them to make F. Then it records both E and D as F's parents, so that the merge remembers which commit was merged.
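
    To see the two parent links that a real merge records, something like the following can be run in a scratch repository. The make_commit helper and the file-per-commit layout are assumptions chosen so that the merge has no conflicts.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each adding "<letter>.txt".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b develop
make_commit B
make_commit D
git checkout -q master
make_commit C
make_commit E

e=$(git rev-parse master)
d=$(git rev-parse develop)

git merge -q -m F develop                  # a real merge: creates commit F on master
git log -1 --format=%P master              # two hashes: E (first parent), then D (second parent)
git merge-base master develop              # D: Git now knows develop was merged up to D
git rev-list --count master                # 6: every commit is now reachable from master
```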

    Squash "merges" (end of background)

    With all that out of the way, we can now talk about Git's so-called "squash merge". You get these by running git merge --squash, and then doing a bit more. Specifically, you have to also run git commit afterward.

    A squash merge is not actually a merge at all. If we draw the after-effect, we get this graph:

    A--C--E--F   <-- master
     \
      B--D   <-- develop
    

    The contents of commit F are the same as what we would get with a real merge. The new commit goes on master as usual, i.e., master gets moved to point to the new commit. The key difference is that the new commit does not point back to the merged-in commit, but only to the previous tip of master.
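
    Both claims can be checked in a scratch repository: the squash commit has a single parent, and its tree matches the tree a real merge would produce. In the sketch below, make_commit and the comparison branch real-merge are hypothetical names used only for the demonstration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each adding "<letter>.txt".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b develop
make_commit B
make_commit D
git checkout -q master
make_commit C
make_commit E
e=$(git rev-parse master)

# A real merge on a side branch, kept only for comparison.
git checkout -q -b real-merge master
git merge -q -m "real merge" develop

# The squash "merge" on master.
git checkout -q master
git merge -q --squash develop
git commit -q -m F

git log -1 --format=%P master              # a single hash: E, with no link back to D
git rev-parse 'master^{tree}'              # same tree hash as the line below
git rev-parse 'real-merge^{tree}'
```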

    At first, this seems to be just what you want. It does, however, have one very big drawback. Let's add some new commits to develop:

    A--C--E--F   <-- master
     \
      B--D----G--H--I   <-- develop
    

    Now, when we want to incorporate the changes from G-H-I into master, we need to merge again. If we had a normal merge—so that F pointed back to D as well as to E—Git would know how to do this. Without a real merge, though, Git will wind up going all the way back to the common commit A again, and compare A-vs-F, and then A-vs-I, to try to combine the two changes.

    But we already have the B-D changes! They're in F. If we had a real merge, Git would know that, and not have to figure this out.

    If we're lucky—which we often are—Git figures this all out anyway. The longer the chain of commits gets, though, with more and more changes, the less and less likely it becomes that Git will figure this out on its own.
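
    The difference shows up in the merge base Git would pick for the next merge. The sketch below builds both variants side by side in a scratch repository: master gets the squash commit, while a hypothetical comparison branch named real-merge gets a true merge. As before, make_commit is an illustrative helper.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each adding "<letter>.txt".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b develop
make_commit B
make_commit D
git checkout -q master
make_commit C
make_commit E

a=$(git rev-list --max-parents=0 master)   # root commit A
d=$(git rev-parse develop)                 # commit D
git branch real-merge master               # comparison branch starting at E

git merge -q --squash develop              # squash develop into master ...
git commit -q -m F                         # ... as the single-parent commit F

git checkout -q real-merge
git merge -q -m F develop                  # true merge of the same changes

git checkout -q develop
make_commit G
make_commit H
make_commit I

git merge-base master develop              # A: the squash left no trace that B-D were merged
git merge-base real-merge develop          # D: the real merge recorded it
```

    With A as the base, the next merge into master has to reconcile the B-D changes a second time; with D as the base, only G-H-I are new.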

    Hence, the usual rule for squash merges is that we should make them only if we're going to abandon the merged branch, e.g., delete the name develop entirely. Once we delete the name, the unreachable commits eventually get garbage-collected.

    What to do instead, maybe

    When you use a real merge, the drawback is that you see everything, because of the way Git follows both parents simultaneously. That is, if you run:

    git log master
    

    you will see all of A, B, C, D, E, and F (intermixed and sorted by date, normally, though you can control this). However, there's a very important feature to avoid this. You can run:

    git log --first-parent master
    

    instead, to view commits without following any of the "extra" parents from merges. This --first-parent view will show you only commits A-C-E-F, skipping B-D. This is because --first-parent limits the graph traversal to skip the second parents of each merge commit.
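
    Here is a small sketch comparing the two views after a real merge, run in a scratch repository. The make_commit helper and the letter-named files are assumptions for illustration; the plain log output is sorted only so the comparison does not depend on commit timestamps.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each adding "<letter>.txt".
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b develop
make_commit B
make_commit D
git checkout -q master
make_commit C
make_commit E
git merge -q -m F develop                  # real merge: F has parents E and D

git log --format=%s master | sort          # A B C D E F: all six commits
git log --first-parent --format=%s master  # F, E, C, A: develop's B and D are skipped
git log --format=%s develop                # D, B, A: develop keeps its full history
```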
