Fwiw a branch isn't a named sequence of commits, it's just a label that points to a single commit. It's exactly the same as a tag except that git moves it when you make a new commit.
Mercurial calls these bookmarks which IMO is a much better name because it works exactly like real bookmarks (in books, not browsers). A git branch feels a little bit like a branch of a tree because each commit points to its parent. Therefore, you don't need to track the entire sequence of commits in the branch but just the commit at the tip. I think if only they had named this concept better, Git would've been way easier to grok.
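To make the "label that moves" idea concrete, here is a toy Python model. It is not how Git stores anything; the `Repo` class and its method names are made up for illustration. A branch and a tag are both just names mapped to one commit id, and the only difference is that committing advances the checked-out branch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Commit:
    id: str
    parent: Optional[str]  # each commit points at its parent, never at its children


@dataclass
class Repo:
    # Toy model only: names and structure are illustrative, not Git internals.
    commits: dict = field(default_factory=dict)
    branches: dict = field(default_factory=dict)  # name -> commit id
    tags: dict = field(default_factory=dict)      # name -> commit id
    current: str = "main"                         # the checked-out branch

    def commit(self, commit_id: str) -> None:
        parent = self.branches.get(self.current)
        self.commits[commit_id] = Commit(commit_id, parent)
        # The one thing that distinguishes a branch from a tag:
        # the checked-out branch label is moved to the new commit.
        self.branches[self.current] = commit_id

    def tag(self, name: str) -> None:
        # A tag is pinned to whatever the current branch points at right now.
        self.tags[name] = self.branches[self.current]


repo = Repo()
repo.commit("a1")
repo.tag("v1.0")
repo.commit("b2")  # "main" follows to b2, "v1.0" stays on a1
```

After the second commit the branch label has moved and the tag has not, which is the whole difference between the two.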
I wouldn't usually nitpick like this but given that the entire point of this article is it matters to get the basic concepts right, I figured the author might want to get this basic concept right.
Am I the only one sad that Git beat out Mercurial as the industry standard?
And am I also the only one who still knows how to do things in Mercurial but not how to take the corresponding action in Git? I mean, they're things I haven't had a reason to do in years, but it has some psychological cost to feel like I'm using a system I'm worse at.
I was on the team of a large org doing a POC between the two. Merc did some stuff better, git did more. I am glad git won out overall. I think the only feature Merc had that I cared about was tracking folders as objects.
1) We used some super crappy Clearcase version control system. It was easier to layer Git on top, letting us have a VC on top of a VC. I can't remember if it was impossible, or just easier to do in Git.
2) Conflicts did better. We made a kind of torture test of merges and put both systems through it. Git was able to resolve more out of the box.
3) Speed of operations, I think Git was a bit faster out of the box.
None of this was about GitHub, as it was comparing self-hosted Hg vs self-hosted Git.
Git wasn't really winning until GitHub emerged. The game was still open at that point. Once GH had resoundingly beat the competition for "social coding" (like bitbucket and sourceforge), though, it was all over.
Mike Bayer, creator of SQLAlchemy ORM in Python, wrote about migrating its repository from Mercurial to Git, in 2013: https://www.sqlalchemy.org/blog/2013/05/25/sqlalchemy-migrat...
He had some IMHO strong points in favor of Git itself VS Hg, besides GitHub and its huge userbase, which personally convinced me to switch.
> I really don't understand what it is about Git that seems to prevent anybody from coming up with a sane UX for it.
That's the problem. Git has a sane UX. The only issue is that a lot of people who don't understand it write silly tutorials in which they spread their misconceptions about it. The real issue is the poor quality of tutorials on the internet, which mislead a lot of people.
I don't think the UX is that intuitive. If you want to tweak a commit that's not the branch head, you have to rebase the parent of that commit. From a technical standpoint this sort of makes sense, but for a casual user this is not intuitive at all
Here we are. You can't tweak a commit in git. You want to recreate a new one. That's why UX is awful, because there is a permanent confusion about what git does.
I know that rebasing creates a new commit (and all commits thereafter). As I said, from a technical standpoint I get it. But I don't see why there can't be a simple command or button that lets you modify a single commit, and performs the rebase behind the scenes. I commonly need it in my private repos and branches. But right now the rebase operation we're stuck with is very low-level and unintuitive.
But the average casual user just wants to modify the commit. There could easily be a higher level command for this, which internally finds the parent and performs the rebase, but instead Git (and even most GUI tools on top) forces you to go the manual route of selecting the parent and initiating a rebase. It's counterintuitive
There is the ~ or ^ suffix to select a parent. Also, vim-fugitive selects the parent automatically when you say "rebase starting here" on a commit under the cursor. Not sure what it does for merge commits; there's no great default there.
You still have to go through the "pick"/"reword" flow though. Most of the time people just want to modify a single commit. So why not just have a command for it, instead of needlessly exposing the low-level rebase process to the users
`git commit --amend` only works if you want to modify the latest commit. Rebase is unnecessarily complicated if you just want to modify a single commit. It should be something more like `git edit [commit sha]`
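For what it's worth, something close to `git edit <sha>` can be approximated today with `git commit --fixup` plus `git rebase --autosquash`. Below is a rough Python sketch of the command sequence such a wrapper might run. The `git_edit_commands` helper is hypothetical: it only builds the commands rather than running them, and it assumes the fix is already staged and that the target is not the root commit.

```python
def git_edit_commands(sha: str) -> list:
    """Return (extra environment, argv) pairs a hypothetical `git edit` might run.

    Assumes the desired change has already been staged with `git add`.
    """
    return [
        # Record the staged change as a "fixup!" commit aimed at the target.
        ({}, ["git", "commit", f"--fixup={sha}"]),
        # Replay history starting from the target's parent, folding the fixup in.
        # GIT_SEQUENCE_EDITOR=true accepts the generated todo list unedited,
        # so the pick/reword screen never opens.
        (
            {"GIT_SEQUENCE_EDITOR": "true"},
            ["git", "rebase", "--interactive", "--autosquash", f"{sha}~1"],
        ),
    ]


steps = git_edit_commands("abc1234")
```

If reapplying the later commits conflicts, the rebase still stops and asks for help, so this hides the todo list but not the underlying mechanism.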
That gets complicated. What if there's a merge commit between here and there? How about a conflict? You cannot hide the underlying mechanism of reapplying a set of changes.
On that note, I have written something like this, but it is not something I think belongs in the standard toolbox. It is a tool that applies "formatting" fixes to a topic, preserving history topology. However, it requires some automated tool to be able to perform the edit so that any conflicts can be resolved by running the tool to perform the edit (typically clang-format or autopep8, but sed or something like that is also a candidate). I don't know that requiring your edit to be expressed in terms of sed is suitable for such a "simple" tool to replace rebase.
It would still use rebase in the background. If there's a conflict that happens when reapplying changes, then an error pops up, and the user can look into it and will start to understand the process. But in my experience of just doing quick patches on my private repos (where I do multiple small commits a day so patching an earlier commit is relatively common), a conflict rarely ever happens. I think I've run into it once in my many years of dev.
So for the majority of users, they wouldn't have to even know about rebase until they have quite a bit of experience with git (and programming in general). It's a better learning curve, especially considering Git is now being taught to beginner programmers.
A `git edit` command is not as powerful as `git rebase`, but this is just how UX works. You expose a smaller simpler set of features at the front, and then (if you aren't Apple) allow the users to do more advanced low-level operations if they need to.
I, on the other hand, do something similar, but run into conflicts all the time. Usually when reordering commits or fixing up missed hunks that belong in some older patch. I do conflict resolution dozens of times a week.
I guess we just disagree on how shallow or deep any problems `git edit` tries to hide are in practice and therefore how much convenience it actually saves.
So why is it that every other source control system seems to have such nice tutorials?
We can either posit that the people who made and use Git are substandard, or we can posit that there is something fundamental about Git that makes writing such tutorials difficult.
I like the stage way more than not having it. I think Mercurial has it now as some extension, but my fingers know Git and git-hg is the only way I interact with the tool these days.
Explicit rename tracking is a mistake though. I think Linus got that one right.
Sure but then why do they describe the term wrong in the first paragraph? That's just adding to the confusion they intend to fight. I'm not sure the author is reading this, but think it's worth rewording.
I'm reading it and I think I got it right. The Git community, the Git documentation, and the Git tools all refer to "branches", and I think what they mean when they talk about "branches" is much closer to what I described than to what you did.
For example, consider this very ordinary-sounding phrase that I just picked out of the git-rebase man page:
> If the upstream branch already contains a change you have made …
I don't think your account can explain what is meant by this.
In any case, I didn't make the choice casually, thoughtlessly, or from ignorance.
> In any case, I didn't make the choice casually, thoughtlessly, or from ignorance.
Obviously, and if you got the idea that I thought you did then I apologize for that.
Anyway, the git docs aren't exactly known for getting the nuances of its own concepts right, which adds to the general confusion.
That said, I think I see your point now: it's the same as how a reference to a linked list is actually just the head of the linked list, but you still call it a list and not a head.
I have to admit that maybe I didn't see the forest for the trees.
The problem is that your rebuttal isn't one: you are not addressing any of the points made. And those points are important precisely because branches are not a range of commits which is why they are cumbersome to work with.
But what’s the precise rule defining what constitutes the commits of a branch, after you reach a merge point? Does the branch end there? If not, which parent of the merge does it go? It’s not clear a priori.
I don't think there is a precise rule, because the exact meaning depends on the context. In part 3 of the series, in the section about "branches are fictitious", that's what I will say.
The talk has an abbreviated version: https://perl.plover.com/classes/git-tips/samples/slide018.ht... It's an illustration of just one example of this: If we rebase "branch X" onto commit B, we rebase only the three green commits. But if we rebase it onto C, we consider X to contain the two dark blue commits also.
Are the dark blue commits part of branch X? It depends on what you're trying to communicate.
Right, and that’s what makes “branch” a vague notion in git. It’s not clear what exactly is meant with “branch x” when it’s not just about the labeled tip, and often enough no further clarification is given.
Every commit contains an ordered list of its parent commits. You could think of the 'main' history as being the chain of commits you get by always following the first parent, and this can be a useful thing to display when generating a commit history.
However, the history of a commit is always the entire tree of ancestors. This entire tree is the precise definition of what constitutes all the commits of a branch.
Thus while I agree that it is not wrong to view a branch as a bookmark, I also think that attempts to nitpick the man page's use of the term fall flat here. "Commit contained in a branch" just means: "commit contained in the history of the commit the branch currently points to." and isn't ambiguous.
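A small Python sketch of that definition, in case it helps. The graph and the helper names are made up for the example; `contains` is roughly the question `git branch --contains` answers, and `first_parent_chain` is roughly what `git log --first-parent` walks.

```python
def ancestors(parents: dict, tip: str) -> set:
    """Every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def first_parent_chain(parents: dict, tip: str) -> list:
    """Follow only the first parent of each commit: the 'main' line of history."""
    chain = []
    commit = tip
    while commit is not None:
        chain.append(commit)
        commit = parents[commit][0] if parents[commit] else None
    return chain


# Hypothetical graph: M merges feature commit F into the mainline A -> B -> C.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "F": ["B"],
    "M": ["C", "F"],  # ordered: the first parent is the line that was merged into
}
branches = {"main": "M", "feature": "F"}


def contains(branch: str, commit: str) -> bool:
    """True if `commit` is in the history of the commit `branch` points to."""
    return commit in ancestors(parents, branches[branch])
```

Here `contains("main", "F")` is true even though F was made on the feature line, while the first-parent chain from M skips F entirely. Both readings of "the commits of main" fall out of the same tip pointer.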
There are realistic uses for more even than that. I’ve worked with a system of pre-production deploying where you labelled your PR/MR/whatever-you-call-it “include-in-beta” or whatever, and the beta branch would be reconstructed to be an octopus merge of the production branch and all include-in-beta heads. (When I came to the process, it was doing sequential merges, which got very slow when you had a whole bunch of work backing up that you couldn’t deploy to production just yet, so I turned it into a single octopus merge which was much, much faster, only falling back to sequential merges if the octopus merge failed, so that it could point out which MR conflicted. I don’t remember how high the numbers ever got, but I think it went beyond 66 at times, even with only a couple of dozen developers.)
Graph log visualisation was certainly fairly difficult to make much of, but it wasn’t generally intended to be made much of, since it was basically a deployment implementation detail.
And when you are playing around with git to try this strategy and it's not allowed you get the wonderful error message "should not be doing an octopus".
(IIRC, from some version of git from about 6 years ago)
I think bookmarks are a fine name but I don't really see the quibble about branches, which I think is an even better name, but I know it's because I'm seeing it with my git goggles on.
Can you elaborate on what you think branches are a better reference to? What operations or concepts are better expressed in terms of them? I think I'm at the edge of an epiphany here ... but I'm not quite grokking it yet.
> a branch isn't a named sequence of commits, it's just a label that points to a single commit.
True, but in the vast majority of cases parent commits in the branch (commits from leaf to the last ancestor that has only one child) would be semantically very related so this understanding is not that bad (so long as you remember a branch is a commit for git operations).
> each commit points to their parent. Therefore, you don't need to track the entire sequence of commits in the branch but just the commit at the tip.
This is not entirely true, because it is perfectly possible to create commit graphs in git in which there is not a single well-defined branch for some commits. So just knowing the branch labels at each tip is not sufficient to assign a single well-defined branch to every commit in the graph.
Mercurial branches don't work like this: they are actually stored as part of each commit, so even with ambiguous commit graphs each commit still has a well-defined branch.
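A toy illustration of the difference, in Python. The graph, the branch names, and the per-commit assignments are all invented for the example; this is a sketch of the two bookkeeping styles, not of either tool's real storage.

```python
# Toy graph: B is the shared ancestor of two tips, C and D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

# Git-style: branches are only labels on tips.
git_tips = {"main": "C", "topic": "D"}

# Mercurial-style named branches: the name is recorded in each commit itself.
hg_branch_of = {"A": "default", "B": "default", "C": "default", "D": "topic"}


def reachable(tip: str) -> set:
    """All commits in the history of `tip`, including `tip`."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def git_branches_containing(commit: str) -> list:
    """All branch labels whose tip can reach `commit`, sorted by name."""
    return sorted(name for name, tip in git_tips.items() if commit in reachable(tip))
```

For commit B the Git-style lookup returns both `main` and `topic`, so "the branch B is on" has no single answer, whereas the Mercurial-style table always gives exactly one name per commit.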
Yeah I much preferred Mercurial from that standpoint. And branches there were correctly named branch. But you kinda, sorta, can get "what should have been called branches but don't exist in Git" by using named tags when you "start a branch" (a real branch I mean, not a Git "branch").
> Mercurial branches don't work like this: they are actually stored as part of each commit, so even with ambiguous commit graphs each commit still has a well-defined branch.
I wonder which operation would not be possible with Git had Git gone that way. And hence I wonder too what's that operation that Git can do and that Mercurial can't.
Honestly, switching from Mercurial to Git, this weird "branches that aren't branches at all" thing was the single most WTF feature of Git.
> But if you try to understand the commands without the model, you will suffer, because the commands do not make sense.
I've read this about git several times and certainly felt it. My question is why hasn't anyone come along and fixed it? Git has the plumbing vs. porcelain separation. Why hasn't someone written new porcelain that makes git as intuitive as mercurial, subversion, etc? This seems like a similar situation as dpkg/apt. The underlying design has been settled but we desperately need a better interface.
> Git has an elegant and powerful underlying model based on a few simple concepts
My second question is, if the underlying model is so f*cking elegant, how did it lead to such a confusing interface? This isn't so much me griping about git (although I am) but more a curious case for design theory. Would love to read serious analysis. I can't offhand think of another piece of software where "the model is elegant but the interface is confusing" is such a common critique.
> My question is why hasn't anyone come along and fixed it? Git has the plumbing vs. porcelain separation.
Dozens have, but by the time their replacement porcelain gets anywhere near useful they’ve attained a grasp of the plumbing more than good enough they don’t need it anymore.
> My second question is, if the underlying model is so f*cking elegant, how did it lead to such a confusing interface?
Mastery of structural elegance doesn’t mean you have also mastered interface and experience. In fact it’s often not the case.
> I can't offhand think of another piece of software where "the model is elegant but the interface is confusing" is such a common critique.
Every database. The relational model is a beautiful thing, sql is a horrendous shit-show.
> Every database. The relational model is a beautiful thing, sql is a horrendous shit-show.
Amen.
We probably disagree on if we need less abstraction or better abstraction though.
For all the griping that git gets I really think it could be much worse. It's a somewhat inelegant shrink-wrap over its data structures but at least I can get at all the pieces I need.
I get that SQL works well for ad-hoc analysis and business cases, and a query planner is super useful there, but for service to service calls I'd love a language that lets me refer to the tables, the b-trees, row-ids, and lets me tell it what to do.
> My question is why hasn't anyone come along and fixed it?
Because the model is simple and easy to understand, and you get a lot of power from understanding it. Whereas opinionated porcelain that tries to insulate the user from the model will tend to fail in obnoxious ways and will not serve the user.
Really, there's objects, there's commits, then there's a couple of methods for symbolically naming commits (branches and tags) because humans need symbolic names (because we can't memorize SHA-1 hashes, or any hashes). Object and commit hashes function as pointers or inode numbers. Symbolic commit names function as hard links.
If you understand the Unix filesystem, you can understand Git.
Everything else follows from these things.
Merges create new objects referenced from the merge commit.
Rebase is just a script around cherry-pick.
Cherry-pick is just applying a delta from a commit and then re-committing.
Everything is copy-on-write except the symbolic names (branch names, tag names).
There -- I've just told you everything you need to know about the model. And all of that trivially carries over to the UI.
Then create yourself some convenient command aliases. It's not like someone is going to create a layer over git and make it 10x better by fixing some inconsistencies. Plus, whatever benefit is added you'll still have to learn the layer below the abstraction.. eventually.. to be productive. Just not worth it. You may as well learn git. You'll use it for the rest of your career. And that's a breath of fresh air that I can say that confidently about a technology.
`git reset` is analogous to filesystem restore from backup.
Next question.
Note that the analogy to filesystems is not exact. The point is that you can understand higher-level operations in terms of lower level ones / gain insight into higher level operations from knowledge of the lower level concepts. As opposed to the cognitive burden of incomprehensible magic, where every time you get in trouble you've no idea what to do.
You can't; depending on the options that will either stash your staged changes to foo.js, stash your unstaged changes to foo.js, or stash both but flatten them so that when you apply them they're all staged (or all unstaged). There's no way to stash it so that you can unstash it and get it back how it is now.
Yeah, I’ve actually found WIP commits and a hook to block them from being pushed end up working better than stashes, so I haven’t spent a lot of time figuring out the complexities of stashes.
My workflow here would be something like:
git checkout -b save
git commit # save staged changes
git add .
git commit -m WIP # save unstaged changes
git checkout -
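For the "hook to block them from being pushed" part, here is a rough sketch in Python of the logic a `pre-push` hook could use. The function names and the "subject starts with WIP" convention are assumptions for illustration. Git feeds the hook one line per ref on stdin; a real hook would then list the commit subjects in each range (for example with `git log --format=%s <range>`) and exit non-zero if any are flagged.

```python
ZERO = "0" * 40  # Git passes an all-zero id for "no such object" (SHA-1 repos)


def range_to_check(line: str):
    """Turn one pre-push stdin line into a revision range, or None for deletes.

    Each line has the form: <local ref> <local id> <remote ref> <remote id>
    """
    local_ref, local_id, remote_ref, remote_id = line.split()
    if local_id == ZERO:
        return None                      # deleting a remote ref: nothing to scan
    if remote_id == ZERO:
        return local_id                  # new remote ref: scan all of its history
    return f"{remote_id}..{local_id}"    # existing ref: scan only the new commits


def wip_subjects(subjects: list) -> list:
    """Subjects that look like work-in-progress commits (assumed convention)."""
    return [s for s in subjects if s.strip().upper().startswith("WIP")]
```

With the workflow above, the `WIP` commit on `save` would be caught by `wip_subjects` if that branch were ever pushed by accident.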
> Cherry-pick is just applying a delta from a commit and then re-committing.
That’s trivially not true, given its conflict resolution abilities. A cherry-pick is a merge which does not create a merge commit; it’s a lot “smarter” than “git show -p | git apply”
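A sketch of what "cherry-pick is a merge" means, as I understand it: the picked commit's parent plays the role of the merge base, your HEAD is "ours", and the picked commit is "theirs". The Python below only decides whole files (real Git also merges within a file), and the function names are made up for the example.

```python
def three_way(base: dict, ours: dict, theirs: dict):
    """File-level three-way merge of snapshots (path -> content).

    Returns (merged, conflicts). A missing path means the file does not exist.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree
        elif o == b:
            result = t          # only "theirs" changed it
        elif t == b:
            result = o          # only "ours" changed it
        else:
            conflicts.append(path)
            result = o          # keep ours as a placeholder; needs a human
        if result is not None:  # None means the file is absent or deleted
            merged[path] = result
    return merged, conflicts


def cherry_pick(head: dict, picked: dict, picked_parent: dict):
    # The merge base is the picked commit's parent, not a common ancestor.
    return three_way(base=picked_parent, ours=head, theirs=picked)


picked_parent = {"a.txt": "1", "b.txt": "x"}
picked = {"a.txt": "2", "b.txt": "x"}                 # the pick changes only a.txt
head = {"a.txt": "1", "b.txt": "y", "c.txt": "new"}   # HEAD has diverged elsewhere
merged, conflicts = cherry_pick(head, picked, picked_parent)
```

The pick's change to `a.txt` is brought over while HEAD's own changes to `b.txt` and `c.txt` survive, which a blind "apply this patch" step has no notion of.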
> My second question is, if the underlying model is so f*cking elegant, how did it lead to such a confusing interface?
I think this is 66% because the operations are on the tree/nodes themselves, and not like, on "versions", branches, or some other skeuomorphic abstraction. It's very bare metal. If you know how the command operates on the tree, you know how to conduct the actual action desired. I think the rest is due to cruft/familiarity - once you build that mental model, it works extremely well. So it's vi-like with a steep learning curve and arcane interface, but once you "git good", it's phenomenally productive.
> Why hasn't someone written new porcelain that makes git as intuitive as mercurial, subversion, etc?
I've used svn and git extensively, but after getting very used to git, I tried working on a project that used Mercurial and walked away completely confused by how it works, with "branches" being completely different beasts that I couldn't understand, and unsure when it was appropriate to use bookmarks instead. It completely baffles me that people find it better than git's super straightforward dag-of-snapshots model and I reject the idea that it is simpler or more intuitive.
If you're coming from git, this is what you need to do to understand Mercurial: don't worry about Mercurial's branches, ever. Whenever you want to use the term 'branch' from git, translate it to 'bookmark'. Mercurial's branches correspond more directly to svn's branches--they're immutable properties of a revision--but it's not a terribly useful property, and you can forget about them entirely and be perfectly productive.
Although note that unlike git, you don't need to use Mercurial's bookmarks since Mercurial is perfectly happy to let commits sit around without having names pointing to them.
> > But if you try to understand the commands without the model, you will suffer, because the commands do not make sense.
> I've read this about git several times and certainly felt it. My question is why hasn't anyone come along and fixed it? Git has the plumbing vs. porcelain separation. Why hasn't someone written new porcelain that makes git as intuitive as mercurial, subversion, etc?
What about svn is intuitive? It was never intuitive to me. It always felt confusing.
Trying to deal with any system without having a mental model (even if high level) of what's going on will always cause problems. How do I know if `svn update` screws up my uncommitted changes? I never felt safe with svn because I had no idea what the commands could do to my files.
> My second question is, if the underlying model is so f*cking elegant, how did it lead to such a confusing interface? This isn't so much me griping about git (although I am) but more a curious case for design theory. Would love to read serious analysis. I can't offhand think of another piece of software where "the model is elegant but the interface is confusing" is such a common critique.
This is actually pretty common. You just don't notice it when you are used to the interface.
The unix file system and command line tools come to mind.
> Jujutsu is a Git-compatible DVCS. It combines features from Git (data model, speed), Mercurial (anonymous branching, simple CLI free from "the index", revsets, powerful history-rewriting), and Pijul/Darcs (first-class conflicts), with features not found in either of them (working-copy-as-a-commit, undo functionality, automatic rebase, safe replication via rsync, Dropbox, or distributed file system).
> Compatible with Git
>
> Jujutsu has two backends. One of them is a Git backend (the other is a native one 1). This lets you use Jujutsu as an alternative interface to Git. The commits you create will look like regular Git commits. You can always switch back to Git.
Tons of people have tried creating GUIs to make using git easier, and many do, but where they fail is when they try and change the terminology to make it easier to understand.
GitKraken + git cli works like a dream team. GitKraken for 80% of the interactions like staging lines, committing, browsing the history. Git cli for the 20% long tail of special operations
> My second question is, if the underlying model is so f*cking elegant, how did it lead to such a confusing interface?
This is the consequence of elegance. Elegance at one level usually is characterized by being very data-agnostic, very workflow-agnostic, etc. This leads to anything built on top of it being so generic it's unusable or at least very awkward.
Pretty much everyone who works with git daily has a set of scripts to do the actual day-to-day work. The git command is too awful to use otherwise.
I have a handful of aliases, but scripts? What do you script in git? My aliases revolve around pulling MRs from GitLab/GitHub, having some nice pretty formats and pushing things with custom push options consumed by CI. Not much else there.
It was only recently that I learned Git doesn't store deltas as its core functionality, which is how I had presumed it works. It's just saving snapshots, and making up the deltas in retrospect, for purposes of merging, etc.
In using it, you're always confronted with deltas, so it seemed that was how it worked. The command structure didn't correspond at all, and was thus very confusing.
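A minimal Python sketch of "store snapshots, derive deltas when asked". The `snapshots` table and the `diff` helper are invented for the example, with the standard library's `difflib` standing in for Git's diff machinery.

```python
import difflib

# Each "commit" here stores the full content of a file, not a delta.
snapshots = {
    "c1": "apple\nbanana\n",
    "c2": "apple\nblueberry\nbanana\n",
}


def diff(old_id: str, new_id: str) -> list:
    """Compute a unified diff between two stored snapshots on request."""
    old = snapshots[old_id].splitlines(keepends=True)
    new = snapshots[new_id].splitlines(keepends=True)
    return list(difflib.unified_diff(old, new, fromfile=old_id, tofile=new_id))


delta = diff("c1", "c2")  # nothing like this was stored; it is derived on demand
```

Git's packfiles do use delta compression, but that is a storage optimization underneath the model; the commits you work with still behave as full snapshots.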
Git is really a contents addressable archive disguised as a version tracking system.
> Git is really a contents addressable archive disguised as a version tracking system.
This is a great take. I think I got lucky that I went from "copying whole directories to save my work before a refactor" directly to git without learning svn or anything else first.
Treating it as a snapshot management/copy-on-write content addressable storage + diffing utilities is a perfect way of thinking about it. I always wished LVM snapshots had git's CLI.
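The content-addressable part is easy to show in a few lines of Python. `blob_id` reproduces how Git names a blob in a SHA-1 repository (a `blob <size>\0` header followed by the bytes); the `Store` class is a hypothetical in-memory stand-in for the object database, not Git's real on-disk format.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object id Git (in SHA-1 mode) assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class Store:
    """Hypothetical in-memory object store: the key is the hash of the content."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content  # identical content lands on the same key
        return oid


store = Store()
first = store.put(b"hello\n")
second = store.put(b"hello\n")  # same bytes again: no new object is created
```

Because the name is derived from the content, storing the same file twice costs nothing extra, which is what makes "just snapshot everything" affordable.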
Yeah, this is how I always introduce git to newbies. “You know how you zip up a copy of your project when you have a version you want to keep? Git is that, except with a bunch of (sometimes slightly wonky) tools to keep track of everything.”
I also try to start them off with gitk/`git gui`, which make it a lot easier to understand what’s going on (“ok, so here you see a diff with the changes you made. Now you stage the file, and it moves down here. You can put it back by unstaging it, but for the moment let’s commit it. Now in the graph you can see your new commit, with a line pointing at the parent, and your branch pointer moved along with it, but the origin/branch pointer stayed behind, so to get that to move we need to push the new commit to the remote...”)
I'm not a fan of this mythos around Git's complexity. It is simply a DAG, a bunch of references to nodes in that graph, and a stack of commands that let you make any modification to that system you so desire.
If you just learn the mapping of those concepts to Git's vocabulary, you really don't need to memorize all the commands. In the odd case, you can just look up the docs. Nodes are commits, references are either branches or tags depending on whether they stick to leaf nodes or any other node. Your working tree is the current state of the repo in your filesystem, and staging is how you assemble commits.
2) Every tutorial and book leans into the horrible vocabulary, and the metaphors they use to "simplify" and "explain" it only make it worse. All of the analogical baggage is far more complicated than a straightforward description of the data structures.
It took an embarrassingly long time for me to understand that HEAD was just a pointer at the branch or commit you were currently working with (I wish somebody would take that "detached HEAD" terminology out back and shoot it), and that a branch was just a tag that, if HEAD is pointing at it, will move to point to a new commit that you make.
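Here is roughly that mental model as a toy Python class. `HeadModel` and its methods are invented for illustration and ignore everything else Git does; the point is only what moves when you commit.

```python
class HeadModel:
    """Toy model: HEAD is either attached to a branch name or detached at a commit id."""

    def __init__(self):
        self.branches = {"main": "c1"}   # branch name -> commit id
        self.head = ("branch", "main")   # ("branch", name) or ("commit", id)

    def checkout(self, target: str) -> None:
        # A branch name attaches HEAD; anything else is treated as a commit id.
        kind = "branch" if target in self.branches else "commit"
        self.head = (kind, target)

    def commit(self, new_id: str) -> None:
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_id   # attached: the branch label follows
        else:
            self.head = ("commit", new_id)  # detached: only HEAD itself moves

    def head_commit(self) -> str:
        kind, value = self.head
        return self.branches[value] if kind == "branch" else value


m = HeadModel()
m.commit("c2")    # HEAD -> main, so main moves to c2
m.checkout("c1")  # a raw commit id: this is the "detached HEAD" state
m.commit("c3")    # main stays at c2; only HEAD points at c3
```

In the detached case nothing but HEAD names the new commit, which is why Git nags you to create a branch before switching away.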
That's because "HEAD" and "branch" are stolen terms crowbarred into something quite different than CVS.
edit: the actual Git book from git-scm.com is not bad at all, but most people aren't learning from that.
There's a strong current in all tech things where "I couldn't figure it out intuitively the first time I sat down and tried it" becomes exactly the same as "this is ungodly complex and impossible to use".
I've seen it leveled against emacs, git, vim, etc. It's kind of sad.
I suspect that "attempts to explain something in an easy (but subtly wrong) way" have fueled it with regards to git.
The problem is I've used multiple version control systems before git that were far simpler to use day to day, and fulfilled all of my needs. Be sad man.
A counterpoint: I have an entire department of non-tech users that need to sync and version control documents at work. Not one of them has ever had a problem with SVN pulling and pushing to handle this. Git is completely inaccessible to this group of people, and as soon as I tell them it's just a Directed Acyclic Graph with a pointer to a node, I rightfully deserve to be laughed out of the room.
1. If I've partially staged some changes, why can't I stash and unstash them and get back to the same state?
2. How come stashes are sort of like commits but sort of not? I've managed to get into a state where I had a stash that couldn't be popped because it was in the wrong format - how can I tell when this will or won't happen?
The only thing that makes me competent enough to do most normal day-to-day stuff in git is the built-in git UI in JetBrains IDEs.
It's super intuitive, there's buttons that say exactly what they'll do without me having to worry what underlying commands it's running, and the best part is the merge conflict resolution UI that lets you go file by file and has a 3-pane split for existing changes, merged file, and incoming changes. You can select the arrows that are drawn from one pane into the center to bring those changes into it, X to ignore those changes, and the center is completely interactive like any other text editing area in the IDE so you can just copy paste from each side and fix it yourself.
I basically don't ever bother with git from the command line.
> But if I were going to tell everyone just one more thing, it would be:
> It is very hard to permanently lose work.
Unfortunately, this is not quite true. Git checkout will silently and irretrievably clobber all the changes in your working tree [UPDATE: if you do 'git checkout [path]', but that is not an uncommon thing to do.]
It is true that it is very hard to lose work that you have committed. But even this is not necessarily a good thing if, for example, you accidentally committed something that contains sensitive information that you want to delete. (And God help you if you have pushed such a change upstream.)
The sad fact of the matter is that while the underlying data structures are beautiful and tremendously useful, the UI/UX is a dumpster fire.
It's actually not that hard to put a different UI/UX on top of the core. I'm kind of surprised no one has done this.
Right, so stop typing that at all. Instead use `git checkout -f` to throw away extant changes. The `-f`/`--force` option is a lot clearer in intent than `--hard`.
The GP was talking about "moving the branch pointer", which checkout does not do. I am encouraging you (for the second time) to read the thread you are replying into...
git checkout can also nuke changes from both working copy and staging area. `git checkout <path>` will nuke working copy (and auto-completion is likely to give you a path when you were looking for a branch), `git checkout <ref> <path>` will nuke both. No --force required in either case, so let's not pretend that you can't lose changes without a scary-looking flag like --force or --hard.
Accidentally running a git command in the wrong folder is pretty easy to do at the command line.
I was, at one point, in the habit of running "git checkout ." after writing some experimental code. Several times a day, or however often. So of course, I once accidentally ran that command in the wrong repo and obliterated some pending changes that hadn't been committed.
I used to `vim test.c` then `rm Alt-.` very very often. Until one day sure enough I'm in the wrong directory, and I actually saw `Alt-.` complete the filename of an important file, but the brain veto latency is slower than the muscle memory twitch and my pinky continued on over to the Enter key.
I blame only Dotan and changed my work habits. Bash had nothing to do with the incident.
But it’s inconsistent in that. `git checkout existing_branch` complains with “error: Your local changes to the following files would be overwritten by checkout” and aborts, while `git checkout existing_directory` overwrites files.
That’s what makes it easy to make that mistake, and lose data. It’s safe to use until it suddenly isn’t.
I just don't understand why someone would do git checkout existing_directory if they don't want to overwrite files, since that is all the command does.
I dunno - I can tell zsh to are-you-sure me about an `rm -rf *`, git seems like it should be able to do similarly. Or, as Mercurial does, back up overwritten files to .bak.
Agreed though. I always start with teaching people. If you git add + git commit + git branch BACKUP everything you can very safely do anything you want. If you get confused it's trivial to get to a clean state by git reset --hard.
If you give people a way to get to a clean slate it removes a significant amount of the learning curve. In my experience, push/pull/fetch/log/diff are easy enough it's only really merge/rebase management that gets people caught up. Giving people a short fuck up -> undo -> retry cycle lets people learn pretty quick.
> It can do that, if you explicitly run it with the options to checkout a specific path.
Fair point. I've updated my comment to clarify.
> But why would you do that if it's not what you want?
Because sometimes it is what you want. The point is, if you happen to do it when it's not what you want, you're hosed. So it is simply not true that "It is very hard to permanently lose work." It is, in fact, quite easy under certain not-uncommon circumstances.
Fair point. There probably should be standard advice along the lines of "if in doubt, make a 'wip' commit before you do anything". That's good advice even beyond using git itself.
IIRC Mercurial creates backup (.bak) files when restoring a file. As a result, you can keep your changes in .bak files even after (accidentally) doing the Mercurial equivalent of `git checkout -- file`. Git does not have that feature.
The entire point of that command is to discard local changes.
There is some risk of getting the different forms of checkout mixed up, though, so prefer using switch and restore instead. As far as I know, the only use case of checkout not covered by other commands is checking out a commit that doesn’t have a branch attached to it.
> The entire point of that command is to discard local changes.
I think that's arguable, but it's neither here nor there. Someone who is not well versed in git arcana will be surprised that the level of risk associated with 'git checkout [branch]' is radically different from 'git checkout [directory]'. I think a lot of people will learn this the hard way, especially if they have been told "It is very hard to permanently lose work." It's not true. Permanently losing work is as easy as being careless with git checkout.
> Someone who is not well versed in git arcana will be surprised that the level of risk associated with 'git checkout [branch]' is radically different from 'git checkout [directory]'.
I disagree that it's a difference in the level of risk. There's no "risk" that git checkout [path] will discard your uncommitted changes, it's a promise and a certainty.
That said, I agree that the breadth of largely unrelated functionality smushed into git checkout is surprising, but it's also one of the few things about Git's user interface that actually has been addressed by new subcommands.
I haven't learned the hard way thankfully, but I certainly found it very surprising. It's not at all obvious, and the GUIs with git support call it something like "Undo changes" or "Revert" etc.
Isn't "naming things" one of those "hard things" to do well in computer science?
It seemed to me that Linus and the initial git architects just kind of slapped names on operations as features grew organically on top of the model. To early adopters, all of it made sense. Yeah, someone new looking in from the top down will be bewildered.
The worst part is that many of the names were inherited from earlier tools, but usually mangled, or repurposed, either because the old version was “not needed” (revert) or because they’d merged multiple high-level features into the same command out of low-level commonalities (checkout and add being poster children for this).
Not at all is the issue. IIRC svn checkout is git’s clone, git’s checkout combines svn’s switch and revert (and possibly update -r? It’s been a very long time)
Given how widely used git is, we must also empathize as widely as possible. Meaning, users of git are on a spectrum from hardcore engineer with two decades of Unix experience to casual hobbyist.
The casual group may include people very young/old, people with no (significant) background in programming, people that had little formal education in general, non-English speakers, people with cognitive limitations.
When you approach Git like that, neither the commands nor the concepts behind them make any sense.
Saying something like "see, git is just a Merkle tree of immutable objects with a unique hash id and a commit is a blah blah..."
...makes no sense whatsoever to the casual group. They are alien words and concepts that are not relatable to the actual task: I just want to version manage something.
When I first used Git it drove me almost to tears of rage and frustration. But I did get it under control. I don't love Git, but I use it every day, by choice, and I use it effectively.
> you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.
Anything that ends up in a commit should be recoverable (on the same local git repository, that is), no? The main thing that can forever clobber your work afaik is `git checkout -- <path>`, which will, perhaps a little unintuitively, just clobber whatever is not committed.
If I had to pick my version of this solution to fix 60% of git's learning curve problems:
a) Don't tell people about stash. All Commits Everything.
b) All commands that change the working directory or branches should fail on uncommitted work and ask you to git commit. It already does this for untracked changes.
c) Show people how git branch BACKUP (requiring a commit) + git reset --hard BACKUP will always get you back to where you started safely.
I'll try and track this series. I am not willing to invest almost any time in understanding GIT better because with time I just learned what's dangerous and how to avoid it. Worked perfectly for me for more than 10 years with 2 exceptions (and each took half the day to fix; ouch).
I might very well be negatively biased but I'll admit that the whole "GIT's data model is beautiful" makes me roll my eyes every time. I've read through several guides and they have not helped me at all; they even confused me more.
GIT is a tool for managing versions of files. It must be blindingly and painfully simple and all the special cases should be named much better than they are right now.
The fact that they are not is just making me impatiently wait for when GIT's mind-share stealer will come around. Or I dunno, I might just start using Mercurial CLI on top of it and just give up.
> Git has an elegant and powerful underlying model based on a few simple concepts:
> Commits are immutable snapshots of the repository
> Branches are named sequences of commits
> Every object has a unique ID, derived from its content
I would list them as:
Every object has a unique ID, derived from its content
- and every commit id depends on its ancestors' commit ids
Each commit has an associated snapshot (file tree) of the repository
A branch is a named reference to a commit
- the referenced commit changes with addition/rebase/amend of commit(s)
Not quite as compact or elemental but a usably granular mental model.
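To make the "every commit id depends on its ancestors' commit ids" point concrete, here is a toy sketch in Python. It is not Git's real object format: `object_id` and `commit_id` are made-up helpers that just hash a type tag plus some content, and the tree payloads are placeholder strings. It only illustrates why rewriting an old commit gives every later commit a new id, and why a branch can be a plain name-to-id mapping.

```python
import hashlib
from typing import Optional


def object_id(kind: str, payload: bytes) -> str:
    """Content-derived ID: hash of a type tag plus the payload (toy format)."""
    return hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()


def commit_id(tree_id: str, parent_id: Optional[str], message: str) -> str:
    """A commit's ID covers its snapshot, its parent's ID, and its message."""
    body = f"tree {tree_id}\nparent {parent_id or ''}\n\n{message}"
    return object_id("commit", body.encode())


# Two snapshots of the repository (placeholder content, not real tree objects).
tree_v1 = object_id("tree", b"README: hello")
tree_v2 = object_id("tree", b"README: hello, world")

c1 = commit_id(tree_v1, None, "initial")
c2 = commit_id(tree_v2, c1, "expand README")

# Amending the first commit changes its ID, so the child's ID changes too,
# even though the child's snapshot and message are untouched.
c1_amended = commit_id(tree_v1, None, "initial import")
c2_rebased = commit_id(tree_v2, c1_amended, "expand README")

# A branch is just a name pointing at one commit ID; "moving" it is a reassignment.
branches = {"main": c2}
branches["main"] = c2_rebased
```

Nothing here tracks a "sequence of commits" for the branch: the sequence falls out of following parent ids from the tip.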
If you want to learn how git works you can read about it, or merely look at the small text files inside a .git directory.
You can get by quite well w/o really understanding objects, just commits. So I like to start teaching Git concepts by looking at commits and refs first, then add color with objects.
I wonder how many hours have been wasted on fixing mistakes made in git. Recently, I watched a friend spend two days trying to figure out what the junior devs had done with commits and branches.
Yeah, that's the tragic thing about git. It's so easy to get the wrong idea or fumble on the easily forgettable garbage CLI semantics and make mistakes. Even experienced people make mistakes if they try to do something that's different from their daily grind.
To make it worse, correcting mistakes ALSO is something that's not done regularly-- so that becomes an obstacle as well.
If you really want a deep understanding of git, it's not actually that hard to reimplement a few very basic operations in your language of choice. I'm not saying _everyone_ should do this, but it's a way to level-up your understanding. Write the plumbing for stuff like `cat-file`, `commit-tree`, `read-tree`, `write-tree`. Adding command options isn't really needed: you're not trying to actually reimplement git, you're just trying to write enough so that all of the concepts fit together in your head.
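As a rough idea of how small such an exercise can start, here is a sketch of `hash-object` and `cat-file` for blobs in Python. The header layout (`<type> <size>` followed by a NUL byte, hashed with SHA-1 and stored zlib-compressed) follows Git's loose-object format for SHA-1 repositories; keeping everything in an in-memory dict named `store` instead of writing under `.git/objects` is a simplification made for this example.

```python
import hashlib
import zlib

# In-memory stand-in for .git/objects (an assumption of this sketch).
store = {}


def hash_object(data: bytes, kind: str = "blob") -> str:
    """Store an object and return its ID, like `git hash-object -w`."""
    header = f"{kind} {len(data)}".encode() + b"\0"
    full = header + data
    oid = hashlib.sha1(full).hexdigest()
    store[oid] = zlib.compress(full)
    return oid


def cat_file(oid: str):
    """Return (type, content) for a stored object, like `git cat-file`."""
    full = zlib.decompress(store[oid])
    header, _, data = full.partition(b"\0")
    kind, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError("size in header does not match content length")
    return kind, data


oid = hash_object(b"hello world\n")
kind, content = cat_file(oid)
```

The same content always hashes to the same id, which is why identical files are stored only once. Trees and commits are further object types built on the same header-plus-content scheme.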
Thing I wish I’d learned sooner about git: it was designed to make patches a first class entity. Trees of objects are indeed the fundamentals, but it’s quite possible to detach a commit by formatting it into a patch (“git format-patch”) and sharing it outside of the context of a repo. No ssh required: email it, fax it, print it out and mail it — all the metadata is retained allowing the commit to be reconstituted as if it were git pushed or pulled over TCP.
Why is this important? Because code review is about reviewing the changes and applying patches in a different context to where they were authored. If I fix a bug in my repo then you should be able to apply my fix cleanly to a repo in a quite different state than mine was in. We can work on different things in a decentralised way and yet still collaborate.
Most people of course don’t work like this. Most people in 2022 will literally share a parent commit from a central repository. They push and pull over the net instead of emailing patches. Their trees are in sync a lot more than the days of patch emailing yore. (Although emailing patches is still used by people who need to exfiltrate a change from some system not blessed with full access credentials to do a real push.)
The thing that does still hang around is the idea of a dissociated patch being the unit of a review. Some code review tools don’t really get this right — Gitlab in particular still muddles up the idea of a merge request and a commit. A merge request has a title and a description but so does a commit and yet they are kept separate. You view an MR’s changes as one big patch and these may or may not bear any resemblance to the git commit or commits and their own titles and descriptions. It is as if GitLab’s authors either purposely or naively ignored the idea that git was designed to think in this way from the ground up, and so they implemented a patch-like thing of their own, on top.
I’ve also never embraced the idea of commits that are for public consumption and those which are not. In my model, there are no feature branches, only branches that combine different sets of patches.
It doesn’t need to be so complex and indeed if you embrace the underlying tool (on which your product is named!) gitlab based git projects would be a bit less special and a bit more like everyone else.
Yeah, it took me some time to get it that the concept of "patch" is actually quite fundamental in many git commands (rebase, cherry pick, merge (it's basically applying patches from common parent and other branch)). I wasn't used to those because as a developer who uses Windows, diff[0] and patch[1] were not popular there.
But I don't mind the "Merge Request/Pull Request" that is basically everywhere now (github, gitlab, bitbucket, etc). It's clearly simpler for a novice developer to get started with this workflow than with a "patch workflow".
Does the same criticism apply equally to GitHub’s pull requests? I feel like they are essentially the same thing layered on top of a repo, gitlab even supports merge requests across forks.
The examples of `git reset` and `git checkout` are being fixed by things like separating them out into `git restore` and `git switch` which helps a lot in my experience. Also, `git status` does a good job of reminding you which commands to run.
I'm pleased the coming Part II purports to point to `reflog`. For a tool that _looks_ like it's fairly advanced, this is one of my favorites to show juniors, because it works so well as a "get me back to where I was" hack.
If you can understand the data structure git uses (Merkle trees) it becomes easy to understand git (won't help with the CLI though).
I think it can be summarized as Linked List in reverse. New entries are always added on head instead of tail. The pointer used to track the growing list is 'branch' pointer. The pointer that is fixed to a specific node is 'tag'.
Since we are adding on Head, we can have multiple nodes point to one node.
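That linked-list picture can be sketched in a few lines of Python. The `Commit` and `Repo` classes below are invented for illustration and leave out everything real Git does with hashes, trees, and merges; they only show nodes pointing at their parent, a branch pointer that moves on each commit, and a tag that stays put.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None  # each node points back at its parent


class Repo:
    def __init__(self) -> None:
        self.branches: Dict[str, Optional[Commit]] = {"main": None}
        self.tags: Dict[str, Optional[Commit]] = {}
        self.current = "main"

    def commit(self, message: str) -> Commit:
        # New node goes on the head; only the current branch pointer moves.
        new = Commit(message, self.branches[self.current])
        self.branches[self.current] = new
        return new

    def tag(self, name: str) -> None:
        # A tag is a pointer that is never moved by later commits.
        self.tags[name] = self.branches[self.current]

    def branch(self, name: str) -> None:
        # A new branch is one more pointer at the same node.
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        self.current = name

    def log(self, name: str) -> List[str]:
        node, out = self.branches[name], []
        while node is not None:
            out.append(node.message)
            node = node.parent
        return out


repo = Repo()
repo.commit("A")
repo.tag("v1")
repo.commit("B")
repo.branch("feature")
repo.switch("feature")
repo.commit("C")
repo.switch("main")
repo.commit("D")
```

After this, `repo.log("main")` walks D, B, A and `repo.log("feature")` walks C, B, A: two nodes point at the same parent B, while the `v1` tag still points at A.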
That’s the easy part. It doesn’t tell you how local and remote interact and how the working tree, index and repository interact, and how to handle the issues you can run into with merging, rebasing, etc.
Rebasing is when a whole branch (a chain of commits) is picked up and moved on to some other commit. It's easy to build up on that.
Working tree is current state of files. Index is the staging area or the buffer (if I am not forgetting) where you put your changes you are about to permanently commit.
Git is a distributed system. Your repo can point to another repo as its remote copy, and it can have multiple remotes. The default remote is 'origin'. When you push, changes are uploaded to that remote repo. The remote repo doesn't have to be on a server. You can have it locally in another folder. When you push, your changes will be copied to that one too.
Edit: I don't know how merge really works, and merge issues have been difficult to resolve therefore.
I’m not talking about understanding what a rebase is supposed to achieve, I’m talking about the merge conflicts that can arise while rebasing, understanding why they happen and how to deal with them.
I’m not talking about understanding what the working tree and index are, but how they interact under the various commands, what happens with the working tree and index state for example when you switch a branch, etc.
I’m not talking about a basic understanding of what a remote is, but about how to deal with different remotes, how to have local branches, how to push/pull only some branches vs. all branches, and so on.
Those are the things where people get in trouble and into broken states.
> I’m not talking about understanding what a rebase is supposed to achieve, I’m talking about the merge conflicts that can arise while rebasing, understanding why they happen and how to deal with them.
I found it rather obvious? Git usually tells you how to deal with it, and conflict markers are easy to understand. What could be more visible is that rerere exists, as that can save you a lot of energy if you're rebasing frequently. Also, conflicts from stashes are a weird special case, which can be confusing (but so are stashes in general).
> I’m not talking about understanding what the working tree and index are, but how they interact under the various commands, what happens with the working tree and index state for example when you switch a branch, etc.
That's also obvious, and Git tells you what happens as you do this stuff too. If it couldn't switch a branch, it tells you why; if there's an uncommitted change, it's listed during checkout. I don't think I'd be able to tell how various commands interact with the index without having git in front of me, and yet it doesn't cause me any visible troubles (and I'm a somewhat heavy git user).
> I’m not talking about a basic understanding of what a remote is, but about how to deal with different remotes, how to have local branches, how to push/pull only some branches vs. all branches, and so on.
I don't really understand what's problematic with those. All of the things you mentioned are really basic commands. A good tutorial could easily cover it all.
There's plenty of stuff to criticize about git's UI, but I'm very surprised that this is what you found worth mentioning.
> but how they interact under the various commands
> but about how to deal with different remotes, how to have local branches
If you are saying this in the context of the CLI, I have no clue. What I am saying is, if you understand the concepts and have a good GUI tool at your disposal you can solve almost all problems with ease.
I have previously used SmartGit, which lets you drag and rearrange commits. It also allows interactively selecting the lines/chunks you want to make part of a commit. Same goes for resolving conflicts.
IntelliJ also has a very good GUI which I use these days at work.
Something I didn't find much help with was how to squash commits which include merge commits as well. But overall, GUIs help a lot if you know what you want to do.
What I really want to do with Git, but haven't found a way to do, is to compare diffs. Often I have merged a feature into the main develop branch, but need to backport these changes to some older release branch. The backport has some merge conflicts, indicating that much has changed on develop since. Now I want to see that the set of changes (the diff) on main develop is basically the same as the set of changes (the diff) on the old release branch. I want to diff their diffs. Anybody any ideas?
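One idea: Git has a `git range-diff` command meant for comparing two versions of a commit range, which may be worth checking first. Failing that, a "diff of diffs" can be approximated by hand. The Python sketch below assumes you have saved the two patches as text (for example from `git diff` or `git show`); `normalize` and `diff_of_diffs` are hypothetical helper names, and the sample patches are made up. It drops the parts of a patch that differ between branches even when the change is the same (blob ids on `index` lines, line numbers in hunk headers) and then diffs what is left.

```python
import difflib


def normalize(patch: str) -> list:
    """Strip patch details that vary by branch even for an identical change."""
    kept = []
    for line in patch.splitlines():
        if line.startswith("index "):
            continue  # blob IDs differ whenever the surrounding file differs
        if line.startswith("@@"):
            line = "@@"  # hunk positions shift when code above has changed
        kept.append(line)
    return kept


def diff_of_diffs(patch_a: str, patch_b: str) -> list:
    """Unified diff between two normalized patches; empty if they match."""
    return list(
        difflib.unified_diff(
            normalize(patch_a), normalize(patch_b),
            "develop", "release", lineterm="",
        )
    )


# Made-up example patches: same change, different positions and blob IDs.
patch_develop = """diff --git a/app.py b/app.py
index 1111111..2222222 100644
--- a/app.py
+++ b/app.py
@@ -10,2 +10,3 @@ def run():
     setup()
+    validate()
     start()
"""

patch_release = """diff --git a/app.py b/app.py
index 3333333..4444444 100644
--- a/app.py
+++ b/app.py
@@ -42,2 +42,3 @@ def run():
     setup()
+    validate()
     start()
"""

# A backport where conflict resolution changed the added line.
patch_conflict = patch_release.replace("validate()", "validate(strict=True)")

same = diff_of_diffs(patch_develop, patch_release)
delta = diff_of_diffs(patch_develop, patch_conflict)
```

Here `same` comes back empty, while `delta` shows the one added line that differs between the two patches. Context lines that legitimately differ between branches will still show up, so this is a rough check rather than a proof that the backport is equivalent.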
The Git docs are very comprehensive and the writing is surprisingly easy to follow.
Also you're totally allowed to set up private repos and run whatever dodgy commands some karma-poor guy on SO claims should sort you out and see what you end up with, or create a "rewritten history" scenario that everyone has always been warning you to avoid (forbidden fruit...).
If you're going to move files around, it's worth experimenting with one file first, without making any changes to it, in order to make sure you do it in a way that preserves history.
There is 'git mv' for that. I have found the best way is to create new directories then use 'git mv' on the files.
Screenshots are boring any more, what with this All Things Git. I'll be happy the day we collectively recall there has been TTY for a long time without it.
> Although Mercurial was not selected to manage the Linux kernel sources, it has been adopted by several organizations, including Facebook, the W3C, and Mozilla. Facebook is using the Rust programming language to write Mononoke, a Mercurial server specifically designed to support large multi-project repositories.
Mercurial @ Google:
> Speaking of Google, their Mercurial rollout on the massive Google monorepo continues. Apparently their users are very pleased with Mercurial - so much so that they initially thought their user sentiment numbers were wrong because they were so high! Google's contribution approach with Mercurial is to upstream as many of their modifications and custom extensions as possible: they seem to want to run a vanilla Mercurial out-of-the-box as possible. Their feature contributions so far have been very well received upstream and they've been contributing a number of performance improvements as well. Their contributions should translate to a better Mercurial experience for all.
GitHub is a fine place to stash free code for free. Adding Linus Torvalds' good name to "free storage" is an unbeatable combination for mindshare, but does not change the fact that mercurial is far more ergonomic for the typical cases, and handles facebook/google scale well enough.
Judging by the comments here, there is a lot of frustration. Since git is open source and it's a basic tool, why aren't the experts here contributing and making it work how everyone here thinks it should work?
I disagree with the premise that I need to know what happens behind the doors to successfully use a tool. I've been using git for the past 15 years with literally 0 issues of merging or branching. The only rule I have to follow is "don't do stupid things with it". Branch -> push -> merge is what I've been doing all my life and it worked like a charm. Just avoid the problems and it's all going to work just fine.
With few exceptions, this mentality is exactly what separates well-paid senior engineers from those who write rest endpoints all day. Knowing your tools is part of the job. Knowing how your tools work is a big part of knowing how to use them effectively.
Senior engineers are paid to get stuff done on time while following good practices, and be proactive and communicative (and not end up on Twitter for the entire day because they couldn't figure out one project requirement by themselves, as many juniors do).
Compared to a junior engineer, they have to get three times as many things right at the same time.
Knowing a tool inside out can help with this, but it's not a requirement at all. It's just one of the, a-hem, tools in their box with which they can achieve efficiency.
I am like many others in this thread. I don't care about GIT's "beautiful data model" one bit. I learned with time which commands are dangerous and I do due diligence to make triple sure I don't trip on them. With 2 exceptions for the last 10+ years (each took me half a day to fix) this has worked perfectly.
It's quite OK to stick to what you know is safe and works well (and what pre-conditions have to be met so it works well).
At this point if a customer comes around and asks me to use advanced GIT features I'll just bill them triple while telling them exactly why.
I deliver, and at least 80% of the time I deliver on time, and 95% of the time with good practices and due diligence attached. How I use GIT is my business. I've been contacted by the next contractors no small amount of times and have eaten praise for how easy to track my project's change / commit history has been, too.
How? I've heard this argument a lot. How does me knowing how to solve a problem I created myself make me better than me avoiding it altogether?
To expand on this: how many of the engineers you work with know what actually happens in a CPU when they run a for loop, for example? How many of them need to know that? It seems to me that a lot of people are very successful at creating useful things without understanding the underlying tech. Isn't this why we built it?