Git Submodules: Adding, Using, Removing, Updating (2009)

OskarS · on Feb 17, 2021

Here's the lesson I've learned from my experience with submodules in git in several different companies: avoid them like the plague. NEVER USE THEM FOR ANYTHING. Use any other solution available to you: use package managers, use monorepos, use subtrees, just copy/paste the files in, avoid the dependency entirely, do ANYTHING it takes to avoid using submodules.

They just become a constant source of friction. Basically every action you do in git, there's some tiny bit of annoyance caused by submodules, which adds up to tearing your hair out. Like, read this StackOverflow question and answer, and tell me this is something you want to be dealing with on a daily basis (and you will be, you'll regularly be dealing with far worse): https://stackoverflow.com/questions/9314365/git-clean-is-not...

The correct way to handle dependencies in general is package managers, but if for some reason that is not available to you, and you wish to avoid git subtree for some reason, copy/pasting the code from the other repository and making a note in the commit message which commit in the dependency repo you copied from is a far preferable solution. Yeah, you lose the history of the sub-repo, but it's well worth it to avoid the complications from using submodule (and you never examine the history of the submodule anyway, how often do you look at the history of your dependencies?).
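A minimal sketch of that copy-and-note approach, for anyone who wants to see it spelled out. The repo names `dep` and `app` and the `vendor/dep` destination are hypothetical, and the throwaway repos exist only so the example runs on its own:

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every name and path below is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream dependency.
git init -q dep
echo 'int answer(void) { return 42; }' > dep/answer.c
git -C dep add answer.c
git -C dep commit -q -m 'Add answer.c'

# The project that wants the dependency.
git init -q app
dep_commit=$(git -C dep rev-parse HEAD)

# Copy the files as of that commit (no .git directory comes along) ...
mkdir -p app/vendor/dep
git -C dep archive "$dep_commit" | tar -xf - -C app/vendor/dep

# ... and note in the commit message which commit they were copied from.
git -C app add vendor/dep
git -C app commit -q -m "Vendor dep at $dep_commit"
git -C app log -1 --format=%s
```

Updating later is the same copy repeated with a newer commit hash, so the commit log of `app` doubles as the record of which dependency version was in use when.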

Submodules: not even once.

temac · on Feb 17, 2021

I agree with that. Both the ergonomics and the quality of implementation of git submodules are currently atrocious. Special mention if you try to mix them with worktrees. I think there is a small possibility that they will become usable in 10 years, and an even smaller one in 5.

But for now: don't use git submodule. I did not believe that once, because they kinda look nice on paper. Don't make the same mistake. At least wait 5 years before trying them. And test them separately for at least 6 months, actively each day, before taking a dependency and actually making your project use them.

pvorb · on Feb 17, 2021

What makes you think that submodules will improve? I've used them for the last ten years or so (when I was forced to) and haven't seen any advances in UX.

atoav · on Feb 17, 2021

I agree. In principle the promise of submodules is a very good, logical and clear one – it is totally understandable why anybody would like to use them, because it would just make sense in a lot of cases.

However the way they are implemented is just so horrible, I side with you here: Don't use them for anything serious. It is not worth it. Invest the time into writing a bash script that does the copying for you, use private package managers for shared dependencies or whatever, but don't use submodules – this way you will at least know what bites you when it does.

OskarS · on Feb 17, 2021

Exactly: they're so appealing the first time you find out about the feature. "Oh, I can just go `git submodule add <repo> <path>` and it includes another repo! Very cool!", and then two months later you're staring at the 427th time submodules have caused weird issues when changing branches or whatever, and you just want to hurl your computer into the sea. A hacked-together bash script like you suggested is a far better solution.

GrumpySloth · on Feb 17, 2021

> and you never examine the history of the submodule anyway, how often do you look at the history of your dependencies?

All the time. Software is made of bugs, each of your dependencies has bugs, it's on you to diagnose them and, quite often, fix them. Git history is invaluable in those scenarios.

Generally, I'm in favor of pasting the dependencies in your repo. It protects you against things like the leftpad incident and also keeps things nicely in one place, you don't have to go to multiple places to fetch the full source of your program. It also ensures that you have an accurate view of how complex your creation actually is.

But history of dependencies is also useful. At work, we have one giant, heavily patched third-party dependency and a lot of small ones. The big one lives in a submodule. The small ones are pasted in the repo. There is no end to the bugs in them, history is then very useful. And before you say anything, they aren't some random npm packages not vetted by anyone. They're widely-regarded foundational open source projects.

Package managers on the other hand discourage e.g. modifying the dependencies to debug problems in the application. There are more and less correct ways to do that, but generally it's a hoop to jump through. With submodules or vendoring - no problem. They also create an illusion of your project being simpler than it really is and are prone to leftpad. And downloading things at build-time is an anti-pattern.

My perspective on software is this: all software has bugs, now the question is: how do we deal with it? How will we debug it? Sadly, there is a lot of "let's assume this works perfectly all the time and do something else" mindset going around as well. Like... How do you debug a malfunctioning GC in your managed language of choice? Will you be able to debug the dependency you've just pulled into your project? Is this language feature a compile-time nicety that will make the interactive debugging later miserable?

pvorb · on Feb 17, 2021

If you control the dependency and if you're the primary user, just move it into your repository by merging with unrelated histories.

If you're not controlling the dependency, you can still merge the dependency with an unrelated history on every version upgrade. This will still likely consume less time than fighting submodules every day.
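One way this can look in practice is the subtree-merge recipe from the git documentation, sketched here under assumptions: the repos `dep` and `app` and the `vendor/dep` prefix are hypothetical, and pvorb may well have had a plainer root-level merge in mind.

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every name and path below is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the dependency, with its own history.
git init -q dep
echo 'dep code' > dep/dep.txt
git -C dep add dep.txt
git -C dep commit -q -m 'dep: initial'

# The main project.
git init -q app
cd app
echo 'app code' > app.txt
git add app.txt
git commit -q -m 'app: initial'

# Fetch the dependency's history; it shares no ancestor with ours.
git fetch -q ../dep HEAD

# Start a merge that records dep's history as a parent but keeps our tree ...
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
# ... then place dep's files under a subdirectory and conclude the merge.
git read-tree --prefix=vendor/dep/ -u FETCH_HEAD
git commit -q -m 'Merge dep history under vendor/dep'

git log --format=%s
```

After this, the dependency's commits are reachable from the main project's history, so `git log` and `git blame` work on the vendored files without any submodule machinery.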

junon · on Feb 17, 2021

Disagree. Package managers cause more headache than anything. I have used submodules for almost all of my projects (aside from e.g. Node.js projects). Not once have had issues with them.

I don't know why people raise such a stink about them.

dataflow · on Feb 17, 2021

> I have used submodules for almost all of my projects (aside from e.g. Node.js projects). Not once have had issues with them.

Do you do things with git beside the trivial operations (add/commit/etc.)? Like rebasing/merging and such across commits that introduce or remove submodules? Do you never encounter friction when e.g. removing submodules? Do you never run into the errors people complain about?

P_I_Staker · on Feb 17, 2021

I absolutely despise submodules, but under many use cases you don't add or remove that often. We definitely run into errors people complain about. In fact there are errors that the average user (that doesn't really care about VCS) pretty much can't fix... or with much difficulty. I don't know that I'd still say it's a dealbreaker, but that's definitely very bad. Very much does break some of the spirit of git and what makes it "nice".

We've been using them, and our system is still very usable (obviously this is a ridiculous thing to brag about, but worth pointing out).

I'm wondering if there's a better way to handle shared code between two repos, where the users of either don't have access to the other parent? You might be able to use subtrees? Email source code and just create n separate commits for n projects, with a standardized commit message?

I'd be happy to do something different, but I never had some idea that I could sell in a meeting, and didn't really have the energy or clout to fight this.

sibrahim · on Feb 17, 2021

My preferred solution has been git-subrepo: https://github.com/ingydotnet/git-subrepo

It's basically what (I think) submodules should have been. It creates a vendored copy with metadata about what commit it came from. Normal operations like clone, commit (even touching multiple main/subrepo files which subtree struggles with) are unaffected (normal files from git's point of view).

Pull, push, branching all work as expected on main project (most devs don't even need to know there's a subrepo). If you want to pull/push vendored changes from subrepo, there's a new command, but that's it.

junon · on Feb 17, 2021

This just described every feature that submodules gives you out of the box. I'm curious what value this adds, if any...

sibrahim · on Feb 18, 2021

As Izkata mentions, the superproject has _all_ the files for a given commit without any need for users to have access to other repos, additional submodule init commands, etc.

Basically, after a subrepo clone, you've copied the file tree for the subrepo and can make commits on it in the superproject to your heart's content (branching, etc). This is basically a fork/mirror, but adds a single metadata file to track the last upstream commit you pushed/pulled from to allow reconciling later. So with git subrepo, you make commits in the superproject first and can choose (at some later point, if at all) to merge with the subrepo upstream. This is arguably consistent with the git model writ large (make local commits, later choose if/how to integrate those with upstream). Importantly, people that clone your superproject repo don't have to know anything about subrepo or special commands to send changes back to you.

For submodules, changes flow in the other direction. If you want to make a change to the child repo, you must 1) commit in the submodule, 2) push the submodule commit to its upstream, 3) make a commit in the super project that changes the commit the submodule is pointing to. When someone pulls, switches branches, etc in the superproject, they need to do a submodule update with various failure modes or else they end up with empty/out-of-date content.
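Those three steps can be sketched as below. The repo names (`child.git`, `super`) and the `libs/child` path are hypothetical, and the `-c protocol.file.allow=always` override is only there because this demo clones from a local path, which newer git versions refuse for submodules by default.

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every name and path below is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the child repo's upstream (bare, so it can be pushed to).
git init -q child-src
echo v1 > child-src/lib.txt
git -C child-src add lib.txt
git -C child-src commit -q -m 'lib v1'
git clone -q --bare child-src child.git

# Superproject with the child added as a submodule.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/child.git" libs/child
git commit -q -m 'Add child submodule'

# 1) Commit inside the submodule.
echo v2 > libs/child/lib.txt
git -C libs/child commit -q -am 'lib v2'

# 2) Push that commit to the submodule's own upstream.
git -C libs/child push -q origin HEAD

# 3) Commit the moved pointer in the superproject.
git add libs/child
git commit -q -m 'Point libs/child at lib v2'
```

Skipping step 2 is the classic trap: the superproject then points at a commit nobody else can fetch.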

Subtrees are a bit similar to subrepos, but in practice you still need to be aware of their boundaries since you can't mix subtree/superproject modifications in the same commit. Moreover, you need to use a special merge strategy rather than git default merge/rebase which subrepo uses.

Izkata · on Feb 18, 2021

I can't answer in general, but I do see one possible misunderstanding: "vendored" means the code is copied into the parent repo.

If a submodule's original source goes offline, a new clone of the parent repo won't be able to retrieve it, since it just stores a reference to where to clone from. If subrepo really vendors the code, that'll never be a problem since a full copy is committed directly to the parent repo.
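To make "just stores a reference" concrete, here is a small sketch that inspects what the parent repo actually records for a submodule. The names `child`, `super` and `libs/child` are hypothetical, and the `-c protocol.file.allow=always` override is only needed because the demo uses a local path as the submodule URL.

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every name and path below is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the submodule's original source.
git init -q child
echo v1 > child/lib.txt
git -C child add lib.txt
git -C child commit -q -m 'lib v1'

git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/child" libs/child
git commit -q -m 'Add child submodule'

# The parent's tree holds a single "gitlink" entry (mode 160000) naming one
# commit of the child; none of the child's files are stored in the parent.
git ls-tree HEAD libs/child

# Where to clone that commit from is recorded separately, in .gitmodules.
git config -f .gitmodules --get submodule.libs/child.url
```

If the URL printed by the last command stops resolving, the parent still knows which commit it wants but has nowhere to get it from.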

junon · on Feb 18, 2021

This is what mirroring or forking is for...

dataflow · on Feb 17, 2021

> I'd be happy to do something different, but I never had some idea that I could sell in a meeting, and didn't really have the energy or clout to fight this.

Would the alternative I suggested here work? I think you might even be able to do it in tandem with submodules so you try out both and see what works for you:

https://news.ycombinator.com/item?id=26165644

junon · on Feb 17, 2021

> pretty much can't fix... or with much difficulty.

I don't believe this for a second.

junon · on Feb 17, 2021

> Do you do things with git beside the trivial operations (add/commit/etc.)? Like rebasing/merging and such across commits that introduce or remove submodules?

Yes, often (I'm a stickler for clean histories on my projects). The trick is to only work with submodule refs during any sort of complex series of commands, then update the submodules once all of the ref munging is done.

If you have a ref conflict or something (rare IME, unless you're running git commands haphazardly, without understanding them), the error message tells you exactly which ref it couldn't check out and why - in which case, simply cd in and fix it manually.

If you need to move submodules around, `git submodule deinit` it first, then re-add it.

If you need to remove one, deinit it then make sure it's purged from .gitmodules.

> Do you never encounter friction when e.g. removing submodules?

Nope, though admittedly the commands need some polish here. Worst case, remove the directory in the working tree, remove the entry from .gitmodules, delete .git/modules/x/y/z/, then add -A the now-deleted worktree path and your .gitmodules.
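For comparison with the by-hand worst case, here is a sketch of the more usual command sequence (`git submodule deinit`, `git rm`, then deleting the cached clone under `.git/modules`). It is an assumption about what suits most setups, not junon's exact steps, and the names `child`, `super` and `libs/child` are hypothetical.

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every name and path below is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a superproject with one submodule (local path, hence the -c override).
git init -q child
echo v1 > child/lib.txt
git -C child add lib.txt
git -C child commit -q -m 'lib v1'
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/child" libs/child
git commit -q -m 'Add child submodule'

# Unregister the submodule from .git/config and empty its working directory.
git submodule --quiet deinit -f libs/child
# Remove the gitlink from the index and its section from .gitmodules.
git rm -q -f libs/child
# Delete the cached clone that git keeps inside the superproject.
rm -rf .git/modules/libs/child
git commit -q -m 'Remove child submodule'
```

Leaving out the `rm -rf` step is harmless until someone re-adds a submodule at the same path and git picks up the stale cached clone.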

> Do you never run into the errors people complain about?

Not since I sat down and read through the concepts section on the official Git docs, which outline what e.g. objects, trees, commits, tags and refs were, no.

Git submodules are up there on the list of the most misunderstood (but entirely useful) features ever, in my opinion.

dataflow · on Feb 18, 2021

> The trick is to only work with submodule refs during any sort of complex series of commands, then update the submodules once all of the ref munging is done.

So you don't actually check out the submodule, ok. But this means you can't actually use the correct submodule commit during the rebase? Like trying to build with it to check if it works before continuing?

> Nope, though admittedly the commands need some polish here. Worst case, remove the directory in the working tree, remove the entry from .gitmodules, delete .git/modules/x/y/z/, then add -A the now-deleted worktree path and your .gitmodules.

How do you admit this and yet simultaneously say "I don't know why people raise such a stink about them" and "Not once have had issues with them"? You somehow know there are pain points, you know the various workarounds which can consist of N hoops to jump through, and yet you claim you haven't encountered issues even once and don't even understand what people dislike about submodules? Aren't these clearly contradictory?

> Not since I sat down and read through the concepts section on the official Git docs, which outline what e.g. objects, trees, commits, tags and refs were, no.

I'm pretty sure I know what all of those are and still find submodules painful.

junon · on Feb 18, 2021

> But this means you can't actually use the correct submodule commit during the rebase?

You can update during an interactive rebase if you need to, why not? I've never personally needed this but given how rebases work then an update should be possible if you need to.

> How do you admit this and yet simultaneously say "I don't know why people raise such a stink about them"

Because the lack of a simple removal command does not warrant the "avoid submodules at all costs" sentiment currently the top voted comment on this thread.

> You somehow know there are pain points, you know the various workarounds which can consist of N hoops to jump through,

There are places to improve commands. That's it. And the only hoop I have to jump through is a removal of a stubborn submodule when the need arises.

> Aren't these clearly contradictory?

No. The world isn't a dichotomy; both things can be true.

> I'm pretty sure I know what all of those are and still find submodules painful.

A submodule is just a ref, just like a tag. Init and deinit can be finicky, that's about it. If you fully understood refs, then submodules shouldn't be that difficult to reason about.

What, specifically, do you find "painful"?

dataflow · on Feb 18, 2021

> You can update during an interactive rebase if you need to, why not?

You said your "trick" is to explicitly not update the submodules until you've finished the rebase? By definition that means you're not updating the submodules during the rebase...

> Because the lack of a simple removal command does not warrant the "avoid submodules at all costs" sentiment currently the top voted comment on this thread.

It's not just removal, though removal is definitely a big chunk of it. See last paragraph.

> No. The world isn't a dichotomy; both things can be true.

The world isn't a dichotomy, but you "admit the commands need some polish" and then propose a 4-step workaround for issues you've "not once had"? Did you not encounter issues at some point to lead you to propose workarounds and conclude commands need polishing?

> A submodule is just a ref, just like a tag. Init and deinit can be finicky, that's about it. If you fully understood refs, then submodules shouldn't be that difficult to reason about. What, specifically, do you find "painful"?

I don't think you understand how annoying it is for submodules not to add/remove/update seamlessly. Maybe you only use submodules for your own personal projects and rarely modify them so you rarely see the effect. But especially in a team project that's evolving constantly and the submodules aren't frozen, and especially in projects where you have to go mess with your own submodules to figure out if to-be-proposed changes to dependencies will actually do what you want, having to constantly go out of your way to make sure everything is committed and synced properly every single time you check out a new commit gets annoying fast. Especially when you don't even know what commits contain submodule changes to begin with.

junon · on Feb 18, 2021

You're assuming a lot about me so I don't really feel like spending time answering in depth. You seem to hold a strong opinion you're not willing to change, and while that's fine, I don't appreciate being told I'm somehow not qualified to have a differing opinion when that is very clearly not the case.

> You said your "trick" is to explicitly not update the submodules until you've finished the rebase? By definition that means you're not updating the submodules during the rebase...

A "trick", not a rule. If you need to check submodules during an interactive rebase, fine, do it. Nothing is stopping you, and it should work fine.

> The world isn't a dichotomy, but you "admit the commands need some polish" and then propose a 4-step workaround for issues you've "not once had"? Did you not encounter issues at some point to lead you to propose workarounds and conclude commands need polishing?

For 1 issue, which I feel could definitely have a better command. You're arguing semantics at this point. I've not had issues where I'm so stuck they're impossible to use or to fix, like so many others here claim they do.

> I don't think you understand how annoying it is for submodules not to add/remove/update seamlessly.

I don't think you understand how submodules work, at all.

> Maybe you only use submodules for your own personal projects and rarely modify them so you rarely see the effect.

You assume incorrectly.

> But especially in a team project . . . and . . . projects where you have to go mess with your own submodules . . . having to constantly go out of your way to make sure everything is committed and synced . . . single time you check out a new commit

You're using submodules incorrectly, then. Learn what worktrees are if you don't. Learn what directory remotes are. Learn how to use multiple remotes, even.

None of what you're saying is a result of bad Git design. It's a result of you not knowing how Git works, or how to use it effectively. Git gives you all of the tools to use submodules effectively (e.g. testing changes without committing, or testing changes directly in a submodule, etc.)

Do not patronize me because I RTFM.

dataflow · on Feb 18, 2021

Wow. I'm not "assuming" anything about you, I'm going literally off what you wrote ("I'm a stickler for clean histories on my projects" when I asked if you use git non trivially...) and even then I said maybe this is why, and even then its truth or falsehood had absolutely zero bearing on the correctness of my points. And that was literally just 1 thing, not "a lot". For someone trying to be accurate you exaggerate like there's no tomorrow, and that's been part of the very issue from the very beginning of the discussion.

Meanwhile the person who's been patronizing me and everybody else this entire time is you. Pretending like you're the only one who understands git and me and the rest of the world doesn't. You both go out of your way to trash my understanding of the entirety of git (not only do I not understand how submodules work but I don't understand remotes either? seriously?) with absolutely zero basis and also get offended that I said maybe you're simply not using git in the same kinds of projects that others do and hence that might be why you're not realizing why the problems are such a big deal to other people you've been looking down on before I even started replying? Your entire thesis is always that nobody but you has read the manuals or understood git, and yet I'm the one with a strong opinion unwilling to change? Really?

Yes, obviously I don't understand anything in git and the problem is in fact that nobody understands anything about git except you; it can't be any other way.

petepete · on Feb 17, 2021

This is my experience too, I wouldn't recommend people use them unless they really understand the costs. The last time I used them was to help manage my dotfiles, so I could use them with Pathogen to pull in my selection of vim plugins. It was just a headache, to the point where I bit the bullet and went down the package manager (vim-plug) route instead.

Anthony-G · on Feb 17, 2021

This is also the only use-case I have for git sub-modules. However, I haven’t encountered any serious problems. I mostly leave my (few) Vim plugins alone for so long that I forget the git commands to update the plugins when I do decide to update them. Having to check my notes or the relevant section of the Git Book is the worst of the friction.

I’m using Pathogen as it works fine for me. I believe Vim 8.0 introduced something similar but I still have to use older Vim releases on various CentOS 6 and 7 systems.

user-the-name · on Feb 17, 2021

I don't understand why submodules are so bad. They are a feature where maybe 60% is implemented, and then just abandoned.

Mercurial has subrepos, which are the EXACT same feature, but they are actually fully implemented, they work, and they don't make you shoot yourself in the foot constantly. They are mostly a joy to use.

jayd16 · on Feb 17, 2021

Submodules used to be terrible to use but now that every tool supports it they are usable.

A lot of dependency managers are moving to supporting git repos which I think is great. An annoying thing about this, though, is that it causes several round trips to origin that can be saved with submodules. It's never easy...

It's better to treat submodules as src dependencies and not as a wannabe mono-repo tool. Try not to change them. If you keep them read-only, they're somewhat painless.

dataflow · on Feb 17, 2021

Yeah, I agree. At some point in time I came up with a clear rule on when exactly submodules are the best solution, and it was such a narrow set of circumstances that I forget what it was at this point. I think it might have been in situations similar to Boost, where the project is actually modular? I forget. But even in those cases the friction is painful. (For the life of me I don't understand why they don't try to polish all the rough edges. Anyone know?)

One alternative solution I like: have a build tool that pulls dependencies in automatically. Bazel supports this natively, but I've also done this in plain ol' Make. Like you can build download rules for deps/github.com/% to automatically pull from GitHub, and the rest of the script can just treat them like local files. Having the build tool take care of it often seems to fit the problem space better too.
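The comment describes Make rules; as a rough stand-in, the same fetch-on-demand idea can be sketched in plain shell. The `ensure_dep` helper, the `deps/` layout and the `someuser/somelib` name are all hypothetical, and a local directory plays the role of the remote host so the example runs offline.

```shell
#!/bin/sh
set -eu

# ensure_dep BASE_URL NAME REV (hypothetical helper): clone NAME into
# deps/NAME the first time it is needed and check out REV; later calls
# find the clone already present and do nothing.
ensure_dep() {
    if [ ! -d "deps/$2/.git" ]; then
        mkdir -p "deps/$2"
        git clone -q "$1/$2" "deps/$2"
        git -C "deps/$2" checkout -q "$3"
    fi
}

# Demo setup: a local stand-in for the remote host.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
mkdir -p remote/someuser
git init -q remote/someuser/somelib
echo hello > remote/someuser/somelib/lib.txt
git -C remote/someuser/somelib add lib.txt
git -C remote/someuser/somelib commit -q -m 'initial'
rev=$(git -C remote/someuser/somelib rev-parse HEAD)

# The build script pins a revision and then treats deps/ like local files.
mkdir project
cd project
ensure_dep "$work/remote" someuser/somelib "$rev"
ensure_dep "$work/remote" someuser/somelib "$rev"   # no-op on the second call
cat deps/someuser/somelib/lib.txt
```

With a real host, the first argument would be something like a GitHub base URL, and `deps/` would normally be listed in `.gitignore` since nothing under it is checked in.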

coryrc · on Feb 17, 2021

Your latter half is essentially what subtree does, but it can also merge changes if you have any.

dataflow · on Feb 17, 2021

Wait really? I thought subtree stores a copy of the other repo?

coryrc · on Feb 18, 2021

Maybe I misunderstood and you aren't checking in those files, in which case it isn't the same.

m_eiman · on Feb 17, 2021

Don't use subtrees, they're even worse. At least when we tried using them for a while before moving to submodules; perhaps they're less of a broken hack these days.

For example, whenever we did something with the subtree it reprocessed all commits in the repo - obviously taking longer and longer as time went on.

globular-toast · on Feb 17, 2021

Submodules have been working very well for my emacs config for years now. However, I've used them for software projects and they are indeed awful. I joined a company where the lead engineer had set up a build system that was to be included as a submodule in every repository. It was a nightmare.

whatever1 · on Feb 17, 2021

I had the same experience as you when I tried to deal with a C++ library that was co-developed with the main project.

In the absence of a ubiquitous package manager / module support, what other options are there for C++? Keep a text file in the main project with the version of the library you need to compile it?

60secz · on Feb 17, 2021

Agree 100%. Submodules should instead be implemented through auto-published library semver jars which are auto-incremented.

coding123 · on Feb 17, 2021

SAME exact experience here. If you can create either a build workflow or some other method of managing a relationship like this, do that instead of submodules.

I've been on teams that used it quite excessively and it was a PITA. I've used it in much smaller ways to try to avoid a build step or two and it worked until it was responsible for the world falling and no one knew it was that damn sub module.

Just because the tech exists, doesn't mean it's a good idea.

pedrolamarao · on Feb 17, 2021

In my experience, such things tend to be misused when dependency management is not available. There are many tools today which sell themselves as "dependency managers" which are in fact just glorified downloaders. Such things should be called "download managers" instead.

gmadsen · on Feb 17, 2021

This is beyond dramatic. To actually suggest copy/paste is preferable is asinine.

Once you get used to them, they are fine. I never have issues with submodules and I get to easily change revisions of dependencies.

flohofwoe · on Feb 17, 2021

IME submodules are mostly fine for external dependencies that only very rarely change and remain pinned at a specific version over weeks or months.

Giving submodules a special set of git commands, and providing the users so many easy opportunities to shoot themselves in the foot are baffling design decisions though.

E.g. I'd really like to know why "git clone --recursive" isn't the default.

maweki · on Feb 17, 2021

And then you always forget what you need to do after you didn't do it for months. I have a blog project (written with pelican) where I submodule the whole pelican-plugins project which is, basically, again a compendium of submodules.

Now, running Python creates pycache folders in the submodules, and other stuff in the directories sometimes changes a bit while running.

When it comes to updating the plugins-submodule, I have to go through every sub-submodule and git clean it and reset it so that I can pin a new version for the submodule, and then update all sub-submodules. Mind you, I have not found it easier to delete the whole submodule-tree and re-initialise it.

angrais · on Feb 17, 2021

Wouldn't this be resolved by a good README.md?

maweki · on Feb 17, 2021

Of course, the correct way would be to write down all the steps, operationalise them, and then implement parts and then the whole as a shell script.

The question I would ask is, whether this should be necessary to pin the version of some project dependency? That's basically all we're talking about here.

Pxtl · on Feb 17, 2021

Code is the best documentation. update-submodules-latest.sh (well, PS1 for my team, we're a .net shop) to move to latest along their branch and stage the submodule change

c03 · on Feb 17, 2021

Both --recursive not being default and then using --recurse-submodules when pulling are my top annoyances in git.

junon · on Feb 17, 2021

Another tip: you can do a deep clone of the main repo, then do `git submodule update --init --recursive --depth=1` to just pull the heads of your dependencies to cut down on times. Especially useful if you don't work within the submodules at all.
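A sketch of that tip end to end, with hypothetical names (`child.git`, `super`, `fresh`, `libs/child`). Two details are specific to the demo: the submodule URL uses `file://` because git ignores `--depth` for plain local-path clones, and `-c protocol.file.allow=always` is needed because newer git versions refuse file-based submodule clones by default. The recorded commit here is the tip of the child's branch; whether a depth-1 fetch can reach an older pinned commit depends on the server, so treat that case as untested.

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every name and path below is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A dependency with some history (two commits).
git init -q child-src
echo v1 > child-src/lib.txt
git -C child-src add lib.txt
git -C child-src commit -q -m 'lib v1'
echo v2 > child-src/lib.txt
git -C child-src commit -q -am 'lib v2'
git clone -q --bare child-src child.git

# Main repo that pins the dependency as a submodule.
git init -q super
git -C super -c protocol.file.allow=always \
    submodule --quiet add "file://$work/child.git" libs/child
git -C super commit -q -m 'Add child submodule'

# Full clone of the main repo, shallow clones of its submodules.
git clone -q super fresh
cd fresh
git -c protocol.file.allow=always \
    submodule --quiet update --init --recursive --depth=1

# Number of commits actually present in the submodule clone.
git -C libs/child rev-list --count HEAD
```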

nerdponx · on Feb 17, 2021

IMO this should be the default!

junon · on Feb 17, 2021

No. History is the whole point of git. It would be antithetical to make this the default.

I'm more on board with the recursive options, though.

nonbirithm · on Feb 17, 2021

I can't for the life of me understand why this isn't at least a config option. That and an option for setting --set-upstream in git push by default. Those could at least preserve backwards compatibility.

Are the Git maintainers so stringent on maintaining backwards compatibility that the implementation of submodules has remained in such a dire state for so long? I find it hard to believe that a VCS this widely adopted would fall over so easily for this relatively popular use case.

jayd16 · on Feb 17, 2021

You can now set it for pull but not for clone.

    git config --global submodule.recurse true

jpxw · on Feb 17, 2021

At the very least, it could warn you when you pull a repo with submodules without --recurse-submodules. But it doesn’t, it just silently does nothing.

nemetroid · on Feb 17, 2021

I’ve found that setting submodule.recurse to true makes submodules behave closer to how you’d expect them to.

user-the-name · on Feb 17, 2021

Mercurial's subrepos implement the exact same functionality as git submodules, but they do it without being as bafflingly broken as git's attempt.

NalNezumi · on Feb 17, 2021

Oh I remember at my previous job when the (now lead) engineer decided that the main repo, which depended on 4 actively changing subrepos currently included as subtrees, had to be changed to submodules. This in a rapidly developing repo where the main branch hadn't been updated for 18 months while the "develop" branch was switched to other "sub" branches on a monthly basis.

Needless to say it was a total shitshow. My first experience of how a "wanting to use the cool stuff I just found out about" type of person in a decision-making position can be catastrophic. The dude wanted to switch the whole code base to C++20 in 2020, 3 days after the conference.

BuckarooBanzay · on Feb 17, 2021

Despite all the comments saying to avoid submodules, I can only recommend them.

I'm using them in several of my game servers as a "meta-repo" that points to other git repositories (for example here: https://github.com/pandorabox-io/pandorabox-mods)

It makes updating, finding/fixing bugs and testing much easier (we are using github's dependabot to update and kick off initial tests)

pornel · on Feb 17, 2021

What they achieve is very useful. How they do it is terrible.

I use them too, but their "user interface" is half-assed and full of traps. I have to have a bunch of extra scripts to check and fix them, and a ton of warnings in the README for poor souls who haven't experienced the pain yet.

nerdponx · on Feb 17, 2021

> "user interface" is half-assed and full of traps

Welcome to Git.

At least it's a lot better than it used to be.

JamesSwift · on Feb 17, 2021

Yes, I think as a way of vendoring external dependencies they are ok. That's about the only use case that makes sense to me. The friction they cause isn't worth it otherwise.

wheybags · on Feb 17, 2021

I've found that git subtrees are a much better option. Good intro here: https://www.atlassian.com/git/tutorials/git-subtree

They're like submodules, but you can edit them and they're embedded in your history, not just a reference.

rkangel · on Feb 17, 2021

For those who aren't familiar with them, subtrees and submodules are two different approaches to the same problem - wanting the contents of another git repo available to you. To understand the differences, imagine two solutions you might hack together:

Have your build script do a 'git clone' of a specific repo into your folder structure, so that your build can reference the files. You can imagine some scripts you might write to simplify it, and you can also imagine doing it on a git hook so it happens automatically on checkout. This is git submodules.

Alternatively, you could just take that other repo, copy all the files into a subfolder of your repo and check them in. When you want to update it, you need to do a bit of manual fiddling and then check in again. This is git subtrees (with commands to help you more easily do the updates).

I think people create repos when they shouldn't anyway - I subscribe to a more monorepo approach with lots of things in one repo, and git subtrees are an obvious match to that.

IshKebab · on Feb 17, 2021

So git subtree is just a monorepo basically?

rkangel · on Feb 17, 2021

It's a monorepo, where some of the contents are a copy of another existing repo.

I'm sure we've all 'vendored in' a dependency before - just copying in the files and checking them in. The fiddly bit is when you then want to update that vendored dependency to the newer version (more manual copying of files and checking in), and it gets tedious and error prone when you have 10 of them. Subtrees do the same thing, but provide tooling to make it a lot easier.
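
As a rough sketch of that update step, assuming a hypothetical upstream repo `dep` that lives in a temp directory and a Git install with `git subtree` available:

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# Hypothetical upstream dependency at "version 1".
git init -q "$demo/dep"
echo "v1" > "$demo/dep/VERSION"
git -C "$demo/dep" add VERSION
git -C "$demo/dep" commit -q -m "dep v1"

# Vendor it: the files land in vendor/dep and are committed in the parent.
git init -q "$demo/app"
cd "$demo/app"
git commit -q --allow-empty -m "initial"
git subtree add --squash --prefix=vendor/dep "$demo/dep" HEAD

# Upstream moves on to "version 2".
echo "v2" > "$demo/dep/VERSION"
git -C "$demo/dep" commit -q -am "dep v2"

# Update the vendored copy with one command instead of re-copying by hand.
git subtree pull --squash --prefix=vendor/dep -m "update vendored dep" "$demo/dep" HEAD

cat vendor/dep/VERSION   # the vendored copy now matches upstream
```

With ten vendored dependencies this becomes ten `git subtree pull` calls, each with its own `--prefix`, which is easy to put in a script.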

Phlogistique · on Feb 17, 2021

Yes - or rather, it's a tool that helps you build a monorepo from separate repos.

asplake · on Feb 17, 2021

I use a subtree to maintain an open repo within a private one. It works, but why is it so slow? You change just one file and before attempting a push it seems to scan the whole thing. I tried this recently without a network connection, pretty clear that the slow bit is entirely local.

vaughan · on Feb 17, 2021

https://stackoverflow.com/questions/16134975/how-can-i-reduc...

I think it's just inefficiently implemented. `git subtree` is just a shell script using other git commands. Take a look at the `--rejoin` flag.

asplake · on Feb 17, 2021

Thanks

> When you run git subtree push, it will recreate all commits for this subtree. It has to do that, as their SHA depends on the previous commit and needs those SHAs to be able to link the new commits to the old ones. It could cache that, but it doesn’t.

> I guess this is the price you pay for using subtree vs. submodules. Subtree is very stateless in your repository, which is nice on the one hand but causes this long computation on the other hand. Submodules store all their information, which requires you to manage it but also makes stuff like this a lot faster.

rrosen326 · on Feb 17, 2021

My two cents- subtrees were SO much harder for my use case. Rename a directory? Ugh. Subtree split? Commit history? Just a mess. I switched back to sub modules and it’s such a relief.

vaughan · on Feb 17, 2021

Yeh, subtrees can get hairy. I think there is a need for a cli wrapper tool (with nice visualization) to hand-hold through the process. Too much can go wrong otherwise.

vaughan · on Feb 17, 2021

Yep, subtrees are the answer. I wrote [this piece](https://vjpr.medium.com/the-multi-monorepo-209041932fbf) yesterday about using a monorepo of monorepos using git subtree over git submodules, which should cover almost all use cases.

Basically, you only need to create other git repos depending on access rights. That is, proprietary code vs open-source.

The biggest mistake is thinking: "I want to open-source these two packages, so I will create two repos and make them public and then add them as submodules in my private company repo". The problem though is that it's such a natural thought, and people don't realize the pain they are about to inflict.

Pxtl · on Feb 17, 2021

I'm using submodules in a project and I have to agree - they're awful. The fact that their ergonomics are so bad in cases as trivial as switching branches is flabbergasting.

I mean, it was the Right Tool for the Job for my use-case but the experience was awful.

I'm pretty well convinced that the emperor wears no clothes when it comes to Git in general.

pydry · on Feb 17, 2021

This has bugged me for a while.

Culturally there are three types of programmer tool:

* Those unknown or disliked enough that anything that goes wrong when using them is the tool's fault even if it isn't.

* Those well known and liked enough that they get blamed for their own issues.

* Those that are put on a pedestal such that any problems they do have are the fault of the user.

git falls squarely into the last camp.

mpawelski · on Feb 17, 2021

Is it still that bad if you set submodule.recurse[0] config?

[0] https://git-scm.com/docs/git-config#Documentation/git-config...
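
For anyone who hasn't seen the option: a small sketch of setting it, shown repository-local in a throwaway repo (the temp directory is a stand-in). Per the git-config documentation, it makes commands that accept `--recurse-submodules`, such as checkout and pull but not clone, behave as if the flag were passed.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"

# Repository-local here; use `git config --global` to apply it everywhere.
git -C "$repo" config submodule.recurse true

# Read the value back to confirm it was stored.
git -C "$repo" config --get submodule.recurse
```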

crowdhailer · on Feb 17, 2021

Submodules aren't perfect but we've made use of them in a few projects and once we had a process around them they were useful.

sto_hristo · on Feb 17, 2021

My lord, so much hate for them submodules. I'm actively using them for my project, where i have common components that are shared between other components. Having them as dependencies managed by a build tool is cumbersome as sometimes, purely for the sake of convenience, i need to update some shared component in place and have the changes propagate to other dependents.

Had a few mishaps, but those were the result of lack of experience. Overall, my experience with submodules is great. I can imagine what terror awaits those with bigger projects where everyone starts pushing random stuff from all directions.

robin21 · on Feb 17, 2021

Yeh for teams they are terrible. For solo development using a single branch they can seem okay.

junon · on Feb 17, 2021

Most of the "complaints" here are people misusing submodules or not taking the time to learn how Git itself works.

I work with submodules almost daily for dependencies in game engines, OS projects, and even network services. I never have issues.

However, that's not to say they don't have a strange set of commands (or lack thereof). I have to touch the .git folder way too often, and that shouldn't be the case.

Maybe I'll write a small cookbook of how to deal with issues, including an "everything is fucked, how do I start over?" section.

Izkata · on Feb 17, 2021

Note that the "2009" is quite important, a few things here are solved (though with the usual odd interface if you're not used to it).

> If that repository also has submodules, then your submodule’s submodules will have to be populated by following the steps below from within your project’s submodule directory (confusing yet?).

Now:

  git clone repo target
  cd target
  git submodule update --init --recursive

> Unfortunately, this is wrong. Git does not have a built in way to remove submodules.

It's a two stage process (the first cleans up .git/config, the second actually removes the code and submodule), but does exist now:

  git submodule deinit lib/billboard
  git rm lib/billboard
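
A self-contained sketch of that two-stage removal, spelled as `git submodule deinit` followed by `git rm`. It runs against a throwaway parent repo and a stand-in `billboard` repo, both created in a temp directory purely for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# Stand-in for the real submodule's upstream repository.
git init -q "$demo/billboard"
git -C "$demo/billboard" commit -q --allow-empty -m "billboard: initial"

git init -q "$demo/app"
cd "$demo/app"
git commit -q --allow-empty -m "initial"
# protocol.file.allow is only needed because this demo clones from a local path.
git -c protocol.file.allow=always submodule --quiet add "$demo/billboard" lib/billboard
git commit -q -m "add lib/billboard submodule"

# Stage 1: drop the submodule's entry from .git/config and empty its work tree.
git submodule --quiet deinit lib/billboard
# Stage 2: remove the gitlink and its .gitmodules section, then record that.
git rm -q lib/billboard
git commit -q -m "remove lib/billboard submodule"

# The cached clone under .git/modules is left behind by both commands.
ls .git/modules/lib
```

If the leftover clone matters (for example before re-adding a submodule at the same path from a different URL), deleting `.git/modules/lib/billboard` by hand is the remaining manual step.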

tigerlily · on Feb 17, 2021

I want to use Dear ImGui [1] in a C++ project. Sure I can copy the relevant files into my repo by hand, but my instinct says that is not the way. Git submodule seems like the right tool, but there's plenty of comments here advising not to use it. So if not git-submodule, then what is the procedure?

I just want to have the external repo as a subfolder in my own repo, updating that subfolder as updates become available.

[1] https://github.com/ocornut/imgui

nerdponx · on Feb 17, 2021

Git submodule is the right tool. Git submodule works well for externally-developed "read-only" parts of your project that you need to update infrequently.

Git submodules a few years ago used to be a pain in the ass, partly because the CLI was missing a lot of important functionality. Pretty much all of those concerns are in the past, and now it's just people complaining about having to pass a couple extra command line flags (which you can/should just add to your README).

tigerlily · on Feb 17, 2021

Thanks! A quick perusal of some reputable docs corroborates your recommendation [1].

[1] https://www.atlassian.com/git/tutorials/git-submodule

rubyist5eva · on Feb 17, 2021

Git is an insane tool. I've never seen a single command line tool blogged about so much. You don't see anything like this with regards to Mercurial or other distributed version control systems (and most of them are just as powerful as git, if not more so), the documentation is usually enough for these tools.

For some reason git is special - it has this ability to be completely opaque to so many people and yet still be deployed almost literally everywhere. I am completely baffled by its ubiquity.

Thanks Github, I guess..?

nathell · on Feb 17, 2021

Oh yeah, submodules can bite. At my previous job, we tried open-sourcing a part of our monorepo using them, which resulted in a clunky workflow. So we whipped up a syncing bot integrating with GitHub that would propagate changes both ways: https://github.com/WorksHub/flow-bot

40four · on Feb 17, 2021

This post should have (2009) appended to it. It's not obvious because there is no date on the post, but if you scroll down to comments, they are from 2009.

That means these code examples were written in Git v1.6. The current Git release is v2.30. I haven't inspected the commands, but I'd be willing to bet the interface has changed since then.

1MachineElf · on Feb 17, 2021

QMK is the only repository I ever spend time working with, and it has some submodules. I find it annoying that they must be pulled each time I switch branches. Should I look into changing them so that they at least point to a repository on my local machine instead of having to re-download from GitHub each time I want to create a new branch? Is that possible, and does anyone else do it?

zoobab · on Feb 17, 2021

Docker Hub fails to build an image if you call git submodules in there. I had the problem with the DirtyJTAG Dockerfile: it builds fine on my laptop, but does not build on Docker Hub.

I mean submodules were a pain in 2010, they are still a pain in 2021. So much for 'open source'.

dboreham · on Feb 17, 2021

Yes, submodules are to be avoided. 99% of the problem is poor documentation that fails to explain submodule behavior adequately, leading the reader to assume the feature conforms to a mental model that it doesn't.

ta8645 · on Feb 17, 2021

Do you have an example of this mismatch you could share?

pornel · on Feb 17, 2021

Users assume submodules would behave sensibly, and they don't.

State of submodules is part of a commit, so you'd expect that checking out a commit checks out the submodules. It doesn't.

Or if you notice git doesn't care about the content of submodules in the working tree, you'll be surprised when rebasing (change of a dir to/from submodule breaks rebase).

Merging has "ours"/"theirs" strategies, except it doesn't for submodules (there's suddenly local/remote distinction, and breaks -Xtheirs in rebase).

From the diffs it looks like submodules are defined in .gitmodules and the tree, so removing them from these places would delete them. It doesn't (it fucks up internal state of .git dir).

And you'd think .git dir is a dir, except when it isn't.

It's like whoever designed submodules just loves saying "gotcha!" and telling frustrated users they suck at git. Everything submodules do is subtly different with sharp edges.
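
A small sketch of the first and last points (checkout not moving the submodule, and `.git` being a file). It assumes default settings with `submodule.recurse` unset, and uses throwaway repos named `lib` and `app` plus a tag `pinned-v1`, all invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

git init -q "$demo/lib"
git -C "$demo/lib" commit -q --allow-empty -m "lib v1"

git init -q "$demo/app"
cd "$demo/app"
git commit -q --allow-empty -m "initial"
# protocol.file.allow is only needed because this demo clones from a local path.
git -c protocol.file.allow=always submodule --quiet add "$demo/lib" lib
git commit -q -m "pin lib v1"
git tag pinned-v1

# Advance the submodule and record the new pointer in the parent.
git -C lib commit -q --allow-empty -m "lib v2"
git add lib
git commit -q -m "pin lib v2"

# Check out the older parent commit: the recorded pointer changes...
git checkout -q pinned-v1
before=$(git -C lib log -1 --format=%s)   # ...but the submodule stays where it was

# The separate second step that actually moves the submodule.
git submodule --quiet update
after=$(git -C lib log -1 --format=%s)

echo "after checkout: $before; after submodule update: $after"
# prints: after checkout: lib v2; after submodule update: lib v1

# Inside a submodule, .git is a file with a gitdir: line, not a directory.
cat lib/.git
```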

juangacovas · on Feb 17, 2021

Anyone can compare the "externals" in SVN with "submodules" in git? I'm curious since I'm used to externals in SVN and they're ok but as always you have to keep some things in mind

Izkata · on Feb 17, 2021

A git submodule is almost the same thing as an svn external that's pinned to a specific revision. From that baseline, these are the differences I can think of:

* The submodules aren't automatically updated like externals are with "svn up", you need a second command.

* Adding a submodule is done with a command, not editing files/props directly.

* Any changes to the pinned revision are done inside the submodule (git pull, new commits, etc), which automatically changes the reference in the parent repo and then is committed in the parent repo.

* Removing a submodule is two commands now, not a bunch of manual work (article is out of date: https://news.ycombinator.com/item?id=26171333 ).

matheusmoreira · on Feb 17, 2021

I really hate how submodules are just pointers to specific commits. This forces an update to the superproject every time I update the submodule. Even if I add commits to the submodule, it won't check out the changes until the superproject's pointer is updated.

I just like to develop libraries and the application that uses them at the same time. Whenever I change the library, I want to see the changes in the application immediately.

The only tool I've used that got this right is Python's setuptools. It's got a development mode which essentially replaces the package with a symbolic link to a local repository. Why doesn't git support this?

newusertoday · on Feb 18, 2021

Android combines multiple git repositories with manifest(xml file) and has associated tooling for it. Not sure why that approach is not popular?

tpoacher · on Feb 18, 2021

Sometimes I almost feel like git is slowly trying to reinvent svn ...

krona · on Feb 17, 2021

Git submodule should be renamed 'git conway', named after Conway's Law: there is no good reason to use it except within those companies that have to compensate for dysfunctional technical leadership.

junon · on Feb 17, 2021

There are great reasons to use them. You've never been bitten by the alternative, it seems.