How Kubernetes Broke Git (matt-rickard.com)
76 points by kiyanwang on July 31, 2022 | 57 comments


The problem with k8s is the lack of stable interfaces within the codebase. The resources themselves are well-versioned but even though the codebase is split up the individual pieces have dependencies on each other down to specific minor releases. The "separate" repos are so tightly coupled you wonder why they don't just smush them all together.

If you've ever tried to develop software that depends on k8s modules you know what I mean - you inevitably get a diamond dependency conflict that go mod can't easily handle because some package needs version 0.45 of apimachinery but something else needs 0.46 (made up versions but you get the point). If they wanted to have many small repos they should have some rigor around versioning and public interfaces between those repos, rather than this magic manifest of specific releases that work together.
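
To see where a diamond like this comes from, one option is to scan the output of `go mod graph` for modules that are requested at more than one version. This is only a sketch: the function name is made up for illustration. Go's minimal version selection will then build with the highest requested version, which is where incompatibilities tend to surface.

```shell
# Hypothetical helper: reads `go mod graph` output on stdin
# (lines of "requirer dependency@version") and prints each module
# that is required at more than one version.
multi_version_requirements() {
    awk '
        !($2 in seen) {
            seen[$2] = 1
            split($2, dep, "@")          # dep[1] = module path, dep[2] = version
            versions[dep[1]] = versions[dep[1]] " " dep[2]
            count[dep[1]]++
        }
        END {
            for (m in count)
                if (count[m] > 1) print m ":" versions[m]
        }
    ' | sort
}
```

Run it from a module directory as `go mod graph | multi_version_requirements`. Each output line names a module followed by the versions that different parts of the graph ask for.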


Some of this is that Google devs think in monorepos, where external interfaces and dependencies are somewhat easier problems.


> The Kubernetes build system is bash. The project experimented with bazel but removed it (too complicated, bad developer experience).

I know I’m being very uncharitable but I had to giggle at the irony here


Flat out laughed my ass off


Not that I think submodules are great, but they can be used while still maintaining atomicity. It's just that the atomic update happens when updating the submodule commit in the parent project.

This isn't any different from how, say, a B+tree (or another persistent data structure) rewrites its nodes from the leaf to the root but leaves uninvolved subtrees as they were.

There winds up being a lot of activity on the superproject that amounts to just updating its submodules, but the commit log for it becomes a linearized history of stable/compatible commit versions.

There's definitely room for improvement wrt usability, but the claim that git has "no atomicity across subprojects" doesn't ring true to me.


I would pay $20 if git would disallow committing from within the submodule directory (single source of truth), because it makes it so easy to forget a git push from within that submodule and now "works on my machine," but "special wtf edition!"


I'm convinced that the real problem with submodules is basically the UI. In particular, the defaults are terrible. You can ask it to block pushing if there are unpushed submodule commits with:

    git config push.recurseSubmodules check
Or you can make it push automatically by replacing "check" with "on-demand".

You may also find it helpful to make various commands automatically apply to submodules with:

    git config submodule.recurse true


Thank you, that push.recurseSubmodules=check is the knob I was looking for, and wish it were the default

I got excited about the submodule.recurse=true one, but at least for "git status" it did not descend into the submodule the same way that "git submodule foreach git status" does


You could put a failing script (printing a message and exit 1) at .git/hooks/pre-commit in the submodule[1] (note that this is by-passed if you give --no-verify to git commit). Or you could put a script at .git/hooks/pre-push.sample in the parent repo (untested) that verifies if all commits in submodules are in the respective upstreams. I guess having that functionality in the core as a warning might not be a bad idea (I'm not a Git developer, you'll have to ask them).

[1] or rather ../.git/modules/"$submodulename"/hooks/pre-commit, depending on how the submodule was added
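
A minimal sketch of the failing-hook idea, assuming you pass in the hooks directory of the repository you want to protect. The function name and the message are made up for illustration, and as noted above `git commit --no-verify` still bypasses the hook.

```shell
# Hypothetical helper: install a pre-commit hook that rejects every commit.
# $1 is the hooks directory of the repository to protect.
install_blocking_pre_commit() {
    hooks_dir=$1
    mkdir -p "$hooks_dir" || return 1
    cat > "$hooks_dir/pre-commit" <<'EOF'
#!/bin/sh
echo "Commits are disabled in this checkout; commit in a separate clone instead." >&2
exit 1
EOF
    chmod +x "$hooks_dir/pre-commit"
}
```

For a submodule checked out at `subrepo`, something like `install_blocking_pre_commit "$(git -C subrepo rev-parse --absolute-git-dir)/hooks"` should locate the right directory regardless of how the submodule was added.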


Last I checked, one cannot write any such silliness during clone, so it would require documentation or something out of band to fix it on every junior developer's workstation, and thus doesn't help as much as if there was a `git config --global commit.allowSubmodule false` or whatever

> Or you could put a script at .git/hooks/pre-push.sample in the parent repo (untested) that verifies if all commits in submodules are in the respective upstreams

(a) those scripts are most certainly not called .sample (b) it was fast enough to set up a local test case and (as expected) each repo (outer and "inner" submodule) carries its own git hook setup, and `echo "exit 1" > .git/hooks/pre-commit` does stop top level commits but does nothing for the inner repos


> .sample

Yes, I carelessly copy-pasted the path.

And yes, as I said it would probably be a good idea to have a feature like that in the core. -- Ah, I see the other reply, so it's done. I generally recommend asking such questions on #git on IRC, someone will know the answer if a feature already exists.


Are you saying submodule SHA was updated in the parent repo, but the new submodule commit which the parent repo is now pointing to was never pushed?

I haven't experienced that scenario before but it seems like there'd be an obvious git error?


No, the opposite:

    git clone --recursive some/repo.git
    cd repo/subrepo
    sed -i"" s/hello/goodbye/ README.md
    git commit -am 'lololo'
    cd ..
    git commit -am'subrepo with *local* sha reference'
    git status  # everything is clean!
now that I know that can happen, running `git submodule foreach git status` will surface the "your branch is ahead of" magic text that indicates what has gone on, but it would be tons better if the system understood what was happening and didn't allow such a bad outcome
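
One way to catch this before pushing the parent is sketched below. It assumes that "pushed" means "reachable from some remote-tracking branch, as of the last fetch or push", so it only knows what the local `refs/remotes/*` state knows. Both function names are made up for illustration.

```shell
# Succeeds if HEAD of the repository at $1 is contained in at least one
# remote-tracking branch (according to the local refs/remotes/* state).
head_is_pushed() {
    [ -n "$(git -C "$1" branch -r --contains HEAD 2>/dev/null)" ]
}

# Run from the parent repository: prints every submodule whose checked-out
# commit is not on any remote-tracking branch, and fails if there is one.
# $sm_path is set by `git submodule foreach` inside each submodule.
report_unpushed_submodules() {
    unpushed=$(git submodule --quiet foreach \
        'test -n "$(git branch -r --contains HEAD)" || echo "$sm_path"')
    if [ -n "$unpushed" ]; then
        printf '%s\n' "$unpushed" | sed 's/^/unpushed submodule commit in: /' >&2
        return 1
    fi
}
```

In the scenario above, `report_unpushed_submodules` run in `repo` would be expected to name `subrepo`, because the 'lololo' commit exists only locally. It could be called by hand or from a pre-push hook in the parent.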


I feel the title is incorrect. This isn’t so much a critique of Git, it’s a critique of the way Kubernetes-the-project uses it.


> This isn’t so much a critique of Git, it’s a critique of the way Kubernetes-the-project uses it.

I'd hope so: the Linux codebase is an order of magnitude bigger than the K8s one and it's not breaking Git.


He mentioned in the article that Kubernetes didn’t have a Linus Torvalds to oversee the project. I found a similar thing to be true with git, as often scaling isn’t so much a technical challenge at first, rather it’s a bunch of org challenges that manifest themselves in technical ways.

The model of the “benevolent dictator” kinda works in this case. On my last project we managed to scale to about 50k LOC without anything special; the key is I knew the repo like the back of my hand and could catch potential integration issues. While the model works well, it’s very hard to set up, as you need a real nerd of a team lead to constantly watch the repo.


I don’t see why a project should strive for a lack of leadership

Most things in life that don’t have leadership become messy and disorganized and eventually disintegrate into an unstable hell with no real focus


"How Kubernetes' usage of git (and GitHub) did not scale well post-monorepo"


"How my nail broke hammers"


The title is correct as it blames k8s for mishandling git and "breaking" it. It's not git's fault.


I'm still not quite sure how that counts as breaking git?

If you misuse a tool, and the tool performs poorly at the job it's not designed for, but never fails in an unexpected way, and still maintains all the functionality it always had, how have you broken that tool?

If I try to hammer in a nail with the butt of a screwdriver, and make a complete pig's ear of it, but the screwdriver absorbs the abuse and is still perfectly usable as a screwdriver afterwards, did I "break" the screwdriver?

Or, am I misunderstanding how the word "breaking" is being used here? Is there a meaning I'm not getting?


Ok, tell me the tool they could use instead of git to handle and overcome the type of organizational and technical problems they are experiencing


Huh? What has that got to do with whether or not they broke git?


That was an interesting read thanks for sharing it! My takeaway from this is that the k8s codebase is complicated with complicated workflows. Simple tools won't suffice for this development system. They had to create some of their own tooling; in a way that's quite in line with the usage of k8s, a lot of existing OS concepts had to be recreated just for k8s.

> authorization

> package management

> So why shouldn't a VCS embrace its role as a collaboration tool and explore more generic merge-based optimizations like a queue?

In the bottom section there are a few 'wishlist' items, can I call it that. But those aren't good VCS features, they're a reflection of the k8s development world which is not how most of us do development.

It's also assuming that because k8s is an all-in-one-doing-many-things, that the VCS it uses should also be a huge all-in-one. I don't think it should; all that would happen is the leaking of k8s' already complex existence from k8s into git.

Then it really would break git by making git worse for everyone. I would suggest finding another tool, or making your own.


Good overview, I know these sorts of pains well. Lots of hard questions and few definitive wins/right answers. How to organize a massive repository out in the open is still an open question. On that note, recently, I've been experimenting with this project called josh, which basically is like 'git subtree on extreme steroids, functioning as a git proxy':

https://josh-project.github.io/josh/

It basically lets you unify/view many repositories as a single one, or equivalently split a mono-repo into smaller units of work for CI, specific teams, etc. It's bidirectional, so you push and pull from josh and everything goes into a single linear history in the mono repo. And because it's bidirectional, people in the mono-repo can still do things like make large-scale atomic changes across all sub-repositories, and those get reflected.

Josh currently isn't suitable for a lot of workloads due to various reasons (authentication is one that stands out), but it's actually the first tool I have seen that manages to offer BitKeeper-like "subtrees" that work really well, at scale, for large repos and teams. It requires some care to make sure "sub-trees" can be usable units of work, but it was one of the best features of BK in my opinion and really great for people doing one-off contributions, or isolating trees/changes to specific developers.

I'd be interested to know if there are other open alternatives to this. It's a nice point in the design space between solutions like "integrate with the filesystem layer to do sparse clones" or "just split up the repos."


Curious why you say this isn't suitable for authN? This seems maximally suited for processes where changes (or more likely, additions) in a split out system (authn, authz) may immediately rely on behavior or interface changes elsewhere.


I just mean that, I don't see how to properly integrate the josh-server with my SSH keys to do access, how to apply basic roles of who can push/read to what workspace, etc. If I wanted to do this at work "How does it integrate with authentication" is like a #1 concern from an infosec team who'd sign off. If the answer is "It doesn't, use something in front of it" that might be OK too.

In fact I think authenticating and authorizing access to components of a monorepo is definitely in scope for its design, and could allow really powerful and cool things. But I just don't think it does that yet. Maybe it shouldn't. That's all I mean. It's not like a hard rule. There were some other things that I thought might hold me up, but I can't remember them now...

Not sure if you've ever used it but there's some self-hosted software out there called 'gitolite' that does a lot of this. Perhaps they can be made to work together seamlessly...


The whole topic of access control is way more work than you might expect:

There is already an idea and initial implementation of path based ACLs for Josh. However, even if that concept was perfect and already implemented, it would be kind of useless.

Why?

To be useful in practice we would need a UI for code review in a monorepo. This UI would need to be aware of the "workspaces" and ACLs in the repo and respect them when showing files and diffs to the users.

As it stands now, Josh is used together with either Gerrit or GitHub (or similar). Patches or PRs are always being reviewed in the context of the full backing monorepo. As long as those are the only options to do code review, I don't see the value of having access control at the Git level.

That being said, I am planning to create a new code review tool that does support these things, but it has a long way to go before it will be a serious alternative to the common tools used today.


ACLs are in the works https://github.com/josh-project/josh/pull/561

That being said, the overall scope of permissions and authentication is more complicated, and more improvements are definitely needed


I interpreted that to be an example reason, not an example workload.



Are you saying you view the project as a curiosity or you have problems that it is actually solving that a git monorepo doesn't?


Not sure if you will see this but, yes, I do see it as solving some problems I have. They aren't necessarily problems a monorepo "doesn't solve", and could have other solutions, I'm just saying Josh is a powerful complement to the monorepo design and helps smooth out some issues.

Choosing a monorepo vs many-repos is a tradeoff, there are consequences to choosing a monorepo, many solvable, and a tool like this just provides robust solutions to a couple of them, is what I'm saying.

Actually one very common use case Josh solves that I've had in the past is "Merge a repository into another, while developers keep using the original as if nothing happens." This is important to keep teams moving while a migration happens. I have this problem right now at work; two repos that want to be one repo, one smaller and one bigger. You can use a history-rewriting tool such as git filter-branch to do this but Josh is significantly more powerful, and most importantly the team whose repository got "merged away" (i.e. got merged into the bigger repository) can keep working on their repository as if nothing happened, and you can eventually switch them over to the main repo. Normally you have to stop the whole train at once and move people over while some poor bastard has to surgically modify the git repository after they take the git repository down. But Josh allows you to merge and incrementally migrate that repository, because you can now view one repository as a "workspace" of another. It's sort of like the difference between taking an optimistic lock vs a normal wait lock, in my mind. Josh lets you do "optimistic locking" when merging two repositories, rather than making every team stop -- serializing -- while it happens.

Another common case is "I need to mirror a subset of my proprietary repository onto GitHub." Actually "I need to mirror subset X" in general. This is another monorepo problem that, while not super complex, is actually really nice to solve in this way because bidirectionality means people who patch the mirror downstream can still have their changes merged upstream. This isn't always possible for QA/workflow purposes (e.g. their downstream change could break something upstream, so many people choose to instead apply it the other way around), but it's something I've experimented with. Not unthinkable.

I do think it's a very promising project with many real world applications. I'm still figuring out how best to organize and use it, just like we do with git.


I didn't really understand the first part. Isn't a monorepo how you get atomicity across sub-projects? If that was a problem for kubernetes, wouldn't a single commit that affected multiple repositories have the same issue?

The merge issues seem like they would be solved by your code-hosting platform. (GitLab has Code Owners and Merge Trains and I imagine GitHub has something similar) To me, these features are something you'd implement in your centralized tool rather than git which has to support a decentralized workflow. Perhaps someone clever could think up a decentralized authorization system for git, but is it worth it when almost every project has a centralized source-of-truth repo?


I’ve read the article 3 times now. I still fail to find the part where git broke. I also fail to find a concise explanation on what the solution is. I may need a beer to understand this.


As best I can tell, the heart of the alleged issue is this from the middle:

> A system that could record atomic commits across projects or a better submodule experience would have allowed for more flexible developer organization, especially as the project grew to a new scale.

but otherwise I'm with you that this could have used a better title or something


I can’t tell if it’s because they had a good process and the tools didn’t fit or because they have a terrible process and nothing was going to work for them anyway

I see the latter much more often when I jump companies and Google’s projects have had terrible API stability so I’m not really sure Git is to blame here


GitHub is the system that broke. All the communication systems within it needed to be disabled for custom notifications since it would otherwise be too noisy.


It’s just clickbait, the author posts clickbait on here regularly. It works, unfortunately.


I guess it's not popular to say but most of k8s's problems here stem from inflexibility of Go. Otherwise it'd probably just be a monorepo still.


Do other devs just open up multiple vs code windows for each project or something?

I can't stand not being able to run everything in the same window with ctrl P picking up files from across projects as a reference.

I feel like I'm the odd one out because I've noticed a lot of languages and language servers are making these assumptions about how devs work and organise code.

Or they're just being perfectionist opinionated twats.


> The solution might seem simple, but even simple problems become difficult at scale, especially when many different people and organizations are involved.

In modeling & simulation, this is called "emergent behavior". While that may be imprecise in terms of the definition, stand by for the effects.

Doing anything at scale separates the pros from the dilettantes, e.g., me.


I'd be interested to know why projects like this don't make use of git submodules. They take some getting used to but once understood seem to do a reasonable job of multi-repository projects to me.


Because submodules absolutely suck, and if you have the option of merging the two repos, they're 1000x more terrible. This terribleness increases nearly linearly with the number of active project participants. Why? Because they completely violate Git's operating model, which is to track content, not pointers to content. This is the fundamental problem, regardless of project dynamics, Kubernetes, whatever. In fact, they make "git commit", one of the most fundamental operations you can perform, significantly riskier, because you can easily introduce submodule pointer changes from your dirty working copy. And people do this all the time. One of the most common cases is committing a submodule "update" while accidentally in a dirty working tree, so when you push, the commit the pointer refers to simply doesn't exist anywhere else. They also fail immediately once you try to `git merge` (or less commonly, git cherry-pick), because their merge conflicts, by definition, cannot be resolved automatically. Because they are pointers, not content. But git works on content.

So just to be clear, two of the most fundamental day-to-day operations you can perform are turned into massive liabilities from this feature, ones that are likely to either break your build and/or just make your life harder. As someone who had to maintain stable and development branches of a project, cherry pick between the two, cut releases, etc, submodules are simply hell, because they make an already difficult job worse. This is a good sign that they are a liability. In a past life we actually had so many people push invalid submodule updates over time that we eventually wrote a git hook on our server to reject all commits with submodule updates that didn't exist in the corresponding repository, and that were not specifically tagged in the commit message as updating a submodule (through a magic set of keywords.) The fact we even had to do this is its own pain.

I have maintained projects that have long-standing histories with dozens of submodules. And every single time we eliminated one of those submodules (often by merging into the parent repository, or simply dropping the dependency entirely), we all breathed a sigh of relief, and our lives all got significantly better from that point forward.

As you can tell, this experience has made me very prepared to fight against submodules everywhere I might see or encounter them. But trust me: it's for your sake, not mine; 'cause there ain't a chance in hell anyone is adding any to my repositories.
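
The server-side check described above (rejecting submodule updates whose target commit doesn't exist) could start from something like the following. This is a sketch and not the hook that comment refers to: the function names are made up, paths containing whitespace are not handled, and wiring it into an actual `pre-receive` or `update` hook is left out.

```shell
# Prints "<sha> <path>" for every submodule pointer (gitlink, mode 160000)
# recorded in revision $2 of the repository at $1.
list_submodule_pointers() {
    git -C "$1" ls-tree -r "$2" | awk '$1 == "160000" { print $3, $4 }'
}

# Succeeds if $2 names a commit that exists in the repository at $1.
commit_exists() {
    git -C "$1" cat-file -e "$2^{commit}" 2>/dev/null
}
```

A hook would feed each `<sha>` from `list_submodule_pointers` into `commit_exists`, pointed at the server's copy of the corresponding submodule repository, and reject the push when the lookup fails.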


It sounds like you are arguing in favor of vendored dependencies over submodules?


If I had to choose between the two? Probably, yeah. But it depends on what's being vendored, I guess. Like, I might not want to vendor the entire transitive dependency tree of some app if it's not security critical. But I would probably vendor the tree of an OSS repository I patched extensively to meet my needs, yes. Depending on the environment some other solutions might be possible. It's a complex topic.

I will admit that if your dependency in this case is something that changes extremely rarely, and will only see updates maybe like, bi-annually, submodules are "OK." Not great, but they'll work, and presumably won't inflict massive psychic damage on your team members. But it's once they receive any foot traffic by more-than-one-person that all the real pain begins.


Submodules are an utter clusterfuck. I consider myself a pretty experienced user and every time I try to use submodules, they bite me in the ass eventually.

The way they are operated just doesn't fit the way any human thinks.


Gosh, I’m so pleased for all the replies here. I felt the same way. Submodules made me feel stupid, like my mental model of Git was just entirely broken.

What I wanted to do was be able to have people work against a pinned version of a different Git repo, then update the sub module whenever we felt the need and handle the build breaks. This task seemed impossible to do correctly over time, which I just could not understand. How was the submodule getting updated when I didn’t call anything? Why are submodule changes appearing in other commits? I just couldn’t figure it out.

I am joining a new project and they started talking about submodules and what I said was “yeah uhhuh cool” but inside I was pretty nervous. But I couldn’t be sure it wasn’t because I was a dummy and they knew exactly what they were doing, so I kept quiet.


The first section of the article discusses submodules and some of the pain points they suffered while moving to them.


That's strange this just trended on HN. My team is about to decide on the design of a new feature for our internal product, which can essentially be done by either using sub-modules or re-implementing them.

Our product has a somewhat simplistic git interface (behind the scenes it's anything but) and I've tried to keep it so, however lately customers have started demanding we also support submodules.

The problem is that we use git to mirror a hierarchical database, so using submodules means mirroring another hierarchy inside our hierarchy. This would mean changing the current assumptions in the code to ignore things in the sub-hierarchy except for the sub-sub-hierarchy we care about. Yeah this is hand-wavy but the design constraints I've had are kinda hard to explain.

Also, sub-modules would require changing all our git calls to take submodules into account, including cloning, reset, branches etc.

And I've read many people's bad experience with sub-modules, including the ones in this sub-thread and so now I'm afraid they might hurt our maintainability in the long run.

I've thought of a couple of ways to do this without sub-modules, including using git subtree, but all of them have drawbacks.

I've actually found a neat way to merge a sub-directory of another remote repo to the current repo - which means we wouldn't have to change any of the existing code. It involves only "standard" git commands, basically only checkout, reset, and merge (without "exotic" commands like subtree, read-tree, and whatnot). And using "exclude" to keep only the content of a single subdirectory of the remote. But it does require us to maintain a file that's exactly like .gitmodules to keep track of remotes. And that's the thing which git submodules does for us "for free".

Also I've developed a bad taste for bespoke solutions and NIH. I already fear I have contributed more than enough NIH to my company by developing the existing solution, but given the conditions I think it was the only logical solution (a previous bespoke solution failed and was cancelled).

But the longer I read and experiment with submodules it looks like they are also a kind of bespoke solution around basic git, and essentially require changing the way you handle operations such as reset, checkout, etc. Training all our users to fix errors due to out-of-sync submodules looks like a nightmare, when they already have problems with the current solution and with git in general. So I'm really conflicted on what our current path should be.


I seem to be one of the few in the comments that actually likes submodules.

But the description of this use case / requirement sounds complicated / confusing to me ("we use git to mirror a hierarchical database, so using submodules means mirroring another hierarchy inside our hierarchy").

I don't know what your product is but I hope for your sanity a PoC is possible.


What will happen when Linus is no longer with us?


A damn good question. These old school mega merges are widely avoided in most places these days, and with good reason. So the skills for doing this are thin on the ground. Plus it’s one of the largest projects in the world and whilst Linus got to grow into the current role over decades, a replacement would be starting on an already speeding train.

An obvious solution is modularization and stable internal ABIs, but the Linux community have avoided that approach for a long time, and with good reason.


Whatever happens I hope (not for me/us really, but for our kids and next generations) there is a plan and that it will materialize as a single upstream.


Your question is traditionally phrased as "what if Linus gets hit by a bus?", and if you search the web for variants of that phrase, you'll see it being discussed as early as last century (it didn't take me long, for instance, to find a slashdot comment from 1999 mentioning that hypothetical scenario). The answer back then was "Alan Cox takes over"; the specific maintainer who takes over has changed over the years (nowadays it's probably Greg Kroah-Hartman), but other than that, the answer has remained the same.


GKH takes over?

IIRC he's already the maintainer for the "stable" branch so a lot of the work people think Linus does is already on his plate.


I swear to god.. if Kubernetes-creep reaches how git works I'll scream.



