How can you guarantee that will happen when AI has been trained a world full of multiple licenses and even closed source material without permission of the copyright owners...I confirmed that with several AI's just now.
You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.
If you think it's an unacceptable risk to use a tool you can't trust when your own head is on the line, you're right, and you shouldn't use it. You don't have to guarantee anything. You just have to accept punishment.
That’s just it though it’s not just your head. The liability could very likely also fall on the Linux foundation.
You can’t say “you can do this thing that we know will cause problems that you have no way to mitigate, but if it does we’re not liable”. The infringement was a foreseeable consequence of the policy.
This policy effectively punts on the question of what tools were used to create the contribution, and states that regardless of how the code was made, only humans may be considered authors.
From the foundation's point of view, humans are just as capable of submitting infringing code as AI is. If your argument is sound, then how can Linux accept contributors at all?
EDIT: To answer my own question:
Instead of a signed legal contract, a DCO is an affirmation that a certain person confirms that it is (s)he who holds legal liability for the act of sending of the code, that makes it easier to shift liability to the sender of the code in the case of any legal litigation, which serves as a deterrent of sending any code that can cause legal issues.
This is how the Foundation protects itself, and the policy is that a contribution must have a human as the person who will accept the liability if the foundation comes under fire. The effectiveness of this policy (or not) doesn't depend on how the code was created.
Anyone distributing copyrighted material can be liable that DCO isn’t going to stop anyone.
If that worked any corporation that wanted to use code they legally couldn’t could just use a fork from someone who assumed responsibility and worst case they’d have to stop using it if someone found out.
Maybe. DCOs haven’t been tested. But you can at least say that the person who did this committed fraud and that you had no reasonable way to know they would do that.
LLMs can and do regurgitate code without the user’s knowledge. That’s the problem, the user has no way to mitigate against it. You’re telling contributors “use this thing that has a random chance of creating infringing code”. You should have foreseen that would result in infringing code making its way into the kernel.
OpenAI and Anthropic added an indemnity clause to their enterprise contracts specifically to cover this scenario because companies wouldn’t adopt otherwise.
Yeah, but that's not a useful thing to do because not everybody thinks about that or considers it a problem. If somebody's careless and contributes copyrighted code, that's a problem for linux too, not only the author.
For comparison, you wouldn't say, "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down", because then of course somebody would be careless enough to build a bridge that falls down.
Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.
It was already necessary to solve the problem of humans contributing infringing code. It was solved by having contributors assume liability with a DCO. The policy being discussed today asserts that, because AI may not be held legally liable for its contributions, AI may not sign a DCO. A human signature is required. This puts the situation back to what it was with human contributors. What you are proposing goes beyond maintaining the status quo.
It’s not solved. It hasn’t been tested in court to my knowledge and in my opinion is unlikely to hold up to serious challenge. You can be held liable for just distributing copyrighted code even if the whole “the Linux foundation doesn’t own anything” holds up.
> Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.
that's assuming that the problems and incentives are the same for everyone. Someone whose uncle happens to own a bridge repair company would absolutely be incentivized to say
> "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down"
Sorry, what's that got to do with anything? Who's the uncle supposed to be here in your analogy? The copyright owner? And so your hypothesis is that somebody's pushing AI in order to sneak copyrighted code into linux in order to sue them later? Seems very far fetched, and besides, why would I care about their incentives? Why would the linux foundation be interested in allowing that to happen?
Their position is probably that LLM technology itself does not require training on code with incompatible licenses, and they probably also tend to avoid engaging in the philosophical debate over whether LLM-generated output is a derivative copy or an original creation (like how humans produce similar code without copying after being exposed to code). I think that even if they view it as derivative, they're being pragmatic - they don't want to block LLM use across the board, since in principle you can train on properly licensed, GPL-compatible data.
If they merge it in despite it having the model version in the commit, then they're arguably taking a position on it too - that it's fine to use code from an AI that was trained like that.
Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.
Humans will not regurgitate longer segments of code verbatim. Even if we wanted to, we couldn’t do it because our memory doesn’t work that way. LLM on the other hand can totally do that, and there’s nothing you can do to prevent it.
Llm can but do they? Is there any evidence that they spit out a piece of code verbatim without being explicitly prompted to do so? NYT v OpenAI for example, NYT intentionally prompted to circumvent OpenAi's guardrail to show NYT articles
Anything generated by an AI is public domain.
You can include public domain in your GPL code.
I would urge some stronger requirement with the help of a lawyer. You only need a comment like "completely coded by AI, but 100% reviewed by me" to make that code's license worthless.
The only AI-generated part copyrightable are the ones modified by a human.
I am afraid that this "waters down" the actual licensed code.
...We should start opening issues on "100% vibecoded" projects for relicensing to public domain to raise some awareness to the issue.
> Anything new generated by an AI is public domain[1]
Language models do generate character for character existing code on which they are trained on . The training corpus usually contain code which is only source available but is not FOSS licensed .
Generated does not automatically mean novel or new the bar needed for IP.
[1] Even this is not definitely ruled in courts or codified in IP law and treaties yet .
How can you guarantee that will happen when AI has been trained a world full of multiple licenses and even closed source material without permission of the copyright owners...I confirmed that with several AI's just now.