Abusing Conda's Turing-Complete YAML Comments

hmry · on Feb 26, 2024

"Turing-complete comments" is the scariest phrase I've read in a while

ipsum2 · on Feb 26, 2024

It would be less scary and more informative if the title said "executes arbitrary python" instead.

shellac · on Feb 26, 2024

You find that _less_ scary? Turing-completeness doesn't entail file access etc. etc.

codr7 · on Feb 26, 2024

Removing YAML made it less scary.

oynqr · on Feb 26, 2024

Would adding XML make it worse?

tru3_power · on Feb 26, 2024

MySQL actually supports something similar (see executable comments using /*! MySQL-specific code */). I found this out during an assessment once and was shocked to see it lol.

https://dev.mysql.com/doc/refman/8.0/en/comments.html

lifthrasiir · on Feb 26, 2024

That's more like a conditional comment, which is a rather common strategy for backward compatibility. Conda's comment is entirely separate from the configuration language itself, which is not even Turing-complete.

JonChesterfield · on Feb 26, 2024

12 billion downloads sounds like a lot for a packaging tool that calls eval on bytes found in yaml comments.

pavon · on Feb 26, 2024

The more popular alternative to conda is pip/venv, which also executes arbitrary python code in setup.py.

akx · on Feb 26, 2024

Which is one reason why setup.py is on its way out. More often than not you'll just download a prebuilt wheel with pip anyway.

nurtbo · on Feb 26, 2024

What’s the difference between putting it in the yaml file vs putting it in the uploaded built conda package? Isn’t it the same risk profile?

Is there some way that new recipes are auto-parsed by every client?

Someone · on Feb 26, 2024

A YAML editor that does syntax coloring and wants to gray out lines that aren’t relevant to the current build might execute the code, but I hope they would at least sandbox that and, preferably, opt out of supporting the full ’feature’ too.

doctorpangloss · on Feb 26, 2024

You’re right.

But the dysfunction in Python packaging is nothing new. Everything with adoption appeals to the lowest common denominator.

Another POV is that “code conductors” have always vastly outnumbered actual programmers, but now that Python is ChatGPT’s expertise, does packaging matter? People will use whatever packages the LLM’s really cool notebook interface ships with, and nothing more. The environment doesn’t even have access to the Internet, and yet: here we are. The next 20 million Python developers - almost outnumbering all professional programmers combined - are going to get away with literally never installing a package.

ghuntley · on Feb 25, 2024

Another entry for https://noyaml.com

woodruffw · on Feb 25, 2024

I don't think this really has anything to do with YAML; Conda appears to have defined a bespoke semi-DSL in comments rather than encoding it in YAML itself. They could have done this in any configuration language that supports comments. That doesn't make it any less cursed, of course.

(If I'm reading this correctly, they may have their own "YAML" parser entirely -- multiple keys with different values shouldn't be distinctly represent-able in a normal YAML document, at least not after parsing.)

anamexis · on Feb 25, 2024

> (If I'm reading this correctly, they may have their own "YAML" parser entirely -- multiple keys with different values shouldn't be distinctly represent-able in a normal YAML document, at least not after parsing.)

I don't think they have their own YAML parser - the preprocessor just strips out lines whose magic comments return False.
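
A minimal sketch of what that kind of preprocessor might look like. This is not conda-build's actual code: the regex, the `select_lines` name, and the `win`/`linux` variables are all assumptions made up for illustration. It only shows the general technique of evaluating a trailing `# [expr]` comment and dropping the line when the result is falsy, before the text ever reaches a YAML parser.

```python
import re

# Hypothetical pattern: a trailing comment of the form "# [expression]".
SELECTOR = re.compile(r"^(?P<body>.*?)\s*#\s*\[(?P<expr>[^\]]*)\]\s*$")


def select_lines(text, namespace):
    """Keep a line unless its trailing '# [expr]' comment evaluates falsy."""
    kept = []
    for line in text.splitlines():
        match = SELECTOR.match(line)
        if match is None:
            kept.append(line)  # no selector comment: always keep the line
            continue
        # eval() on comment contents is the hazard discussed in this thread.
        # Emptying __builtins__ here is NOT a real sandbox.
        if eval(match.group("expr"), {"__builtins__": {}}, dict(namespace)):
            kept.append(match.group("body"))
    return "\n".join(kept)


recipe = """\
requirements:
  - python
  - pywin32  # [win]
  - readline  # [not win]
"""

# Pretend we are building on a non-Windows machine.
print(select_lines(recipe, {"win": False, "linux": True}))
```

Because the filtering happens on raw text, the YAML parser only ever sees the surviving lines, which is why duplicate keys with different selectors never collide.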

woodruffw · on Feb 25, 2024

I can't tell if that's better or worse :-)

c0balt · on Feb 26, 2024

Higher order YAML logic, sounds fun to implement as a preprocessor

IshKebab · on Feb 26, 2024

I think you're technically correct, but in practice these misguided templating ideas seem to be almost universally YAML and Python.

I've never seen someone try to template JSON or XML like this. About the closest is HTML templating but even that is not really the best way these days.

aragilar · on Feb 26, 2024

I haven't seen anyone template JSON (probably because it's so simple and painful to write that you don't need to), but I have seen XML templating with comments (I guess you could use XSLT, but I suspect the templating was easier to implement and understand). Also, helm/kubernetes are in Go, so I think whenever strings are used as input, people will try to template it.

sapling-ginger · on Feb 26, 2024

> in any configuration language that supports comments

Notably, not JSON. In fact JSON doesn't support comments specifically to prevent this. And YAML was created specifically to add comments to JSON, in violation of this precaution. So YAML can eat the lunch that it served itself.

woodruffw · on Feb 26, 2024

I have seen plenty of JSON applications abuse the "$" key namespace for comments. JSON Schema, for example, uses `$comment`[1]. So no, I don't think this is a shingle that can be uniquely hung around YAML's neck.

The Norway problem, anchors & references, infinite ways to begin a multi-line string, on the other hand, are all legitimate grievances that are somewhat unique to YAML.

Edit: And note: what makes this hack bizarre is that they chose to do it in comments, despite YAML having ample syntax for an inline dictionary or whatever else. YAML offers all kinds of exquisitely complicated ropes to hang yourself with, and they chose plain old boring comments!

[1]: https://json-schema.org/understanding-json-schema/reference/...

devsda · on Feb 26, 2024

> In fact JSON doesn't support comments specifically to prevent this.

They were mainly concerned about interoperability but partially yes.

Source: https://web.archive.org/web/20190112173904/https://plus.goog...

lifthrasiir · on Feb 26, 2024

You can exploit any ambiguity besides comments. JSON in particular doesn't fully forbid duplicate keys in objects (though many implementations do reject them), and I think they were already abused as comments in the past. And you can always use special keys like `$when` (a la MongoDB) to add Turing completeness, so even that point is moot.
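
To illustrate the duplicate-key point, here is a small sketch using Python's standard `json` module. The document contents and the `"//"` pseudo-comment key are made up for the example. By default the parser quietly keeps the last value for a repeated key, while `object_pairs_hook` exposes every pair, so a consumer could attach its own meaning to the duplicates.

```python
import json

# Made-up document with repeated keys, including "//" used as a pseudo-comment.
doc = '{"name": "demo", "name": "shadowed", "//": "note one", "//": "note two"}'

# Default behaviour: a plain dict, so the last duplicate silently wins.
as_dict = json.loads(doc)
print(as_dict)

# object_pairs_hook receives every key/value pair in document order,
# duplicates included, instead of a collapsed dict.
pairs = json.loads(doc, object_pairs_hook=list)
print(pairs)
```

Other JSON libraries may behave differently (first value wins, or an error), which is the ambiguity being described.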

hexane360 · on Feb 26, 2024

Meanwhile, there's still no support for dependency specifiers/environment markers (PEP 508/PEP 345) in environment.yml files, and the documentation is VERY sparse about what is supported: https://conda.io/projects/conda/en/latest/user-guide/tasks/m...

At least they support ~=3.9 and >=1,<3 comparisons (although this is undocumented as far as I can tell).

tetris11 · on Feb 25, 2024

conda recipes are becoming full of gunk with time, since there is now a bot that auto-updates your recipes.

You can view the PR and it waits for you to review and merge the changes to the YAML, but my god those diffs are becoming harder to grok.