Abusing Conda's Turing-Complete YAML Comments (astrid.tech)
79 points by bo0tzz on Feb 25, 2024 | 26 comments


"Turing-complete comments" is the scariest phrase I've read in a while


It would be less scary and more informative if the title said "executes arbitrary python" instead.


You find that _less_ scary? Turing-completeness doesn't entail file access etc. etc.


Removing YAML made it less scary.


Would adding XML make it worse?


MySQL actually supports something similar (see executable comments using /*! MySQL-specific code */). I found this out during an assessment once and was shocked to see it lol.

https://dev.mysql.com/doc/refman/8.0/en/comments.html


That's more like a conditional comment, which is a rather common strategy for backward compatibility. Conda's comment is entirely separate from the configuration language itself, which is not even Turing-complete.


12 billion downloads sounds like a lot for a packaging tool that calls eval on bytes found in yaml comments.


The more popular alternative to conda is pip/venv, which also executes arbitrary python code in setup.py.


Which is one reason why setup.py is on its way out. More often than not you'll just download a prebuilt wheel with pip anyway.


What’s the difference between putting it in the yaml file vs putting it in the uploaded built conda package? Isn’t it the same risk profile?

Is there some way that new recipes are auto-parsed by every client?


A yaml editor that does syntax coloring and wants to gray out lines that aren’t relevant to the current build might execute the code, but I hope they would at least sandbox that and, preferably, opt out of supporting the full ‘feature’, too.


You’re right.

But the dysfunction in Python packaging is nothing new. Everything with adoption appeals to the lowest common denominator.

Another POV is that “code conductors” have always vastly outnumbered actual programmers, but now that Python is ChatGPT’s expertise, does packaging matter? People will use whatever packages the LLM’s really cool notebook interface ships with, and nothing more. The environment doesn’t even have access to the Internet, and yet: here we are. The next 20 million Python developers - almost outnumbering all professional programmers combined - are going to get away with literally never installing a package.


Another entry for https://noyaml.com


I don't think this really has anything to do with YAML; Conda appears to have defined a bespoke semi-DSL in comments rather than encoding it in YAML itself. They could have done this in any configuration language that supports comments. That doesn't make it any less cursed, of course.

(If I'm reading this correctly, they may have their own "YAML" parser entirely -- multiple keys with different values shouldn't be distinctly represent-able in a normal YAML document, at least not after parsing.)


> (If I'm reading this correctly, they may have their own "YAML" parser entirely -- multiple keys with different values shouldn't be distinctly represent-able in a normal YAML document, at least not after parsing.)

I don't think they have their own YAML parser - the preprocessor just strips out lines whose magic comments return False.
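
For anyone curious what "strips out lines whose magic comments return False" might look like, here is a minimal sketch in Python. It is not conda-build's actual code: the regex, the `select_lines` helper, and the sample recipe are all made up for illustration, and the real selector syntax has more cases than this handles.

```python
import re

# Hypothetical, simplified pattern: some content, then "# [expression]" at end of line.
SELECTOR = re.compile(r"^(?P<body>.*?)\s*#\s*\[(?P<expr>[^\[\]]+)\]\s*$")


def select_lines(text, namespace):
    """Keep lines with no selector, or whose selector evaluates truthy."""
    kept = []
    for line in text.splitlines():
        match = SELECTOR.match(line)
        if match is None:
            # No selector comment: the line passes through untouched.
            kept.append(line)
        elif eval(match.group("expr"), {}, dict(namespace)):
            # eval() on comment text is exactly the hazard this thread is about.
            kept.append(match.group("body"))
    return "\n".join(kept)


# Made-up recipe fragment, not taken from any real package.
recipe = """\
requirements:
  run:
    - python
    - pywin32  # [win]
    - pexpect  # [not win]
"""

# Only what survives this pass is handed to an ordinary YAML parser.
print(select_lines(recipe, {"win": False, "linux": True}))
```

Because the filtering happens on raw text before any YAML parsing, two lines with the same key and different selectors never reach the parser together, which would explain why no custom YAML parser is needed.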


I can't tell if that's better or worse :-)


Higher order YAML logic, sounds fun to implement as a preprocessor


I think you're technically correct, but in practice these misguided templating ideas seem to be almost universally YAML and Python.

I've never seen someone try to template JSON or XML like this. About the closest is HTML templating but even that is not really the best way these days.


I haven't seen anyone template JSON (probably because it's so simple and painful to write you don't need to), but I have seen XML templating with comments (I guess you could use XSLT, but I suspect the templating was easier to implement and understand). Also, helm/kubernetes are in go, so I think whenever strings are used as input, people will try to template them.


> in any configuration language that supports comments

Notably, not JSON. In fact JSON doesn't support comments specifically to prevent this. And YAML was created specifically to add comments to JSON, in violation of this precaution. So YAML can eat the lunch that it serves itself.


I have seen plenty of JSON applications abuse the "$" key namespace for comments. JSON Schema, for example, uses `$comment`[1]. So no, I don't think this is a shingle that can be uniquely hung around YAML's neck.

The Norway problem, anchors & references, infinite ways to begin a multi-line string, on the other hand, are all legitimate grievances that are somewhat unique to YAML.

Edit: And note: what makes this hack bizarre is that they chose to do it in comments, despite YAML having ample syntax for an inline dictionary or whatever else. YAML offers all kinds of exquisitely complicated ropes to hang yourself with, and they chose plain old boring comments!

[1]: https://json-schema.org/understanding-json-schema/reference/...


> In fact JSON doesn't support comments specifically to prevent this.

They were mainly concerned about interoperability but partially yes.

Source: https://web.archive.org/web/20190112173904/https://plus.goog...


You can exploit any ambiguity besides comments. JSON in particular doesn't fully forbid duplicate keys in objects (though many implementations do reject them), and I think they were already abused as comments in the past. And you can always use special keys like `$when` (a la MongoDB) to add Turing completeness, so even that point is moot.
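
To make the duplicate-key point concrete, here is a small sketch using Python's standard `json` module. The document string is made up for illustration; other parsers may keep the first value or reject the input outright.

```python
import json

# Made-up document: the first "name" entry is smuggled in as a pseudo-comment.
doc = '{"name": "this first value acts as a comment", "name": "real-value"}'

# Python's json module silently keeps the last value for a repeated key.
print(json.loads(doc))

# object_pairs_hook receives every key/value pair in order, so a tool
# that wants to can still read the "hidden" first entry.
pairs = json.loads(doc, object_pairs_hook=list)
print(pairs)
```

So the same document means different things depending on which hook the consumer installs, which is the kind of ambiguity a preprocessor could hang extra semantics on.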


Meanwhile, there's still no support for dependency specifiers/environment markers (PEP 508/PEP 345) in environment.yml files, and the documentation is VERY sparse about what is supported: https://conda.io/projects/conda/en/latest/user-guide/tasks/m...

At least they support ~=3.9 and >=1,<3 comparisons (although this is undocumented as far as I can tell).


conda recipes are becoming full of gunk with time, since there is now a bot that auto-updates your recipes.

You can view the PR and it waits for you to review and merge the changes to the YAML, but my god those diffs are becoming harder to grok.



