The paper is pretty vacuous IMO, but there are at least a few reasons I think writing tests with LLMs is pretty nice:
* It’s actually easier to do TDD or black-box testing with LLMs. Yes, the lazy approach is to feed it a function implementation and tell it to make a unit test. But you can instead feed it the function definition and a description of its behavior (which may be what you used to generate the implementation too!) and have it generate a unit test with no visibility into the implementation.
* Unit tests often involve a lot of boilerplate that isn’t quite copy-pastable (e.g. Go table test cases), and LLMs can knock that out super quickly.
* Sometimes you do actually want to add a ton of unit tests even if they’re a little too implementation-focused. It’s a nice step towards later having actually-good tests, and some projects are so poorly tested and plagued with basic breakages/bugs that it’s worth slowing down feature development to keep things stable.
Personally, I hate it when people try to automate this stuff, though, because it does trend towards junk. I find it better to treat writing tests with LLMs tactically, basically the same way you use them to write code.
Unit tests and implementation for something like "parse this well-defined file format" are perfect for AI: low scope, clear success criteria. And plenty of the production code I write looks a lot like "parse this well-defined file format".