It was just published. It's too new for anyone to have run a direct study critiquing it, and journals don't publish standalone critiques anyway; it would have to be a study that disputes the results.
They used 16 developers. The confidence intervals are wide, and a few atypical issues per dev could swing the headline figure (toy simulation below).
Veteran maintainers on projects they know inside out. This is a selection bias.
Devs supplied the issue list (then randomized), which still allows subtle self-selection bias. Maintainers may pick tasks they enjoy or that showcase deep repo knowledge, exactly where AI probably has the least marginal value (second sketch below).
Time was self-reported rather than independently logged.
No direct quality metric is possible. Could the AI-assisted code actually be better?
The Hawthorne effect: knowing they are observed and paid may make devs over-document, over-prompt, or simply take their time.
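To put the 16-dev point in numbers, here's a toy simulation (everything below is invented, nothing is the study's actual data): bootstrap the headline ratio over a few hundred issues, then flip just three of them into pathological outliers and watch the estimate move.

```python
# Toy small-sample fragility demo with invented numbers: 16 devs x ~15
# issues each, per-issue time ratios (AI / no-AI) drawn from a lognormal.
import random
import statistics

random.seed(0)

N_DEVS, ISSUES_PER_DEV = 16, 15
ratios = [random.lognormvariate(0.15, 0.5) for _ in range(N_DEVS * ISSUES_PER_DEV)]

def headline(rs):
    # The headline figure: mean time ratio across issues (1.2 => "20% slower").
    return statistics.fmean(rs)

def bootstrap_ci(rs, reps=10_000, alpha=0.05):
    # Percentile bootstrap over issues.
    means = sorted(headline(random.choices(rs, k=len(rs))) for _ in range(reps))
    return means[int(reps * alpha / 2)], means[int(reps * (1 - alpha / 2))]

print(f"headline ratio:   {headline(ratios):.3f}")
print("95% bootstrap CI: {:.3f} .. {:.3f}".format(*bootstrap_ci(ratios)))

# Make three issues pathological (say one dev's AI runs went in circles)
# and the headline shifts by a large fraction of the CI's width.
perturbed = ratios[:]
for i in range(3):
    perturbed[i] = 6.0
print(f"after 3 outliers: {headline(perturbed):.3f}")
```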
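And a second sketch for the self-selection point, with equally made-up effect sizes: randomizing AI vs. no-AI within a submitted issue list keeps the two arms comparable, but the headline still only describes that list. If maintainers mostly submit deep-repo-knowledge tasks, the number needn't transfer to the everyday mix.

```python
# Toy internal- vs external-validity demo (made-up effect sizes): the same
# per-task effects averaged over a showcase-heavy submitted pool vs. a more
# balanced hypothetical day-to-day mix give very different headlines.
import random
import statistics

random.seed(1)

def ratio(kind):
    # Hypothetical true AI/no-AI time ratio per task, with lognormal noise:
    # AI hurts on deep-repo "showcase" tasks, helps on routine ones.
    mu = 0.15 if kind == "showcase" else -0.25
    return random.lognormvariate(mu, 0.3)

submitted  = ["showcase"] * 180 + ["routine"] * 20    # self-selected study pool
day_to_day = ["showcase"] * 100 + ["routine"] * 100   # hypothetical real mix

print(f"study pool: {statistics.fmean(ratio(k) for k in submitted):.2f}")   # ~1.18 => "18% slower"
print(f"real mix:   {statistics.fmean(ratio(k) for k in day_to_day):.2f}")  # ~1.02 => close to a wash
```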
>They used 16 developers. The confidence intervals are wide and a few atypical issues per dev could swing the headline figure
This is reasonable, but there has been enough anecdotal evidence from developers over the last three years for me to believe the data is measuring something real.
>Veteran maintainers on projects they know inside out. This is a selection bias
I think this is complete BS. The study was trying to measure the real-world impact of these tools on experienced developers. Having them try the tools on greenfield work, or on a codebase they're not familiar with, would make that impact harder to measure.
Also, let's be honest: if the study showed that LLMs DID increase productivity on greenfield work, would that even matter? How many developers out there start greenfield projects on a weekly basis? I'd argue very few. So if the study suggests that experienced developers work better on code they're already familiar with without the assistance of an LLM, then the vast majority of software development work could be better off without LLMs.
>Devs supplied the issue list (then randomized), which still allows subtle self-selection bias. Maintainers may pick tasks they enjoy or that showcase deep repo knowledge, exactly where AI probably has the least marginal value
Again, MANY developers are going to have deep repo knowledge. If they're not faster with LLMs despite that knowledge, why use them? You're trying to frame this as a bias in the study, but IMO you're missing the point.
Two more issues with the study:
Many of the devs were new to Cursor, so some of the measured slowdown may be tool-learning overhead.
Bias in forecasting: participants predicted a speedup going in.