Agreed in general: the models are getting pretty good at dumping out new code, but maintaining or augmenting existing code produces pretty bad results, except for short local autocomplete.
BUT it's noteworthy how much difference the amount of context makes. Feeding a lot of the existing code into the prompt improves the results significantly.
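As a rough illustration of what "feeding in the existing code" can look like in practice, here is a minimal sketch that walks a repo, concatenates source files into the prompt, and crudely truncates to a character budget. Everything here (`build_prompt`, the `.py` filter, the 32k-character budget) is a hypothetical assumption for illustration, not any particular tool's API.

```python
import os

def build_prompt(repo_dir, task, max_chars=32_000):
    """Hypothetical sketch: pack existing source files into the prompt so the
    model sees the surrounding code, truncated to an assumed context budget."""
    parts = []
    for root, _dirs, files in os.walk(repo_dir):
        for name in sorted(files):
            if name.endswith(".py"):  # assumption: a Python-only repo
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    parts.append(f"# file: {path}\n{f.read()}")
    context = "\n\n".join(parts)[:max_chars]  # crude truncation to the budget
    return f"{context}\n\n# Task: {task}\n"
```

Real setups are smarter about which files to include (retrieval, dependency graphs), but even this naive packing tends to beat sending the task description alone.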
This might be an argument in favor of a microservices architecture with the code split across many repos rather than a monolithic application with all the code in a single repo. It's not that microservices are necessarily technically better, but each service's codebase is small enough to fit in a context window, which could let you get more leverage out of LLMs.