Claude Code is hard to describe. It’s almost like I changed jobs when I started using it. I’ve been all-in with Claude as a workflow tool, but this is literally steroids.

If you haven’t tried it, I can’t recommend it enough. It’s the first time it really does feel like working with a junior engineer to me.





Weirdly enough I have the opposite experience where it will take several minutes to do something, then I go in and debug for a while because the app has become fubar, then finally realize it did the whole thing incorrectly and throw it all away.

And I reach for Claude quite a bit, because if it worked for me as well as everyone here says it does, that would be amazing.

But at best it'll get a bunch of boilerplate done after some manual debugging; at worst I spend an hour and some amount of tokens on a total dead end.


Some great advice I've found that seems to work very well: ask it to keep a succinct journal of all the issues and roadblocks found during the project's development, and what was done to resolve or circumvent them. As for avoiding bloating the codebase with scatterbrained changes, a tidy architecture with good separation of concerns helps lead it toward working solutions, but you need to actively guide it. For someone who enjoys problem-solving more than actually implementing the solutions, it's very fun.
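
For example, something along these lines in a CLAUDE.md (hypothetical wording; adjust to your project):

    ## Journal
    After resolving any issue or roadblock, append a short dated entry to
    docs/JOURNAL.md: the symptom, the root cause, and the fix or workaround.
    Keep entries succinct. Read the journal before starting a new task.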

To continue on this: I wouldn't let Claude or any agent actually create a project structure; I'd guide it in the custom system prompt. Then, in each of the folders, keep specific prompts for how you expect the assets to be coded, plus common behavior, libraries, etc.
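
Concretely, something like this layout (directory names are just for illustration):

    repo/
      CLAUDE.md      <- global conventions, architecture overview
      api/
        CLAUDE.md    <- endpoint patterns, error handling, auth rules
      web/
        CLAUDE.md    <- component conventions, styling, state management
      jobs/
        CLAUDE.md    <- queue and retry conventions for background work

Claude Code picks up nested CLAUDE.md files as it works in each directory, so the guidance stays scoped to the code it applies to.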

So you've invented writing out a full business logic spec again.

btw, I'm not throwing shade. I personally think upfront design through a large lumbering document is actually a good way to develop stuff: you either do the design upfront, or through endless iterations in sprints for years.


Yeah, my experience of working with Claude Code is that I'm actually far more conscientious about design. After using it for a while, you get a good sense of its limits and of how you need to break things down and spell things out to overcome them.

The problem with waterfall wasn't the full business spec, it was that people wrote the spec once and didn't revise it when reality pushed back.

I spent 10 minutes writing out the business logic, you don't have to do it all at once. We're not talking about long complicated things here.

> For someone who enjoys problem-solving more than actually implementing the solutions, it's very fun.

So, is Claude just something you use for fun? Would you use it for work?


Sigh. As others have commented, over and over again in the last 6 months we've seen discussions on HN with the same basic variation of "Claude Code [or whatever] is amazing" with a reply along the lines of "It doesn't work for me, it just creates a bunch of slop in my codebase."

I sympathize with both experiences and have had both. But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:

* what kind of codebase you were working on (language, tech stack, business domain, size, age, level of cleanliness, number of contributors)

* what exactly you were trying to do

* how much experience you have with the AI tool

* is your tool set up so it can get a feedback loop from changes, e.g. by running tests

* how much prompting did you give it; do you have CLAUDE.md files in your codebase

and so on.

As others pointed out, TFA also has the problem of not being specific about most of this.

We are still learning as an industry how to use these tools best. Yes, we know they work really well for some people and others have bad experiences. Let's try and move the discussion beyond that!


It's telling that you ask these details from a comment describing a negative experience, yet the top-most comment full of praises and hyperbole is accepted at face value. Let's either demand these things from both sides or from neither. Just because your experience matches one side, doesn't mean that experiences different from yours should require a higher degree of scrutiny.

I actually think it's more productive to just accept how people describe their experience, without demanding some extensive list of evidence to back it up. We don't do this for any other opinion, so why does it matter in this case?

> Let's try and move the discussion beyond that!

Sharing experiences using anecdotal evidence covers most of the discussion on forums. Maybe don't try to police it, and either engage with it, or move on.


>Let's either demand these things from both sides or from neither. Just because your experience matches one side, doesn't mean that experiences different from yours should require a higher degree of scrutiny.

Sort of.

The people that are happy with it and praising the avenues offered by LLM/AI solutions are creating codebases that fulfill their requirements, whatever those might be.

The people that seem to be unhappy with it tend to have the universal complaints of either "it produces garbage" or "I'm slower with it."

Maybe I'm showing my age here, but I remember these same exact discussions between people that either praised or disparaged search engines. The alternative being an internet Yellow Pages (which was a thing for many years).

The ones that praised it tended to be people who were taught or otherwise figured out how to use metadata tags like date:/onsite:, whereas the ones that disparaged it tended to be the folks who would search for things like "who won the game", proceed to click every scam/porno link on this green Earth, and then blame Google/gdg/lycos/whatever when they were exposed to whatever they clicked.

In other words: the proof is kind of in the pudding.

I wouldn't care about the compiler logs from a user that ignored all syntax and grammar rules of a language after picking it up last week, either -- but it's useful for successful devs to share their experiences both good and bad.

I care more about the opinions of those that know the rules of the game -- let the actual teams behind this software deal with the user testing and feedback from people that don't want to learn conventions.


> The people that are happy with it and praising the avenues offered by LLM/AI solutions are creating codebases that fulfill their requirements, whatever those might be.

Ah, but "whatever those might be" is the crucial bit.

I don't entirely disagree with what you're saying. There will always be a segment of power users who are able to leverage their knowledge about these tools to extract more value out of them than people who don't use them to their full potential. That is true for any tool, not just in software.

What you're ignoring are two other possibilities:

1. The expectation of users can be wildly different. Someone who has never programmed before, but can now create and ship a smartphone app, will see these tools as magical. Whatever issues they have will either go unnoticed, or won't matter considering the big picture. Surely their impression of AI tooling will be nothing short of positive. They might be experts at using LLMs, but not at programming.

OTOH, someone who has been programming for decades, and strives for a certain level of quality in their work, will find the experience much different. They will be able to see the flaws and limitations of these tools, and addressing them will take time and effort they could've better spent elsewhere. As we've seen since the introduction of LLMs, domain experts are the ones best placed to even notice these problems.

So the experience of both sides is valid, and should have equal weight in conversations. Unlike you, I do trust the opinion of domain experts over those of user experts, but that's a personal bias.

2. There are actual flaws and limitations in AI tooling. The assumption that all negative experiences are from users who are "holding it wrong", while all positive ones are from expert users, is wrong. It steers the conversation away from issues with the tech that should be discussed and addressed. And considering the industry is strongly propelled by hype and marketing right now, we need conversations grounded in reality to push back against it.


> The assumption that all negative experiences are from users who are "holding it wrong", while all positive ones are from expert users, is wrong.

I’m not sure about that. I feel like someone experienced would realize when using the LLM is a better idea than doing it themselves, and when they just need to do it by hand.

You might work in a situation where you have to do everything by hand, but then your response would presumably be along the lines of acknowledging that you can see how it's useful to other people.


> The ones that praised it tended to be people who were taught or otherwise figured out how to use metadata tags like date:/onsite:, whereas the ones that disparaged it tended to be the folks who would search for things like "who won the game", proceed to click every scam/porno link on this green Earth, and then blame Google/gdg/lycos/whatever when they were exposed to whatever they clicked.

One big warning here: search engines only became really useful when you could search for "who won the game" and the search engine actually returned the correct thing as the top result.

We're more than a quarter of a century later and probably 99.99% of users don't know about Google's advanced search operators.

This should be a major warning for LLMs. People are people and will do people things.


I should have been clearer - I'd like to see this kind of information from positive comments as well. It's just as important. If someone is having success with Claude Code while vibe-coding a toy app, I don't care. If they're having success with it on a large legacy codebase, I want them to write a blog post all about what they're doing, because that's extremely useful information.

I jumped the gun a bit in my comment, since you did mention you want to see this from both sides. You were clear, and I apologize.

The thing is that I often read this kind of response only to comments with negative experiences, while positive ones are accepted as fact. You can see this reinforced in the comments here as well. A comment section is not the right place to expand on these details, but I agree that blog posts should have them, regardless of the experience type.


It's telling that they didn't specifically direct it at the negative experience, and you filled that in yourself.

It was the comment they replied to. If it was a general critique of the state of discourse around agentic tools and Claude Code in particular why not make it a top level comment?

Oh, because I wanted to illustrate that the discourse is exemplified by the pair of the GP comment (vague and positive) and the parent comment (vague and negative). Therefore I replied to the negative parent comment.

>But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:

They did mention "(both positive and negative)", and I didn't take their comment to be one-sided towards the AI-negative comments only.


They're tools. To a fluent tool user, the negative anecdotes sound like,

"I prefer typewriters over word processors because it's easier to correct mistakes."

"I don't own any forks because knives are just better at cutting bread."

"Bidets make my pants wet, so I'll keep to toilet paper."

I think there's an urge to fix misinformation. Whereas if someone loves Excel and thinks Excel is better than Java at making apps, I have no urge to correct that. Maybe they know something about Excel that I don't.


The framing has been rather problematic. I find these differences in premises are lurking below the conversations:

- Some believe LLMs will be a winner-take-all market and reinforce divergences in economic and political power.

- Some believe LLMs have no path of evolution left and have therefore already plateaued, at a level too low to sustain these investments in compute, which would imply it's a flash in the pan that will collapse.

- Some believe LLMs will all be hosted forever, always living in remote services because the hardware requirements will always be massive.

- Some believe LLMs will create new, worse kinds of harm without enough offsetting creation of new kinds of defense.

- Some believe LLMs and AI will only ever give low-skilled people mid-skill results and therefore work against high-skill people by diluting mid-end value without creating new high-end value for them.

We need to be more aware of how we are framing this conversation because not everyone agrees on these big premises. It very strongly affects the views that depend on them. When we don't talk about these points and just judge and reply based on whether the conclusion reinforces our premises, the conversation becomes more political.

Confirmation bias is a thing. Individual interests are a thing. Some of the outcomes, like regulation and job disruption, depend on what we generally believe. People know this and so begin replying and voting according to their interests, to convince others to aid their cause without respect for the truth. This can be counter-productive to the individual if they are wrong about the premises and end up pushing an agenda that doesn't even actually benefit them.

We can't tell people not to advance their chosen horse at every turn of a conversation, but those of us who actually care about the truth of the conversation can take some time to consider the foundations of the argument and remind ourselves to explore that and bring it to the surface.


Fair point.

For context, I was using Claude Code on a large open source Ruby + TypeScript codebase, 50M+ tokens. It had specs and e2e tests, so yes, I did have feedback when I was done with a feature: I could run specs and Claude Code could form a loop. I would usually advise it to fix specs one by one, using --fail-fast to surface errors quickly.
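
i.e. the loop was something like:

    # stop at the first failing spec so Claude sees one error at a time
    bundle exec rspec --fail-fast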

Prior to Claude Code, I had been using Cursor for a year or so.

Sonnet is particularly good at Next.js and TypeScript stuff. I also ran this on a medium-sized Python codebase and some ML-related work too (ranging from LangChain to PyTorch, lol).

I don't do a lot of prompting, just enough to describe my problem clearly. I try my best to identify the relevant context or direct the model to find it fast.

I made new CLAUDE.md files.


I spend a fair amount of time tinkering in Home Assistant. My experience with that platform and LLMs can be summed up as "this is amazing".

I also do a fair amount of data shuffling with Golang. My LLM experience there is "mixed".

Then I deal with quite a few "fringe" codebases and problem spaces. There, LLMs fall flat past the stuff that is boilerplate.

"I work in construction and use a hammer" could mean framer, roofer or smashing out concrete with a sledge. I suspect that "I am a developer, I write code" plays out in much the same way, and those details dictate experience.

Just based on the volume of Ruby and TypeScript out there, and how much it overlaps with the output of these platforms, your experience is going to be pretty good. I would be curious whether, if you went and did something less mainstream, in a less common language (say Zig), you would have the same feelings and feedback that you do now. Based on my own experience, I suspect you would not.


Speaking of that observation about "fringe": this will probably, increasingly, become a factor. Call it LLMO (LLM optimization), where "LLM-friendly" content gets pushed. So I expect secondary or fringe programming languages to be pushed even further aside, since LLMs will not be as useful for them.

Which is, obviously, sad. Especially since the big winner is Javascript, a language that's still subpar as far as programming languages go.


Here's a few general observations.

Your LLM (CC) doesn't have your whole codebase in context, so it can run off and make changes without considering that some remote area of the codebase is (subtly?) depending on the part Claude just changed. This can be mitigated to some degree depending on the language and the tests in place.

The LLM (CC) might identify a bug in the codebase, fix it, and then figure "well, my work here is done" and just leave it as is, without considering ramifications or whether the same sort of bug might be found elsewhere.

I could go on, but my point is simply to validate the issues people will be having, while also acknowledging those seeing the value of an LLM like CC. It does provide useful work (e.g. large tedious refactors, prototyping, tracking down a variety of bugs, and so on).


Right, which is why having a comprehensive test suite is such an enormous unlock for this class of technology.

If your tests are good, Claude Code can run them and use them to check it hasn't broken any distant existing behavior.
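
One cheap way to wire that in is a standing instruction in CLAUDE.md (hypothetical wording; substitute your project's test command):

    ## Verification
    After every change, run the test suite (npm test here) and fix any
    failures before reporting the task as done.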


Not always the case. It’ll just go and “fix” the tests to pass instead of fixing the core issue.

That used to happen a whole lot more. Recent Claudes (3.7, 4) are less likely to do that in my experience.

If they DO do that, it's on us to tell them to undo that and fix things properly.


This is why you keep CLAUDE.md updated; there it'll write down what is where and other relevant info about the project.

Then it doesn't need to grep (or rg) through the whole codebase.

You also use plan mode to figure out the issue, write the implementation plan in a .md file. Clear context, enter act mode and tell it to follow the plan.
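
Roughly, the plan file ends up looking like this (contents invented for illustration):

    # plan-fix-auth-redirect.md (written by Claude in plan mode)

    ## Problem
    Login redirect drops the ?next= parameter.

    ## Files to touch
    - app/auth/callback.ts (and its test)

    ## Steps
    1. Thread the next param through the OAuth state.
    2. Add a regression test for the redirect.

Then, in the fresh context: "Read plan-fix-auth-redirect.md and implement it step by step."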


You can probably give Claude access to tools like ast-grep; that will help it see all references. I agree some dynamic references might still slip through, and the only mitigation is prompting well enough. I dealt with this myself when I tested on a Ruby on Rails codebase.
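
For example, a structural search like the following (pattern and flags from memory; double-check against ast-grep's help):

    ast-grep --pattern 'User.find($ID)' --lang ruby app/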

Agree. It keeps getting closer to "I've had a negative experience with the internet ..."

I'm not convinced that "we know they work really well for some people." So far I just see people really excited about the potential and really impressed at what it's capable of, but I think people are extrapolating poorly. It's like, yes it's impressive that it can make a video game with a few prompts, but that doesn't mean that with a few more prompts it'll turn into a AAA game.

I'm on board with some limited AI autocompletion, but so far agents just seem like gimmicks to me.


If we handwave that the popular game Wordle, which made a lot of money for its author, could have been vibecoded, at what point does the gimmick become an actual feature that people look and pay for?

No shade at Wordle, but what you're describing sounds like it would be useful for the shovelware industry and that's about it. Not exactly a great leap forward for humanity...

Although I should be fair, this can help with one-off scripts that research folks usually do, when you just need to plot some data or do some back-of-the-terminal math. That said I don't think this would be a game changer, more of an efficiency boost and a limited one at that.


What would a great leap forwards for humanity look like? Sure, making it easier to shovel out shovelware means more shovelware, but why is that a bad thing? If customers have a very specific problem that wasn't going to get solved because it was too expensive to build a custom solution, and they now get to have bespoke software to cure their ills, other than being judgemental about this hypothetical piece of software as being shovelware, why is that a bad thing?

Here's one version of what a great leap forward could look like, but it's only one of many: an LLM that understands the CPU it's running on and can turn prompts into assembly, taking full advantage of the hardware. Or maybe it could target a virtual CPU, like the JVM, but the point is that if the LLM can write code, why do it in Python or C? Just let it understand the CPU and let it rip. The only reason we have C/Python/etc. in the first place is because assembly sucks for humans to work with.

As to the shovelware, if it benefits people that's great, and I think the net benefit will likely be positive, but only slightly. The point in calling it shovelware is to suggest that it's low quality, and so it could have bugs and other performance issues that add costs to using it, which subtract from the benefit it provides (possibly still in a net-positive way, but probably not as fundamentally game-changing as, say, Docker).


Seconded: a summary description of your problem, codebase, and programming dialect in use should be included with every "<Model> didn't work for me" response.

I find it telling that I have (mostly) good experiences with the GPT family and (mostly) bad experiences with the Claude family.

I just wish I could figure out what it tells. Their training data can't be that different. The problems I'm feeding them are the same. Many people think Claude is the more capable of the two.

It has to be how I'm presenting the problems, right? What other variable is there?


If you have been using GPT for a while, it simply may know more about you.

> But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least ...

I use Claude many times a day; I ask it and Gemini to generate code most days. Yet I fall into the "I've never included a line of code generated by an LLM in committed code" category. I haven't got a precise answer for why that is so. All I can come up with is that the code generated lacks the depth of insight needed to write a succinct, fast, clear solution to the problem that someone can easily understand in two years' time.

Perhaps the best illustration of this is that someone proudly proclaimed to me they had committed 25k lines in a week, with the help of AI. In my world, this sounds like claiming to have a way of turning the sea into ginger beer. Gaining the depth of knowledge required to change 25k lines of well-written code would take me more than a week of reading. Writing that much in a week is a fantasy. So I asked them to show me the diff.

To my surprise, a quick scan of the diff revealed what the change did. It took me about 15 minutes to understand most of it. That's the good news.

The bad news is that the 25k lines added 6 fields to a database. Two-thirds were unit tests, and perhaps two-thirds of the remainder was comments (maybe more). The comments were glorious in their length and precision, littered with ASCII-art tables showing many rows in the table.

Comments in particular are a delicate art. They are rarely maintained, so they can bit-rot into downright misleading babble after a few changes. But the insight they provide into what the author was thinking, and in particular the invariants he had in mind, can save hours of divining it from the code. Ideally they concisely explain only the obscure bits you can't easily see from the code itself. Anything more becomes technical debt.

Quoting Woodrow Wilson on the amount of time he spent preparing speeches [0]:

    “That depends on the length of the speech,” answered the President. “If it is a ten-minute speech it takes me all of two weeks to prepare it; if it is a half-hour speech it takes me a week; if I can talk as long as I want to it requires no preparation at all. I am ready now.”
Which is a roundabout way of saying I suspect the usefulness of LLM-generated code depends more on how often a human is likely to read it than on any of the things you listed. If it is write-once, and the requirement is that it works for most people in the common cases, LLM-generated code is probably the way to go.

I used PayPal's KYC web interface the other day. It looked beautiful, completely in line with the rest of PayPal's styling. But sadly I could not complete it because of bugs. The server refused to accept one page; it just returned to the same page with no error messages. No biggie, I phoned support (several times, because they also could not get past the same bug), and after 4 hours on the phone the job was done. I'm sure the bug will be fixed by a new contractor. He'll spend a few hours on it, getting an LLM to write a new version and throwing the old code away, just as his predecessor did. He will say the LLM provided a huge productivity boost, and PayPal will be happy because he cost them so little. It will be the ideal application for an LLM: it got the job done quickly, and no one will read the code again.

I later discovered there was a link on the page that allowed me to skip past the problematic page, so I could at least enter the rest of the information. It was in a thing that looked confusingly like a "menu bar" on the left, although there was no visual hint that any of the items in the menu were clickable. I clicked on most of them anyway, but they did nothing. While on hold for phone support, I started reading the HTML and found one was a link. It was a bit embarrassing to admit to the help person I hadn't clicked that one. It sped the process up somewhat. As I said, the page did look very nice to the eye, probably partially because of the lack of clutter created by visual hints on what was clickable.

[0] https://quoteinvestigator.com/2012/04/28/shorter-letter/


There are some tasks it can handle and some it can't, but a lot of the "Claude Code [or whatever] is amazing" vs. "It doesn't work for me, it just creates a bunch of slop in my codebase" split is, IMO, "I know how to use it" vs. "I don't know how to use it", with a side of "I have good test coverage" vs. "tests?"

Do you create CLAUDE.md files at several levels of your folder structure, so you can teach it how to do different things? Configuring these default system prompts is required to get it to work well.

I'd definitely watch Boris's intro video below [1]

[1] Boris's introduction: https://www.youtube.com/watch?v=6eBSHbLKuN0

[2] Summary of the above video: https://www.nibzard.com/claude-code/


By the time you do all of that you might as well just write code by hand.

That's really just a scale question.

Yes, I would write a 4 line bash script by myself.

But if you're trading a 200-line comprehensive CLAUDE.md document for a module that might be 20k LoC? It's a different value proposition.


And are you willing to stand behind those 20k loc? Like, whoever you're submitting it to, you can say "this is my work, it is done to a level of quality I find acceptable"?

And how do you actually know that those 20k lines of code have no glaring bugs, or none beyond what you can find yourself? Will you be able to understand it completely at some point?

How do you know your own handwritten 20k lines of code have no bugs, or that 20k lines of code written by coworkers have no bugs?

I'm not the person you're replying to, but I have a lot more confidence in my own 20k lines of code than in an AI's. I've built up the skills to write performant, readable, functional, maintainable code. I build it up slowly, and I can anticipate bugs as I write. I'm not perfect, but when bugs do arise, since I've built up the code, I have some idea of where to look and where not to look in order to fix them.

As for coworkers, I would really try to get them to work in chunks smaller than 20k loc. But at some point you have an expectation that coworkers will be accountable for their area of responsibility. If there's a bug in their code, they're expected to fix it. If there's a bug in the AIs code, I'm expected to fix it....


The way I do this is by still writing tests.

Do tests let you understand a codebase you have not written?

This desire to understand code will soon be seen as rather anachronistic. What's important is that you understand your tests. Let the AI generate the code.

The spec and the test are your human contribution.


I understand your point of view but I think it's too "optimistic", i.e. it will not happen soon, at least not outside AI maximalists.

If the tests are written with sufficient detail that you don't need to look at the code, the implementation of the code is such a small part of the overall work that you are gaining very little in terms of overall productivity.

I agree

You're describing TDD, and it never turned into the panacea that was promised. I'm excited to try Claude Code, I even have a decent little personal project lined up for it, but someone somewhere will always need to understand the code, because tests are never 100% exhaustive and major flaws come up.

Ah yes, can't wait to tell my auditor / regulator "I don't understand the code because Claude wrote it, but it's fine, because understanding the code is for boomers." That will get a big laugh in a deposition.

That'll be anachronistic too obviously. Your tests will be audited.

I would say yes.

To have useful tests, you must write the APIs for the functions, and give examples of how to wire up the various constructs, and correct input/output pairs.

Implementations of those functions that pass the test now have significant constraints that mean you understand a lot about it.


That’s called Test Driven Development.

First you write the tests, then you write code until tests pass.
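
A minimal sketch of that loop (vitest here; slugify and its cases are invented for illustration):

    import { describe, it, expect } from 'vitest';
    import { slugify } from './slugify';

    // Written first: these input/output pairs ARE the spec.
    describe('slugify', () => {
      it('lowercases and hyphenates', () => {
        expect(slugify('Hello World')).toBe('hello-world');
      });
      it('strips punctuation', () => {
        expect(slugify("Don't Panic!")).toBe('dont-panic');
      });
    });

You (or the agent) then write slugify until the suite is green.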


They can. Particularly if you use them to validate your assumptions about the code.

You don't do it manually. You have Claude do it: once you've guided it back on track, have it update the file to remind itself not to make the same mistake next time.

I think you are perhaps missing the point. Investing in these techniques [2] enables you to do unhinged things, such as building a compiler whilst you are AFK [1].

[1] https://x.com/i/broadcasts/1OyJALVOnEzGb

[2] https://ghuntley.com/ralph


Sure if I want to just toy around for fun.

Those are cool, but a production system is infinitely more complex.


What do you define as a production system? Are you aware that one can generate TLA+ specifications, then generate code from those specifications and assert that the implementation matches the TLA+ spec?

You can tell Claude to verify its work. I’m using it for data analysis tasks and I always have it check the raw data for accuracy. It was a whole different ballgame when I started doing that.

Clear instructions go a long way, asking it to review work, asking it to debug problems, etc. definitely helps.


> You can tell Claude to verify its work

Definitely - with ONE pretty big callout. This only works when a clear and quantifiable rubric for verification can be expressed. Case in point, I put Claude Code to work on a simple react website that needed a "Refresh button" and walked away. When I came back, the button was there, and it had used a combination of MCP playwright + screenshots to roughly verify it was working.

The problem was that it decided to "draw" a circular arrow refresh icon and the arrow at the end of the semicircle was facing towards the circle centroid. Anyone (even a layman) would take one look at it and realize it looked ridiculous, but Claude couldn't tell even when I took the time to manually paste a screenshot asking if it saw any issues.

While it would also be unreasonable to expect a junior engineer to hand-write the coordinates for a refresh icon in SVG, they would never even attempt that in the first place, realizing it would be far simpler to grab one from Lucide, Font Awesome, emojis, etc.


In general, using your own symbol forms for interactions rather than taking advantage of people’s existing mental models is a bad idea. Even straying from known libraries is shaky unless you’re a competent enough designer to understand what specific parts of a visual symbol signify that specific idea/action, and to whom. From a usability perspective, you’re much better off not using a symbol at all than using the wrong one.

I second this and would add that you really need an automated way to do it. For coding, automated test suites go a long way toward catching boneheaded edits. It will understand the error messages from the failed tests and fix the mistakes more or less by itself.

But for other tasks like generating reports, I ask it to write little tools to reformat data with a schema definition, perform calculations, or do other things that are fairly easy to then double-check with tests that produce errors that it can work with. Having it "do math in its head" is just begging for disaster. But, it can easily write a tool to do it correctly.
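
A sketch of what such a tool can look like (TypeScript with zod; the schema fields are invented for illustration):

    import { z } from 'zod';

    // The schema is the contract: malformed rows fail loudly instead of
    // silently producing wrong numbers in the report.
    const Row = z.object({
      region: z.string(),
      revenue: z.number().nonnegative(),
    });

    export function totalByRegion(rows: unknown[]): Map<string, number> {
      const totals = new Map<string, number>();
      for (const raw of rows) {
        const row = Row.parse(raw); // throws a precise error the model can act on
        totals.set(row.region, (totals.get(row.region) ?? 0) + row.revenue);
      }
      return totals;
    }

The point is the failure mode: a thrown validation error is something the model can read and fix, unlike a quietly wrong sum.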


> Clear instructions go a long way, asking it to review work, asking it to debug problems, etc. definitely helps.

That's exactly what I learned in the early 2000s, from three expensive failed development outsourcing projects.


For me, it fixed a library compatibility issue with React 19 in 10 minutes and several nudges, starting from the console error and the library name.

It would have been at least half a day's worth of adventure had I done it myself (from diagnosing to fixing).


This has a lot to do with how you structure your codebase; if you have repeatable patterns that make conventions obvious, it will follow them for the most part.

When it drops in something hacky, I use that to verify the functionality is correct and then prompt a refactor to make it follow better conventions.


I have seen both success and failure. It's definitely cool and I like to think of it as another perspective for when I get stuck or confused.

When it creates a bunch of useless junk I feel free to discard it and either try again with clearer guidelines (or switch to Opus).


> take several minutes to do something

The quality of the generated code is inversely proportional to the time it takes to generate it. If you let Claude Code work alone for more than 300 seconds, you will receive garbage code. Take that as a hint: if it can't finish the task in that time, you are asking too much. Break up your feature and try a smaller one.


> I go in and debug for a while because the app has become fubar, then finally realize it did the whole thing incorrectly and throw it all away.

This seems consistent with some of the more precocious junior engineers I've worked with (and have been, in the past.)


Have you tried vibing harder?

Yeah, that is kind of my experience as well. And - according to the friend who highly recommended it - I gave it a task that is "easily within its capabilities". Since I don't think I'm being gaslit, I suspect it's me using it wrong. But I really can't figure out why. And I'm on my third attempt now...

Just got it at work today and it’s a dramatic step change beyond Cursor despite using the same foundation models. Very surprising! There was a task a month ago where AI assistance was a big net negative. Did the same thing today w/ Claude Code in 20ish minutes. And for <$10 in API usage!

Much less context babysitting too. Claude code is really good at finding the things it needs and adding them to its context. I find Cursor’s agent mode ceases to be useful at a task time horizon of 3-5 minutes but Claude Code can chug away for 10+ minutes and make meaningful progress without getting stuck in loops.

Again, all very surprising given that I use sonnet 4 w/ cursor + sometimes Gemini 2.5 pro. Claude Code is just so good with tools and not getting stuck.


Cool! If you're on Pro, you can use a _lot_ of Claude Code without paying for API usage, btw.

Even though it's the same model, Cursor adds a massive system prompt to every request, and it's shit and lobotomises the models. After the rug pull I'm going Claude Code exclusively, either at the end of my billing period or when Cursor cuts me off the $60-a-month plan (which will probably come first, a bit over halfway into my month).

> It’s the first time it really does feel like working with a junior engineer to me.

I have mixed feelings, because this means there's really no business reason to ever hire a junior, but it also (I think) threatens the stability of senior-level jobs long term, especially as seniors slowly lose their knowledge and let Claude take care of things. The result is that your prospects basically come down to: what year did you get into this field?

I’m actually almost afraid I need to start crunching Leetcode, learning other languages, and then apply to DoD-like jobs where Claude Code (or other code security concerns) mean they need actual honest programmers without assistance.

However, the future is never certain, and nothing is ever inevitable.


It's a junior engineer that doesn't learn: they make the same mistakes even after being corrected, the second the correction falls out of their context window (often even with the "corrections" still there...); they struggle to abstract those categories of mistakes to avoid making similar ones in the future; and (by the looks of it) they will never be "the senior". "Hiring a junior" should really be seen as an investment more than immediate output.

I keep being told that $(WHATEVER MODEL) is the greatest thing ever, but every time I actually try to use them they're of limited (but admittedly non-zero) usefulness. There's only so many breathless blogs or comments I can read that just don't mesh with the reality I personally see.

Maybe it's sector? I generally work on Systems/OS/Drivers, large code bases in languages like C, C++ and Rust. Most larger than context windows even before you look at things like API documentation. Even as a "search and summarizer" tool I've found it completely wrong in enough cases to be functionally worthless as the time required to correct and check the output isn't a saving. But they can be handy for "autocompletion+" - like "here's a similar existing block of code, now do the same but with (changes)".

They generally seem pretty good at being like a template engine on non-templated code, so things like renaming/refactoring or similar structure recognition can be handy. Which I suspect might also explain some of those breathless blog posts - I've seen loads which say "Any non-coder can make a simple app in seconds!" - but you could already do that: there's a million "Simple App Tutorial" codebases that would match whatever license you want; copy one, change the name at the top, and you're 99% of the way to the "Wow Magic End Result!" often described.


> because this means there’s really no business reason to ever hire a junior

Aren't these people your seniors in the coming years? It's healthy to model an inflow and outflow.


The pipeline dries up when orgs would rather take the upfront savings of gen-AI productivity gains than invest in talent development.

We are using probabilistic generators to output what should be deterministic solutions.

You know what else is probabilistic? You and me. That's why we have tooling in place to mitigate that and to constrain our variable outputs into more reliable, deterministic results. And luckily, a lot of that tooling can be used for probabilistic machines as well.

> DoD-like jobs where Claude Code (or other code security concerns) mean they need actual honest programmers without assistance.

Then they'll just get a contract to spin up a DoD-secure variant: https://www.anthropic.com/news/anthropic-and-the-department-...


DoD will probably be requiring use of Mechahitler soon enough.

Could you elaborate a bit on the tasks, languages, domain, etc. you're using it with?

People have such widely varying experiences and I’m wondering why.


I find it pretty interesting that it's a roughly 2,500-word article on "using Claude Code" and they never once actually explain what they're using it for or what type of project they're coding. It's all just so generic. I read some of it, then realized that there was absolutely no substance in what I had just read.

It's also another in my growing list of data points towards my opinion that if an author posts meme pictures in their article, it's probably not an article I'm interested in reading.


Yeah, I got about halfway through before thinking "wow, there's no information in this" and giving up.

It's always telling when people don't show their work. I'm not saying LLMs can't do a good job, but if you're not even explaining the steps you used or showing the code that was generated or fixed, then I have to assume what was really produced was unmaintainable spaghetti code that just happened to compile.

I literally cannot share my code, as that belongs to the company or the client. So what would I show? My prompts? Pseudo code? Unless you're an open source developer or build personal projects, requests for "show your code, bro" are hard to satisfy.

That's fine, but then don't write an article about it if you can't show the code. The vagueness just makes the article look unsupported by facts.

Always good to be cognizant that there are MANY people out there, especially on HN/YC circles, with large vested interests in LLM tooling. Just check out the YC batches lately, you'll be hard pressed to find a single one that doesn't mention AI or LLMs in some way.

It's always POC apps in js or python, or very small libraries in other popular languages with good structure from the start. There are ways to make them somewhat better in other cases (automated testing/validation/linting being a big one), but for the type of thing that 95% of developers are doing day to day (working on a big, sprawling code base where none of those attributes apply), it's not close to being there.

The tools really do shine where they're good though. They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.

I say this as someone that uses the tools every day. The only explanation that makes sense to me is that the "you don't get it, they're amazing at everything" people just aren't working on anything even remotely complicated. Or it's confirmation bias that they're only remembering the good results - as we saw with last week's study on the impact of these tools on open source development (perceived productivity was up, real productivity was down). Until we start seeing examples to the contrary, IMO it's not worth thinking that much about. Use them at what they're good at, don't use them for other tasks.

LLMs don't have to be "all or nothing". They absolutely are not good at everything, but that doesn't mean they aren't good at anything.


I like them for refactoring and “explain this massive codebase please”. Basically polishing or investigating things that already work.

But I think we should expect the scope of LLM work to improve rapidly in the next few years.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...


The bad news is that mostly, as far as we can see, that doubling of performance also requires (at least) doubling of resource usage, plus we're getting close to a point where planetary resources for doubling LLM resources are getting kind of low...

This species is going extinct. I finally accepted that when my dad died rather than change his lifestyle, despite being warned 10000x. My mom survived a heart attack, saw what happened to my dad, still hasn't changed her lifestyle.

Hmm, I got Claude Opus to build me a game in Rust. I don't think it really counts as a POC app any more at that point.

It absolutely counts as a POC app until it's production grade, deployed, being used by people, maintained over time, etc.

This doesn't mean that it's not useful, or that you shouldn't be happy with what the LLM built. I also had Claude Code build me a web app for my own personal use in Rust this week. It's very useful to me. But it is 100% POC/MVP quality, and always will be, because the code it created is abjectly awful and I would never be able to scale it into a real-world service without rewriting 50+% of it.


> They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.

Sorry, but this is just not true.

I'm using agents with a totally idiosyncratic code base of Haskell + Bazel + Flutter. It's a stack that is so quirky and niche that even Google hasn't been able to make it work well despite all their developer talent and years of SWEs pushing for things like Haskell support internally.

With agents I'm easily 100x more productive than I would be otherwise.

I'm just starting on a C++ project, but I've already done at least 2 weeks worth of work in under a day.


I’m going to ask what I’ve asked the last person here who said they are “10-20x” more productive:

If you're really that much more productive, why don't you quit your job and vibecode 10 iOS apps (in your case, proportionally, that would be 50 to 100)?


Because money? Even if you can quickly build them it’s pointless if you can’t sell them. And Claude cannot help with that.

Share the codebase and what you're doing or, I'm sorry, you're just another example of what I laid out above.

If you honestly believe that "agents" are making you better than Google SWEs, then you severely need to take a step back and reevaluate, because you are wrong.


Hold the phone. So, Google, with its legions of summa cum laude engineers, can't make this stack work well, but your AI agent is nailing it into next week? Seriously, show me the way, so I too may find AI enlightenment.

What do you mean “with agents”?

I've been using mainly gemini-cli and am starting to play around with Claude Code.

Are you referring to those as agents or do you mean spinning separate/multiple agents out of sessions on them?

When I read these LLM in coding discussions, I'm reminded a lot of online dating discussions. Someone will post "I'm really having a tough time dating. I tried X (e.g. a dating app) but I had a tough experience." Someone will respond "I tried X and had great success. Highly recommended." Seems confounding, but when you click into their profiles to see pictures of each person, it becomes abundantly clear why these people report different experiences.

Not to dog the author too hard, but a look at their Github profile says a lot about the projects they've worked on and what kind of dev they are. Not much there in terms of projects or code output, but they do have 15k followers on Twitter, where they post frequently about LLMs to their audience.

They aren't talking about the tasks and the domains they're using because that's incidental; what they really want to do is just talk about LLMs to their online audience, not ship code.


I’m a TALL developer, so Laravel, Livewire, Tailwind, Alpine.

It’s nice because 3/4 of those are well-known but not “default” industry choices and it still handles them very well.

So there’s a Laravel CRM builder called Filament which is really fun to work in. Claude does a great job with that. It’s a tremendous amount of boilerplate with clear documentation, so it makes sense that Claude would do well.

The thing I appreciate though is that CC as an agent is able to do a lot in one go.

I've also hooked CC up to a read-only API for a client, and I need to consume all of the data on that API for the transition to a Filament app. Claude is currently determining the schema, replicating it in Laravel, and doing a full pull of API records into Laravel models, all on its own. It's been running for 10 minutes with no interruption, and I expect it will perform flawlessly.

I invest a lot of energy in prompt preparation. My prompts are usually about 200 words for a feature, and I’ll go back and forth with an LLM to make sure it thinks it’s clear enough.


I haven't had great luck with Claude writing Windows Win32 (using MFC) in C++. It invents messages and APIs all the time that read like exactly what I want it to do.

I'd think Win32 development would be something AIs are very strong at because it's so old, so well documented, and there's a ton of code out there for it to read. Yet it still struggles with the differences between Windows messages, control notification messages, and command messages.


My opinion is that the AI is the distilled average of all the code it can scrape. For the stuff I'm good at and work on every day, it doesn't help much beyond some handy code completions. For stuff I'm below average at, like bash commands and JS, it helps me get up to average. The most valuable use to me is when I can learn something from it: it gives some good alternatives and ideas if you're working on something mainstream.

The reason is probably complexity and the task at hand.

In my experience, LLMs are great at small tasks (bash or Python scripts); good at simple CRUD stuff (JS, TS, HTML, CSS, Python); good at prototyping; good at documentation; okay at writing unit tests; okay at adding simple features to more complex codebases.

Anything more complex and I find it pretty much unusable, even with Claude 4. More complex C++ codebases; more niche libraries; ML, CV, more mathsy domains that require reasoning.


I've had the same experience, although I feel like Claude is far more than a junior to me. Its ability to propose options, make recommendations, and illustrate trade-offs is just unreal.

Does anyone have any usage guides they can recommend that would get me to feel this way about Claude Code, other than the OP article? I fired it up yesterday for about an hour and tried it on a couple of tickets, and it felt like a total waste of time. The answers it gave were absurdly incorrect. I was being quite specific in my prompting, and it seemed to be acquiring the proper context, but it did nothing like what I was asking.

E.g. I asked it to swap all onChange handlers in a component to modify a useState rather than directly fire a network request, and then add onBlur handlers for the actual network request. It didn't add the useStates and just added onBlur handlers that sent network requests to the wrong endpoint. Bizarre.
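
For reference, the pattern I was asking for is nothing exotic; a sketch of what I expected (the endpoint and saveField helper are invented):

    import { useState } from 'react';

    // hypothetical helper standing in for the real request code
    async function saveField(url: string, body: unknown) {
      await fetch(url, { method: 'PATCH', body: JSON.stringify(body) });
    }

    export function NameField({ initial }: { initial: string }) {
      const [name, setName] = useState(initial);
      // onChange only touches local state; the network request fires on blur
      return (
        <input
          value={name}
          onChange={(e) => setName(e.target.value)}
          onBlur={() => saveField('/api/profile', { name })}
        />
      );
    }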


> It’s the first time it really does feel like working with a junior engineer to me.

I feel like working with Claude is what it must feel like for my boss to work with me. “Look, I did this awesome thing!”

“But it’s not what I asked for…”


Unlike a junior engineer, its feelings don't get hurt when you ask for a redo.

I liked Claude Code when I used it initially to document a legacy codebase. The developer who maintains the system reviewed the documentation, and said it was spot-on.

But the other day I asked it to help add boundary logging to another legacy codebase and it produced some horrible, duplicated and redundant code. I see these huge Claude instruction files people share on social media, and I have to wonder...

Not sure if they're rationing "the smarts" or performance is highly variable.


I agree. I also recommend people read this: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...

There are some things in there that really take this from an average tool to something great. For example, a lot of people have no idea that it recognizes different levels of reasoning and allocates a bigger number of “thinking tokens” depending on what you ask (including using “ultrathink” to max out the thinking budget).

I honestly think that people who STILL get mostly garbage outputs just aren’t using it correctly.

Not to mention the fact that people often don't use Opus 4 and stay with Sonnet to save money.


Half the posts on Hacker News are the same discussion, over and over, about coding agents' usefulness or lack thereof.

> it really does feel like working with a junior engineer to me.

I agree. It reminds me of this one junior engineer I worked with who produced awful code, and it would take longer to explain stuff to him than to just do it myself, let alone all the extra time I had to spend reviewing his awful PRs. I had hoped he would improve over time, but he took my PR comments personally and refused to keep working with me. At least Claude doesn't have an attitude.


Like working with an incredibly talented and knowledgable junior engineer, but still a junior engineer.

If you want to try something better than Claude Code, try Cline.


Can you explain how Cline is better?

I love the interface. It makes it extremely easy to rewind time, undoing code edits and rewinding the LLM context at the same time. Its prompting and toolset are great. It's got MCP support, which I have integrated into my workflow. It's got a solid marketplace of auto-installing MCP services. I love it.

I found Cursor much better than Claude Code. Running Claude Code, it issued so many commands and did so much internal prompting to get a small thing done that it ate up tonnes of my quota. Cursor, on the other hand, did it super quick and straight to the point. Claude Code just got stuck in grep hell.

I agree with the comparison to steroids, but then I've seen people go through the health issues caused by steroids so we might mean different things by that comparison.

In what sense? Instead of doing your job, which I assume you've been doing successfully for many years, you now ask Claude to do it for you and then have to review it?

Are you doing anything useful? How can anyone outside of yourself know this?

My own experiments only show that this technology is unreliable.


I am loving the Zed editor and they integrate Claude primarily so I might give it a shot.

Almost feels like a game as you level up!

Just wait til the honeymoon period ends and you actually have to stand behind that slop you didn't realize you were dumping into your codebase.

How you guys are happy with an '80s-looking terminal interface is beyond me...

If Claude is so amazing, could Anthropic not make their own fully-featured yet super-performant IDE in like a week?


Free yourself from the shackles of the GUI.

Free yourself from the shackles of displays too? Back to punched cards?


