The Codex app illustrates the shift left of IDEs and coding GUIs (benshoemaker.us)
74 points by straydusk 12 hours ago | 162 comments




I find so many of these comments and debates fascinating as a layperson. I'm more tech-savvy than most people I meet: I've built my own PCs, know my way around some more 'advanced' things like the terminal, and have a deeper understanding of computer systems, software, etc. than most people I know. It has always been more of a hobby for me. People look at me as the 'tech guy' even though I'm actually not.

Something I know very little about is coding. I know there are different languages with pros and cons to each. I know some work across operating systems while others don't but other than that I don't know too much.

For the first time I just started working on my own app in Codex and it feels absolutely amazing and magical. I've not seen the code and would have basically no idea how to read it, but I'm working on a niche application for my job that is custom-tailored to my needs, and if it works I'll be thrilled. Even better, the process of building just feels so special and awesome.

This really does feel like it is on the precipice of something entirely different. I think back to computers before GUIs. I think back to computers before mobile touch interfaces. I am sure there were plenty of people who thought some of these things wouldn't work, for various reasons, but I think that is the wrong framing. The focus should be on who this will work for and why, and there, I think, there are a ton of possibilities.

For reference, I'm a middle school Assistant Principal working on an app to help me with student scheduling.


Keep building and keep learning, I think you are the kind of user that stands to benefit the most from this technology.

After 10+ years of stewing on an idea, I started building an app (for myself) that I've never had the courage or time to start until now.

I really wanted to learn the coding, the design patterns, etc., but truthfully, it was never gonna happen without a Claude. I could never get past the unknown-unknowns (and I didn't even grasp how broad a domain of knowledge it actually requires). Best case, I would have started small chunks and abandoned it countless times, piling on defeatism and disappointment each time.

Now in under two weeks of spare time and evenings, I've got a working prototype that's starting to resemble my dream. Does my code smell? Yes. Is it brittle? Almost certainly. Is it a security risk? I hope not. (It's not.)

I want to be intentional about how I use AI; I'm nervous about how it alters how we think and learn. But seeing my little toy out in the real world is flippin incredible.


>The people really leading AI coding right now (and I’d put myself near the front, though not all the way there) don’t read code. They manage the things that produce code.

I can’t imagine any other example where people voluntarily move to a black box approach.

Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?


I think many people are missing the overall point of these sorts of posts: they are describing a new type of programmer who will only use agents and never read the underlying code. These vibe/agent coders will use natural(-ish) language to communicate with the agents and won't look at the code any more than, say, a PHP developer would look at the underlying assembly. It is simply not the level of abstraction they are working at. There are many use cases where this type of coding will work fine, and it will let many people who previously couldn't really take advantage of computers do so. This is great, but it will in no way replace the need for code that requires humans to understand it (which, in turn, requires participation in the writing).

Your analogy to PHP developers not reading assembly got me thinking.

Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.

There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.


> So the tradeoffs involved are not only about developer time vs. performance, but also correctness.

The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.

But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!

For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.

(In the case of Windows 11 recently.. not so much ;)


> The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

It's certainly hard to find, in consumer tech, an example of a product that was displaced in the market by a slower-moving competitor due to buggy releases. Infamously, "move fast and break things" has been the law of the land.

In SaaS and B2B, deterministic results become much more important. There are still bugs, of course, but showstopper bugs are major business risks. And combinatorial state+logic still makes testing a huge tarpit.

The world didn't spend the last century turning customer service agents and business-process workers into script-following human-robots for no reason, and big parts of it won't want to reintroduce high levels of randomness... (That's not even necessarily good for any particular consumer - imagine an insurance company with a "claims agent" that got sweet-talked into spending hundreds of millions more on things that were legitimate benefits for its customers, but that management wanted to limit whenever possible on technicalities.)


For a great many software projects no formal spec exists. The code is the spec, and it gets modified constantly based on user feedback and other requirements that often appear out of nowhere. For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.

Put another way, if you don't know what correct is before you start working then no tradeoff exists.


> Put another way, if you don't know what correct is before you start working then no tradeoff exists.

This goes out the window the first time you get real users, though. Hyrum's Law bites people all the time.

"What sorts of things can you build if you don't have long-term sneaky contracts and dependencies" is a really interesting question and has a HUGE pool of answers that used to be not worth the effort. But it's largely a different pool of software than the ones people get paid for today.


> This goes out the window the first time you get real users, though.

Not really. Many users are happy for their software to change if it's a genuine improvement. Some users aren't, but you can always fire them.

Certainly there's a scale beyond which this becomes untenable, but it's far higher than "the first time you get real users".


It's also important to remember that vibe coders throw away the natural language spec each time they close the context window.

Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.


OK but, I've definitely read the assembly listings my C compiler produced when it wasn't working like I hoped. Even if that's not all that frequent it's something I expect I have to do from time to time and is definitely part of "programming".

Imagine if high-level coding worked like this: write a first draft, and get assembly. All subsequent high-level code is written in a REPL and expresses changes to the assembly, or queries the state of the assembly, and is then discarded. Only the assembly is checked into version control.

Or the opposite: all applications are just text files with prompts in them, and the assembly lives as ravioli in many temp files. It only builds the code that is used. You can extend the prompt while using the application.

> which is faithfully translated by the (hopefully bug-free) compiler.

"Hey Claude, translate this piece of PHP code into Power10 assembly!"


I'm glad you wrote this comment because I completely agree with it. I'm not saying there is no need for software engineers who deeply consider architecture; who can fully understand the truly critical systems that exist at most software companies; who can help dream up the harness capabilities to make these agents work better.

I just am describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.


> that is they are describing a new type of programmer that will only use agents and never read the underlying code

> and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly

This really puts down the work that the PHP maintainers have done. Many people spent a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I, as a PHP developer, assume.

Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?


That is true for all languages. Very high quality until you use a lib, a module or an api.

> Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

The output of code isn't just the code itself, it's the product. The code is a means to an end.

So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.


>The output of code isn't just the code itself, it's the product. The code is a means to an end.

I’ll bite. Is this person manually testing everything that one would normally unit test? Or writing black box tests that he knows are correct because they were manually written?

If not, you’re not reviewing the product either. If yes, it’s less time consuming to actually read and test the damn code


I mostly ignore code, I lean on specs + tests + static analysis. I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions. I push very high test coverage on all my projects (85%+), and part of the way I build is "testing ladders" where I have the agent create progressively bigger integration tests, until I hit e2e/manual validation.
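
To give a flavor of what one rung of those ladders plus the coverage gate can look like (this is a toy sketch, not code from my actual projects; the markers and the 85% number are just illustrative):

    # Illustrative testing ladder: the same behavior covered at bigger and
    # bigger scopes. Custom markers would need registering in pytest.ini.
    import pytest

    def normalize_email(raw: str) -> str:
        return raw.strip().lower()

    @pytest.mark.unit
    def test_normalize_email_strips_and_lowercases():
        assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

    @pytest.mark.integration
    def test_roster_import_normalizes_emails(tmp_path):
        # a bigger rung: several pieces exercised together against a temp file
        roster = tmp_path / "roster.txt"
        roster.write_text(normalize_email("  Alice@Example.COM ") + "\n")
        assert roster.read_text().splitlines() == ["alice@example.com"]

    # Run the rungs in order, with the coverage bar enforced mechanically:
    #   pytest -m unit --cov=app --cov-fail-under=85
    #   pytest -m integration
    #   pytest -m e2e    # or manual validation at the top of the ladder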

>I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions

So a percentage of your code, based on your gut feeling, is left unseen by any human by the time you submit it.

Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.

And considering that your code output is larger, the percentage of it that is buggy is larger, and (presumably) you write faster, have you considered what that implies for the compounding likelihood of incidents?


There's definitely a class of bugs that are a lot more common, where the code deviates from the intent in some subtle way, while still being functional. I deal with this using benchmarking and heavy dogfooding, both of these really expose errors/rough edges well.

"Testing ladders" is a great framing.

My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.
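
As a purely illustrative sketch of that harness layer - the specific tools here (ruff, mypy, pytest) are stand-ins, not necessarily what I run - a git pre-commit hook can chain the checks so every change gets reviewed mechanically before anyone decides whether to eyeball it:

    #!/usr/bin/env python3
    # .git/hooks/pre-commit -- run the harness before any commit lands.
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],            # lint
        ["mypy", "src"],                   # static analysis
        ["pytest", "-q", "--maxfail=1"],   # tests
    ]

    def main() -> int:
        for cmd in CHECKS:
            print("running:", " ".join(cmd))
            if subprocess.run(cmd).returncode != 0:
                print("harness check failed, refusing to commit")
                return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())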


Exactly this. The code is an intermediate artifact - what I actually care about is: does the product work, does it meet the spec, do the tests pass?

I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.


People miss this a lot. Coding is just a (small) part of building a product. You get much better bang for the buck if you focus your time on talking to the user, dogfooding, and then vibecoding. It also allows you to do many more iterations, even with large changes, because since you didn't "write" the code, you don't care about throwing it away.

Right, it seems the appropriate analogy is the shift from analog-photograph-developers to digital camera photographers.

The product is: solving a problem. Requirements vary.

A photo isn't going to fail next week or three months from now because it's full of bugs no one's triggered yet.

Specious analogies don't help anything.


The output is the program behavior. You use it, like a user, and give feedback to the coding agent.

If the app is too bright, you tweak the settings and build it again.

Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.

Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.


AI-assisted coding is not a black box in the way that managing an engineering team of humans is. You see the model "thinking", you see diffs being created, and occasionally you intervene to keep things on track. If you're leveraging AI professionally, any coding has been preceded by planning (the breadth and depth of which scale with the task) and test suites.

> What is the logic here?

It is right often enough that your time is better spent testing the functionality than the code.

Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).


I can’t imagine retesting all the functionality of a well-established product for possible regressions not being stupidly time-consuming. This is the very reason we have unit tests in the first place, and why they are far more numerous in test suites than end-to-end tests are.

> I can’t imagine any other example where people voluntarily move to a black box approach.

Anyone overseeing work from multiple people has to? At some point you have to let go and trust people's judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I'm not one of them, btw) must rely on higher-level reports. Maybe drilling into this or that piece of code occasionally.


>At some point you have to let go and trust people's judgement.

Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.

>Reading and understanding the whole output of 9 concurrently running agents is impossible.

I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?

I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It’s already a giant advancement, why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.


Is reviewing outputs really more efficient than writing the code? Especially if it's a code base you haven't written code in?

An AI agent cannot be held accountable

Neither can employees, in many countries.

> Anyone overseeing work from multiple people has to?

That's not a black box though. Someone is still reading the code.

> At some point you have to let go and trust people‘s judgement

Where's the people in this case?

> People who do that (I'm not one of them, btw) must rely on higher-level reports.

Does such a thing exist here? Just "done".


> Someone is still reading the code.

But you are not. That’s the point?

> Where's the people in this case?

Juniors build worse code than Codex. Their superiors also can't check everything they do. They need to extend some level of trust, dumb shit included, or they can't hire juniors.

> Does such a thing exist here? Just "done".

Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info vs when you read the output, but it won’t be zero info.


Don’t read the code, test for desired behavior, miss out on all the hidden undesired behavior injected by malicious prompts or AI providers. Brave new world!

You made me imagine AI companies maliciously injecting backdoors in generated code no one reads, and now I'm scared.

My understanding is that it's quite easy to poison the models with inaccurate data, I wouldn't be surprised if this exact thing has happened already. Maybe not an AI company itself, but it's definitely in the purview of a hostile actor to create bad code for this purpose. I suppose it's kind of already happened via supply chain attacks using AI generated package names that didn't exist prior to the LLM generating them.

One mitigation might be to use one company's model to check the work of another company's code and depend on market competition to keep the checks and balances.
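
A rough sketch of that mitigation, assuming the OpenAI and Anthropic Python SDKs (model names and the audit prompt are placeholders, not a recommendation):

    # Cross-vendor check: one vendor's model writes a patch, a different
    # vendor's model audits it. Model names here are illustrative only.
    import anthropic
    from openai import OpenAI

    def generate_patch(task: str) -> str:
        writer = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = writer.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": task}],
        )
        return resp.choices[0].message.content

    def audit_patch(diff: str) -> str:
        reviewer = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
        msg = reviewer.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": "Audit this diff for backdoors, credential leaks or "
                           "suspicious network calls. Be specific:\n\n" + diff,
            }],
        )
        return msg.content[0].text

    if __name__ == "__main__":
        patch = generate_patch("Write a Python function that hashes passwords.")
        print(audit_patch(patch))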

What about writing the actual code yourself

Already happening in the wild

> I can’t imagine any other example where people voluntarily move to a black box approach.

I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.


So... things where the producer doesn't respect the audience? Because any such analysis would be worth as much as a 4.5 hour atonal bass solo.

You can get an AI to listen to that bass solo for you

No pun intended, but it's been more "vibes" than science, how I've arrived at this. It's more effective: when I focus my attention on the harness layer (tests, hooks, checks, etc.) and the inputs, my overall velocity improves relative to reading and debugging the code directly.

To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.

My workflow just focuses much more on the final product, and the initial input layer, not the code - it's becoming less consequential.


I think this is the logical next step -- instead of manually steering the model, just rely on the acceptance criteria and some E2E test suite (that part is tricky, since you then need to verify the test suite itself).

I personally think we are not that far from it, but it will need something built on top of current CLI tools.
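
A minimal sketch of what leaning on acceptance criteria could look like with Playwright's Python API (the URL, selectors, and feature are invented for illustration):

    # Acceptance-level check: drive the built product, never read the source.
    from playwright.sync_api import sync_playwright

    def test_adding_a_student_shows_up_in_the_roster():
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("http://localhost:3000/schedule")   # assumed dev server
            page.fill("#student-name", "Ada Lovelace")    # assumed selector
            page.click("text=Add")
            assert page.locator(".roster-row").count() == 1
            browser.close()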


> Because if you can read code, I can’t imagine poking the result with black box testing being faster.

I don't know... it depends on the use case. I can't imagine even the best front-end engineer ever can read HTML faster than looking at the rendered webpage to check if the layout is correct.


Good analogy.

> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

It's producing seemingly working code faster than you can closely review it.


Your car can also move faster than what you can safely control. Knowing this, why go pedal to the metal?

> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

The AI also writes the black box tests, what am I missing here?


>The AI also writes the black box tests, what am I missing here?

If the AI misinterpreted your intentions and/or missed something in production code, tests are likely to reproduce rather than catch that behavior.

In other words, if “the ai is checking as well” no one is.


That's true. For sure, never let the AI that writes the test see the code it's testing. Write multiple tests, and have an arbitrator (also AI) figure out whether the implementation or the tests are wrong when tests fail. Have the AI heavily comment code and heavily comment tests in the language of your spec, so you can manually verify whether the scenarios/parts of the implementation make sense when it matters.

etc...etc...

> In other words, if “the ai is checking as well” no one is.

"I tried nothing, and nothing at all worked!"


Your metaphor is wrong.

Code is not the output. Functionality is the output, and you do look at that.


Explain then how testing the functionality (not the new one; regressions included, this is not a school exercise) is faster than checking the code.

Are you writing black box tests by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the "every test is a black box" approach is.


>Imagine taking a picture on autoshot mode

Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.

There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?


You missed out on the rest of the analogy though, which is the part where the photo is not reviewed before handing it over to the client.

People care about results. Better processes need to produce better results. This is programming, not a belief system where you have to adhere to some view or else.

> I don’t read code anymore

Never thought this would be something people actually take seriously. It really makes me wonder if in 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.


> Never thought this would be something people actually take seriously

The author of the article has a bachelor's degree in economics[1], worked as a product manager (not a dev) and only started using GitHub[2] in 2025 when they were laid off[3].

[1] https://www.linkedin.com/in/benshoemaker000/

[2] https://github.com/benjaminshoemaker

[3] https://www.benshoemaker.us/about


I've written code since 2012, I just didn't put it online. It was a lot harder, so all my code was written internally, at work.

But sure, go with the ad hominem.


Whilst I won't comment on this specific person, one of the best programmers I've met has a law degree, so I wouldn't use their degree against them. People can have many interests and skills.

> Never thought this would be something people actually take seriously.

You have to remember that the number of software developers saw a massive swell in the last 20 years, and many of these folks are bootcamp-educated web/app dev types, not John Carmack. Statistically, and under pre-AI circumstances, they typically started too late and for the wrong reasons to become very skilled in the craft by middle age (of course there are many wonderful exceptions; one of my best developers is someone who worked in a retail store for 15 years before pivoting).

AI tools are now available to everyone, not just the developers who were already proficient at writing code. When you take in the excitement, you always have to consider what it does for the average developer and also those below average: a chance to redefine yourself, be among the first doing a new thing, skip over many years of skill-building and, as many of them would put it, focus on results.

It's totally obvious why many leap at this, and it's even probably what they should do, individually. But it's a selfish concern, not a care for the practice as-is. It also results in a lot of performative blog posting. But if it was you, you might well do the same to get ahead in life. There are only so many opportunities to get in on something on the ground floor.

I feel a lot of senior developers don't take the demographics of our community of practice into account when they try to understand the reception of AI tools.


This is gold.

I have rarely had the words taken out of my mouth like this.

The percentage of devs in my career who are from the same academic background, show similar interests, and approach the field in the same way is probably less than 10%, sadly.


Well, there are programmers like Karpathy in his original coinage of vibe coding

> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Notice "don't read the diffs anymore".

In fact, this is practically the anniversary of that tweet: https://x.com/karpathy/status/2019137879310836075?s=20


Half serious - but is that really so different than many apps written by humans?

I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today (things like green-screen apps written in Pick/Basic, Cobol, etc.). Some of them were written once and subsystems replaced, but some of it is original code.

In systems written in the last.. say, 10 to 20 years, I've seen them undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand-in-hand with the rise of agile development (not condemning nor approving of it) - where rapid rates of change were expected.. and often the tech the system was written in was changing rapidly also.

In hardware engineering, I personally also saw a huge move to more frequent design and implementation refreshes to prevent obsolescence issues (some might say this is "planned obsolescence" but it also is done for valid reasons as well).

I think not reading the code anymore TODAY may be a bit premature, but I don't think it's impossible to consider that someday in the nearer than further future, we might be at a point where generative systems have more predictability and maybe even get certified for safety/etc. of the generated code.. leading to truly not reading the code.

I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20 year timeframe either, it might be sooner.


I would enjoy discussion with whoever voted this down - why did you?

What is your opinion and did you vote this down because you think it's silly, dangerous or you don't agree?


> 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.

That happens just as often without AI. Maybe the people that like it all have experience with trashing multiple sets of products over the course of their life?


Remember, though, this forum is full of people who consider code to be objects when it's just state in a machine.

We have been throwing away entire pieces of software forever. Where's Novell? Who runs 90s Linux kernels in prod?

Code isn't a bridge or car. Preservation isn't meaningful. If we aren't shutting the DCs off we're still burning the resources regardless if we save old code or not.

Most coders are so many layers of abstraction above the hardware at this point anyway they may as well consider themselves syntax artists as much as programmers, and think of Github as DeviantArt for syntax fetishists.

Am working on a model of /home to experiment with booting Linux to models. I can see a future where Python in my screen "runs" without an interpreter because the model is capable of correctly generating the appropriate output without one.

Code is ethno objects, only exists socially. It's not essential to computer operations. At the hardware level it's arithmetical operations against memory states.

Am working on my own "geometric primitives" models that know how to draw GUIs and 3D world primitives, text; think like "boot to blender". Rather store data in strings, will just scaffold out vectors to a running "desktop metaphor".

It's just electromagnetic geometry, delta sync between memory and display: https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...


I beg your pardon?

Reading and understanding code is more important than writing imo

It’s pretty well established that you cannot understand code without having thought things through while writing it. You need to know why things are written the way they are to understand what is written.

Yeah, just reading code does little to help me understand how a program works. I have to break it apart and change it and run it. Write some test inputs, run the code under a debugger, and observe the change in behavior when changing inputs.

If that were true, then only the person who wrote the code could ever understand it enough to fix bugs, which is decidedly not true.

I'm torn between running away to be an electrician or just waiting three years until everyone realises they need engineers who can still read.

Sometimes it feels like pre-AI education is going to be like low-background steel for skilled employees.


The coincidental timing between the rapid increase in the number of emergency fixes coming out on major software platforms and the proud announcement of the amount of code that's being produced by AI at the same companies is remarkable.

I think 2-3 years is generous.

Don't get me wrong, I've definitely found huge productivity increases in using various LLM workflows in both development as well as operational things. But removing a human from the loop entirely at this point feels reckless bordering on negligent.


If the models don't get to the point where they can correct fixes on their own, then yeah, everything will be falling apart. There is just no other way around increasing entropy.

The only way to harness it is to somehow package code-producing LLMs into an abstraction and then somehow validate the output. Until we achieve that, imo it doesn't matter how closely people watch the output; things will be getting worse.


> If the models don't get to the point where they can correct fixes on their own

Depending on what you're working on, they are already at that point. I'm not into any kind of AI-maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly expansive web app to manage my business using Astro + React, and I have yet to find any bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build out, in a month, a fully TDD app that would have conservatively taken me a year by myself.

(+) Except for making the UI beautiful. It's crap at that.

The key that made it click is exactly what the person describes here: using specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and lots more for each section of functionality in the app. If I'm doing work that touches ui, I reference layout and ui-components so that the agent doesn't invent a custom button component. If I'm doing database work, reference database.md so that it knows we're using drizzle + libsql, etc.

This extends up to higher level components where the spec also briefly explains the actual goal.

Then each feature building session follows a pattern: brainstorm and create design doc + initial spec (updates or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (often, I'll identify problems and request changes here) -> automated testing (much stricter linting, knip etc. than I would use for myself) -> finally, update the spec docs again based on the actual work that was done.

My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.
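
To make the layout concrete, a docs/specs tree along those lines might look something like this (anything beyond the files I named above is invented for illustration):

    docs/specs/
      layout.md           # overall site shell info
      ui-components.md    # shared buttons, forms, theming
      auth.md
      database.md         # drizzle + libsql conventions
      data.md
      features/
        invoicing.md      # per-feature spec with goals + use cases (invented name)
        reporting.md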


I actually think this is fair to wonder about.

My overall stance on this is that it's better to lean into the models & the tools around them improving. Even in the last 3-4 months, the tools have come an incredible distance.

I bet some AI-generated code will need to be thrown away. But that's true of all code. The real questions to me are: are the velocity gains worth it? Will the models be so much better in a year that they can fix those problems themselves, or rewrite it?

I feel like time will validate that.


The proponents of Spec Driven Development argue that throwing everything out completely and rebuilding from scratch is "totally fine". Personally, I'm not comfortable with the level of churn.

I have wondered the same but for the projects I am completely "hands off" on, the model improvements have overcome this issue time and time again.

In 2-3 years from now, if coding AI continues to improve at this pace, I reckon people will rewrite entire projects.

I can't imagine not reading the code I'm responsible for any more than I could imagine not looking out the windscreen in a self driving Tesla.

But if so many people are already there, and they're mostly highly skilled programmers, imagine in 2 years' time with people who've never programmed!


If I keep getting married at the same pace I have, then in a few years I'll have like 50 husbands.

Well, Tesla has been nearly at FSD for how long? The analogy you make sorta makes it sound less likely

Seems dangerous to wager your entire application on such an uncertainty

Some people are not aware that they are one race condition away from a class action lawsuit.

Also take something into account: absolutely _none_ of the vibe coding influencer bros make anything more complicated than a single-feature webapp that's already been implemented 50 times. They've never built anything complicated either, or maintained something for more than a few years with all the warts that entails. Literally, from his bio on his website:

> For 12 years, I led data and analytics at Indeed - creating company-wide success metrics used in board meetings, scaling SMB products 6x, managing organizations of 70+ people.

He's a manager that made graphs on Power BI.

They're not here because they want to build things, they're here to shit a product out and make money. By the time Claude has stopped being able to pipe together ffmpeg commands or glue together 3 JS libraries, they've gone on to another project and whoever bought it is a sucker.

It's not that much different from the companies of the 2000s promising a 5th generation language with a UI builder that would fix everything.

And then, as a very last warning: the author of this piece sells AI consulting services. It's in his interest to make you believe everything he has to say about AI, because by God are there going to be suckers buying his time at indecently high prices to get shit advice. This sucker is most likely your boss, by the way.


No true programmer would vibecode an app, eh?

Yes, and you can rebuild them for free

Claude, Codex and Gemini can read code much faster than we can. I still read snippets, but mostly I have them read the code.

Unfortunately they're still too superficial. 9 times out of 10 they don't have enough context to properly implement something and end up just tacking it on in some random place with no regard for the bigger architecture. Even if you do tell it something in an AGENT.md file or something, it often just doesn't follow it.

I use them to probabilistically program. They’re better than me and I’ve been at it for 16 years now. So I wouldn’t say they’re superficial at all.

What have you tried to use them for?


I've seen software written and architected by Claude and I'd say that they're already ready to be thrown out. Security sucks, performance will probably suck, maintainability definitely sucks, and UX really fucking sucks.

I have a wide range of Claude Code based setups, including one with an integrated issue tracker and parallel swarms.

And for anything really serious? Opus 4.5 struggles to maintain a large-scale, clean architecture. And the resulting software is often really buggy.

Conclusion: if you want quality in anything big in February 2026, you still need to read the code.


Opus is too superficial for coding (great at bash though, on the flip side); I'd recommend giving Codex a try.

As LLMs advance so rapidly, I think all the AI slop code written today will be easily digestible by the LLMs a few generations down the line. I think there will be a lot of improvements in making user intent clearer. Combined with a bad codebase and larger context windows, refactoring won't be a challenge.

The skills required to perform as a software engineer in an environment where competent AI agents are a commodity have shifted. Before, it was important for us to be very good at reading documentation and writing code. Now we need to be very good at writing docs, specs, and interfaces, and at reading code.

That goes a bit against the article, but it's not reading code in the traditional sense where you are looking for common mistakes we humans tend to make. Instead you are looking for clues in the code to determine where you should improve in the docs and specs you fed into your agent, so the next time you run it chances are it'll produce better code, as the article suggests.

And I think this is good. In time, we are going to be forced to think less technically and more semantically.


Sometimes when I vibe code, I also have a problem with the code, and find myself asking: “What went wrong with the system that produced the code?”

The answer is clear: I didn’t write the code, I didn’t read it, I have no idea what it does, and that’s why it has a bug.


Be that as it may, I spot bugs a lot faster when I didn't write the code than when I did.

Well, I’d wager there are quite a few more bugs, so naturally it should be easier to spot a few.

Following this logic, why not move further left?

Become a CTO, CEO or even a venture investor. "Here's $100K worth of tokens: analyze the market, review various proposals from agents, invest the tokens, maximize profit."

You know why not? Because it will be more obvious it doesn't work as advertised.


If one truly believed in LLMs being able to replace knowledge workers, then it would also hold that they could replace managers and execs. In fact, they should be able to do it even better: LLMs could convert every company into a "flat" one, bypassing the management hierarchy and directly consuming meeting notes from every meeting to get the real status as the source of truth, and provide suggestions as needed. If combined with web-search capability, they would also be more plugged into the market, customer sentiment, and competitors than most execs could ever be.

We're not at the point where we are replacing all software developers entirely (and will never be without real AGI), but we are definitely at the point where scaling back headcount is possible.

Also, creating software is much more testable and verifiable than what a CEO does. You can usually tell when the code isn't right because it doesn't work or doesn't pass a test. How can you verify that your AI CEO is giving you the right information or planning its business strategy effectively?

It's one of the biggest reasons that software development and art are the two domains in which AI excels. In software you can know when it's right, and in art it doesn't matter if it's right.


You have to move up or down to survive. In 10 years we'll either be managers (either of humans or agents), or we'll be electrical engineers. Programming is done! I for one am glad.

There are two extremes and a spectrum in between:

* AI can replace knowledge workers - most existing software engineers and managers of all levels will lose their jobs and have to requalify.

* AI requires human in the loop.

In the first scenario, I see no reason to waste time and should start building plan B now (remaining job markets will be saturated at that point).

In the second scenario, tech-debt and zettabytes of slop will harm companies which relied on it heavily. In the age of failing giants and crumbling infrastructure, engineers and startups that can replace gigawatt burning data center with a few kilowatt rack, by manually coding a shell script that replaces Hadoop, will flourish.

Most probably it will be a spectrum - some roles can be replaced, some not.


I still think this is mostly people who never could hack it at coding taking to the new opportunities these tools afford them without having to seriously invest in the skill, and basking in touting their skill-lessness as the new temporary cool.

Which is perhaps what they should do, of course. Any transition is a chance to get ahead and redefine yourself.


Just FYI, this is the attitude that causes pro-AI people to start shit-talking anti-AI folks as Luddites who need to learn to use the tools.

Agents are a quality/velocity tradeoff (which is often good), if you can't debug stuff without them that's a problem as you'll get into holes, but that doesn't mean you have to write the code by hand.


I enjoy new technology in general, so I very much keep up with the tools and also like using them for the things they do well at any given moment. I'm not among the Luddites, FWIW. I think there's a lot of legitimately great building going on right now.

Note though we're talking about "not reading code" in context, not the writing of it.


Author is a former data analytics product manager (already a bit of a tea leaf reading domain) who says he never reads code and is now marketing himself as a new class of developer.

Parent post sounds like a very accurate description.


I completely agree in a sense - the cost of producing software is plummeting, and it's leading to me being able to develop things that I would never have invested months in before.

>I think the industry is moving left. Toward specs. The code is becoming an implementation detail. What matters is the system that produces it - the requirements, the constraints, the architecture. Get those right, and the code follows.

So basically a return to waterfall design.

Rather than YOLO planning (agile), we go back to YOLO implementation (farming it out to dozens of replaceable peons, but this time they're even worse).


I really wish posts like this explained what sort of development they are doing. Is this for an internal CRUD server? Internal React app? Scala server with three instances? Golang server with complex AWS configuration? 10k lines? 100k lines? 1M+? Externally facing? iOS app? Algorithm-heavy photo processing desktop app? It would give me a much better idea of whether the argument is reasonable, and whether it is applicable for the kind of software I generally write.

The author is a PM with a bachelors in economics who got laid off last year and began building with AI. Zero engineering experience.

You can guess what kind of software he is building.

When you read the 100th blog post about how AI is changing software development, just remember that these are the authors.


You're completely right, and in retrospect I wish I had... I was honestly just talking mostly in broad terms, but people really (maybe rightly) focused on the "not reading code" snippet.

I'm mostly developing my own apps and working with startups.


This blog post is written by a product manager, not a programmer. Their CV speaks to an Economics background, a stint in market research, writing small scripting-type programs ("Cron+MySQL data warehouse") and then off to the product management races.

What it's trying to express is that the (T)PM job should still be safe because they can just team-lead a dozen agents instead of software developers.

Take with a grain of salt when it comes to relevance for "coding", or the future role breakdown in tech organizations.


That's me! I'm pretty open about that.

I'm not trying to express that my particular flavor of career is safe. I think that the ability to produce software is much less about the ability to hand-write code, and that's going to continue as the models and ecosystem improve, and I'm fascinated by where that goes.


> The people really leading AI coding right now (and I’d put myself near the front, though not all the way there)

So humble. Who is he again?



> I don’t read code anymore

> Senior Technical Product Manager

yeah i'd wager they didn't read (let alone write) much code to begin with..


At least going by their own CV, they've mostly written what sounds like small scripting-type programs described in grandiose terms like "data warehouse".

This blog post is influencer content.


Pretty unpopular influencer if that were the case

When I talk with people in the space, go to meetups, present my work & toolset, I am usually one of the more advanced people in the conversation or group, though usually not THE most advanced. I'm not saying I'm some sort of genius, I'm just saying I'm relatively near the leading edge of how to use these tools. I feel like it's true.

Why have a spec when I have the concrete implementation and a system ready and willing to answer any questions I have about it? I don't understand why people value an artifact that can be out of sync with reality over the actual reality. The LLM can answer questions based on the code. We might drift away from needing a code editor, but I likely won't be drifting to reading specs in a world where I can converse with the deployed implementation.

turning a big dial taht says "Psychosis" on one side and "Wishful thinking" on the other and constantly looking back at the LinkedIn audience for approval like a contestant on the price is right

I don't get it. Can't you just open Claude Code in another terminal? I had like 5 open yesterday.

I haven't used Codex though, so maybe there's something I'm missing about the parallel-ness of it here.


Spec is too low level in my experience. The graph continues far further to the left.

I tried doing clean room reimplementations from specs, and just ended up with even worse garbage. Cause it kept all the original garbage and bloated it further!

Giving it a description of what you're actually trying to do works way better. Then it finds the most elegant solution to the problem, both in terms of the code and the UI design.


Yeah, the revenge of waterfall, specs documents for AI agents.

That's the reason I also ended up creating something like this: https://github.com/saadnvd1/aTerm

> Here’s the thing: I don’t read code anymore. I used to write code and read code. Now when something isn’t working, I don’t go look at the code.

Recently I picked a smallish task from our backlog. This is some code I'm not familiar with, frontend stuff I wouldn't tackle normally.

Claude wrote something. I tested, it didn't work. I explained the issue. It added a bunch of traces, asked me to collect the logs, figured out a fix, submitted the change.

I got a bunch of linter errors that I didn't understand, and that I copied and pasted to Claude. It fixed something, but I still got lint errors, which Claude dismissed as irrelevant; meanwhile I realized I wasn't happy with the new behavior.

After 3 days of iteration, my change seems ok, passed the CI, the linters, and automatic review.

At that stage, I have no idea if this is the right way to fix the problem, and if it breaks something, I won't be able to fix it myself as I'm clueless. Also, it could be that a human reviewer tells me it's totally wrong, or ask me questions I won't be able to answer.

Not only was this process not fun at all, but I also didn't learn anything, and I may have introduced technical debt which AI may not be able to fix.

I agree that coding agents can boost efficiency in some cases, but I don't see a shift left of IDEs at that stage.


> This is some code I'm not familiar with

Ask it to analyze and explain the code to you.


Why not look at the code? If you see something that looks messy, ask for it to be cleaned up.

Code health is a choice. We have power tools now. All you have to do is ask.


A simple "this seems odd / messy / un-pythonic" is often enough.

My rule is 3 tries then dig deeper. Sometimes I don't even wait that long, certain classes of bugs are easy for humans to detect but hard for agents, such as CSS issues. Try asking the agent to explain/summarize the code that's causing the problem and double checking against docs for the version you're using, that solves a lot of problems.

This has largely been my experience. Just reading and understanding the code, and writing the change myself ends up actually being faster.

>Here’s the thing: I don’t read code anymore. I used to write code and read code. Now when something isn’t working, I don’t go look at the code. I don’t question the code. I either ask one of my coding agents, or - more often - I ask myself: what happened with my system? What can I improve about the inputs that led to that code being generated?

Good luck debugging any non-trivial problem in such a codebase.


I don't like the craft of the app. There are a few moments that really left me feeling it wasn't 100 percent thought through, like Cursor is at this point.

Why create an IDE without IDE features? What's the benefit of this over using an IDE with the Codex plugin? I don't believe you can review code without traversing it by references, so it looks like it's directed towards toy projects / noobs. And the agents are not yet near the autonomy that would let you skip code review in complex systems.

Why do the illustrations bear such a strong resemblance to those in the Gas Town article?

https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

Is it a nano banana tendency or was it probably intentional?


It's nano banana - I actually noticed the same thing. I didn't prompt it as such.

Here's the prompt I used, actually:

Create a vibrant, visually dynamic horizontal infographic showing the spectrum of AI developer tools, titled "The Shift Left"

Layout: 5 distinct zones flowing RIGHT TO LEFT as a journey/progression. Use creative visual metaphors — perhaps a road, river, pipeline, or abstract flowing shapes connecting the stages. Each zone should feel like its own world but connected to the others.

Zones (LEFT to RIGHT):

1. "Specs" (leftmost) - Kiro logo, VibeScaffold logo, GitHub Spec Kit logo

   Label: "Requirements → Design → Tasks"


2. "Multi-Agent Orchestration" - Claude Code logo, Codex CLI logo, Codex App logo, Conductor logo

   Label: "Parallel agents, fire & forget"


3. "Agentic IDE" - Cursor logo, Windsurf logo

   Label: "Autonomous multi-file edits"


4. "Code + AI" - GitHub Copilot logo

   Label: "Inline suggestions"


5. "Code" (rightmost) - VS Code logo

   Label: "Read & write files"


Visual style: Fun, energetic, modern. Think illustrated tech landscape or isometric world. NOT a boring corporate chart. Use warm off-white background (#faf8f5) with amber/orange (#b45309) as the primary accent color throughout. Add visual flair — icons, small illustrations, depth, texture, but don't make it visually overloaded.

Aspect ratio: 16:9 landscape


> Where IDEs are headed and why specs matter more than code.

We are very far away from this being a settled or agreed upon statement and I really struggle to understand how one vendor making a tool is indicative of an industry practice.


Hell, I see the big banner picture hallucinated from a prompt and all I see is an unproductive mess. I won't comment on the takes the article makes; they're just miserable.

Clearly written by someone who has no systems of importance in production. If my code fails, people lose money, planes halt, cars break down. Read. The. Code.

Yes, but also ... the analogy to assembly is pretty good. We're moving pretty quickly towards a world where we will almost never read the code.

You may read all the assembly that your compiler produces. (Which, awesome! Sounds like you have a fun job.) But I don't. I know how to read assembly and occasionally do it. But I do it rarely enough that I have to re-learn a bunch of stuff to solve the hairy bug or learn the interesting system-level thing that I'm trying to track down if I'm reading the output of the compiler. And mostly even when I have a bug down at the level where reading assembly might help, I'm using other tools at one or two removes to understand the code at that level.

I think it's pretty clear that "reading the code" is going to go the way of reading compiler output. And quite quickly. Even for critical production systems. LLMs are getting better at writing code very fast, and there's no obvious reason we'll hit a ceiling on that progress any time soon.

In a world where the LLMs are not just pretty good at writing some kinds of code, but very good at writing almost all kinds of code, it will be the same kind of waste of time to read source code as it is, today, to read assembly code.


I think this analogy to assembly is flawed.

Compilers predictably transform one kind of programming language code to CPU (or VM) instructions. Transpilers predictably transform one kind of programming language to another.

We introduced various instruction architectures, compiler flags, reproducible builds, checksums exactly to make sure that whatever build artifact that's produced is super predictable and dependable.

That reproducibility is how we can trust our software and that's why we don't need to care about assembly (or JVM etc.) specifics 99% of the time. (Heck, I'm not familiar with most of it.)

Same goes for libraries and frameworks. We can trust their abstractions because someone put years or decades into developing, testing and maintaining them and the community has audited them if they are open-source.

It takes a whole lot of hand-waving to traverse from this point to LLMs - which are stochastic by nature - transforming natural language instructions (even if you call it "specs", it's fundamentally still a text prompt!) to dependable code "that you don't need to read" i.e. a black box.


The analogy to assembly is wrong. Even in a high level language, you can read the code and reason about what it does.

What's the equivalent for an LLM? The string of prompts that non-deterministically generates code?

Also, if LLM output is analogous to assembly, then why is that what we're checking in to our source control?

LLMs don't seem to solve any of the problems I had before LLMs existed. I never worried about being able to generate a bunch of code quickly. The problem that needs to be solved is how to better write code that can be understood, and easily modified, with a high degree of confidence that it's correct, performs well, etc. Using LLMs for programming seems to do the opposite.


I think it's the performative aspects that are grating, though. You're right that even many systems programmers only look at the generated assembly occasionally, but at least most of them have the good sense to respect the deeper knowledge of mechanism that is to be found there, and many strive to know more eventually. Totally orthogonal to whether writing assembly at scale is sensible practice or not.

But with the AI tools we're not yet at the wave of "sometimes it's good to read the code" virtue signaling blog posts that will make front page next year or so, and still at the "I'm the new hot shit because I don't read code" moment, which is all a bit hard to take.


I mean, fair enough. Obviously there are different levels of criticality in any production environment. I'm building consumer products and internal tools, not safety-critical systems.

Even in those environments, I'd argue that AI coding can offer a lot in terms of verification & automated testing. However, I'd probably agree, in high-stakes safety environments, it's more of a 'yes and' than an either/or.


I think a lot of AI bros are sleeping on quality. Prior startup wisdom was “move fast and break things”. Speed is ubiquitous now. Relatively anyone can vibe code a buggy solution that works for their happy path. If that’s the bar, why would I pay for your jank solution when I can make my own tailored to my exact needs? Going fast is a race to the bottom in the long run.

What’s worth paying for is something that is trustworthy.

Claude code is a perfect example: They blocked tools like opencode because they know quality is the only moat, and they don’t currently have it.


Has someone figured out how to set the Codex app to YOLO mode yet?

The constant asking drives me crazy.


There's a button that looks like a shield, next to the voice dictation button.

It's called the "Yeet" skill in the app

I have always thought that AI code generation is an irresistible attraction for those personalities who lack the technical skills or knowledge necessary for programming, but nevertheless feel undeservedly like geniuses. This post is proof of that.

Also, the generated picture in this post makes me want to kick someone in the nuts. It doesn't explain anything.


Ouch lol.

Is the image really not that clear? There are IDE-like tools that all are focusing on different parts of the Spec --> Agent --> Code continuum. I think it illustrates that all right.


I really wonder why nobody is talking about how it is more important to be able to test the code.

9/10 times my AI-generated code is bad before my verification layers; 9/10 times it's good after.

Claude fights through your rules. And if you code in another language you could use other agents to verify code.

This is the challenge now: effectively verifying the code. Whenever I end up with a bad response, I ask myself what layers I could set up to stop the AI as early as possible.

Also things like naming, comments, tree traversal, context engineering, even data structures and multi-agenting. I know it sounds like buzzwords, but these are the topics a software engineer really should think about. Everything else is frankly cope.


I think people attacking "don't read the code" are not considering the status quo - they're comparing to some perfect world where staff engineers are reading every line of code. That's not even close to happening. Test-driven development is something most engineers just won't put up with... AIs will do it, no problem. If I can automate ten different checks for every commit, is my code really getting looked at less?

Not really what "shift left" means...


