Hacker News

I was playing about with ChatGPT the other day, uploading screenshots of sheet music and asking it to convert them to ABC notation so I could make a MIDI file.

The results seemed impressive until I noticed some of the "Thinking" statements in the UI.

One made it apparent the model / agent / whatever had read the title from the screenshot and was off searching for existing ABC transcripts of the piece Ode to Joy.

So the whole thing was far less impressive after that: it wasn't reading the score at all, just reading the title and using the internet to answer my query.



Yes, I have found that Grok, for example, suddenly becomes quite sane when you tell it to stop querying the internet and just rethink the conversation data and answer the question.

It's weird; it's like many agents are now in a phase of constantly gathering more information and never just thinking with what they've got.


But isn't that what we wanted? We complained so much that LLMs use deprecated or outdated APIs instead of current versions because they relied so heavily on what they remembered.


To be clear, what I mean is that Grok will query 30 pages, answer your question vaguely or wrongly, ask for clarification of what you meant, and then go and re-query everything again... I can imagine why it might need to revisit pages, and it might be a UI thing, but it still feels like it doesn't activate its "think with what you've got" mode until you yell at it to stop searching for answers and just summarise.

I guess we could call this "gather, then do your best conditional on what you've found so far".


2010's: Google Search is making humans who constantly rely on it dumber

2020's: LLMs are making humans who constantly rely on them dumber

2026: Google Search is making LLMs who constantly rely on it dumber


Touché, that is what we humans are doing to some degree as well.


Sounds pretty human-like! Always searching for a shortcut.


It sounds like it's lying and making stuff up, something everybody seems to be okay with when using LLMs.


I am not sure why... you want the LLM to solve problems, not come up with answers itself. It's allowed to use tools precisely because it tends to make stuff up. In general, you only care about whether the LLM itself produced the answer or used a tool when you're benchmarking LLMs. If you ask it to convert sheet music to another notation, it might use a tool, and that's probably the right decision.


The shortcut is fine if it's a bog-standard canonical arrangement of the piece. If it's a custom jazz rendition you composed, with odd key changes and shifting time signatures, taking that shortcut is not going to yield the intended result. It's choosing the wrong tool for the job, which makes it unreliable for this task.


For structured outputs like that wouldn’t it be better to get the LLM to create a script to repeatably make the translation?
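The repeatable-script idea is straightforward to sketch. Below is a minimal, hypothetical example (not a full ABC parser) that deterministically maps a small subset of ABC pitch tokens to MIDI note numbers, handling only note letters, `^`/`_`/`=` accidentals, and `'`/`,` octave marks:

```python
# Minimal sketch: ABC pitch tokens -> MIDI note numbers.
# Handles note letters, accidentals (^ sharp, _ flat, = natural),
# and octave marks (' up, , down). Rhythm, keys, etc. are omitted.

ABC_BASE = {"C": 60, "D": 62, "E": 64, "F": 65, "G": 67, "A": 69, "B": 71}

def abc_note_to_midi(token: str) -> int:
    """Convert a single ABC note token like '^F' or "c'" to a MIDI number."""
    shift = 0
    i = 0
    while i < len(token) and token[i] in "^_=":   # leading accidentals
        shift += {"^": 1, "_": -1, "=": 0}[token[i]]
        i += 1
    letter = token[i]
    midi = ABC_BASE[letter.upper()] + shift
    if letter.islower():                          # lowercase = one octave up
        midi += 12
    for mark in token[i + 1:]:                    # trailing octave marks
        midi += 12 if mark == "'" else -12 if mark == "," else 0
    return midi

# Opening notes of Ode to Joy's melody, as ABC pitch letters
melody = ["E", "E", "F", "G", "G", "F", "E", "D"]
print([abc_note_to_midi(n) for n in melody])
# -> [64, 64, 65, 67, 67, 65, 64, 62]
```

Unlike an LLM reading the image each time, a script like this gives the same output for the same input every run, which is the point of asking the model to produce a translator rather than a translation.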



