Hacker News

The problems with that skill are that:

* Most existing LLM interfaces are very bad at editing history, instead focusing entirely on appending to history. You can sort of ignore this for one-shot, and this can be properly fixed with additional custom tools, but ...

* By the time you refine your input enough to patch over all the errors in the LLM's output for your sensible input, you're bigger than the LLM can actually handle (much smaller than the alleged context window), so it starts randomly ignoring significant chunks of what you wrote (unlike context-window problems, the ignored parts can be anywhere in the input).



I really like Zed's (editor) implementation. The context window is just editable text, like any other. You can freely change anything and send the whole thing back into the LLM. I find that a much more useful interface than mucking around and editing chat bubbles.


ChatGPT basically lets you edit any of your messages at any point in the conversation, which I definitely use (e.g., if the conversation has gotten into a bad basin, the LLM misunderstood me, etc).
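The ChatGPT-style edit described above amounts to simple list surgery on the message history: rewrite an earlier message and drop everything after it, branching the conversation from that point. A minimal sketch, assuming a generic chat API that takes role/content messages (`edit_and_rewind` is a hypothetical helper, not any real library's API):

```python
# Sketch of editing an earlier message and rewinding the conversation,
# the way ChatGPT lets you branch from any of your own messages.

messages = [
    {"role": "user", "content": "Write a parser for INI files"},
    {"role": "assistant", "content": "Here is a regex-based parser..."},
    {"role": "user", "content": "Now add comment support"},
]

def edit_and_rewind(messages, index, new_content):
    """Rewrite message `index` and discard everything after it,
    so the next LLM call continues from the edited point."""
    edited = messages[:index]
    edited.append({"role": messages[index]["role"], "content": new_content})
    return edited

# The LLM misunderstood the first request: rewrite it and resend.
messages = edit_and_rewind(messages, 0, "Write a parser for TOML files")
```

The resulting list is what you would send back to the model; the stale assistant reply and follow-up are gone.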

Also ChatGPT has a pretty big context window. Gemini supposedly has the biggest useful context window (~millions of tokens), though I don't have personal experience.


I tend to avoid editing previous messages because it breaks my mental model of the sequence that got me to the current state. That's more of a bias from my goal to do "research" into how these models work though - I'm always trying to maintain the cleanest possible record of what I did so I can learn from the transcript later.


> Most existing LLM interfaces are very bad at editing history, instead focusing entirely on appending to history. You can sort of ignore this for one-shot, and this can be properly fixed with additional custom tools, but ...

Somebody somewhere needs to provide a threaded interface to an LLM.


Yeah, a key thing to understand about LLMs is that managing the context is everything. You need to know when to wipe the slate by starting a new chat session and then pasting across a subset of the previous conversation.
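The "wipe the slate and paste across a subset" move can be sketched as code. This is a rough illustration, not any particular tool's implementation: start a fresh session that carries only a system message, an optional hand-written summary, and the most recent exchange.

```python
# Rough sketch of starting a new chat session that carries over only
# the parts of the old conversation that still matter.

def fresh_session(old_messages, keep_last=2, summary=None):
    new = [{"role": "system", "content": "You are a helpful coding assistant."}]
    if summary:
        # In practice you might ask the LLM itself to write this summary.
        new.append({"role": "user", "content": f"Context from earlier: {summary}"})
    new.extend(old_messages[-keep_last:])  # keep only the most recent exchange
    return new

old = [{"role": "user", "content": f"message {i}"} for i in range(20)]
session = fresh_session(old, keep_last=2, summary="We are debugging a sync bug.")
```

The hard judgment call the comment describes is deciding *what* goes in `summary` and how many trailing messages to keep.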

A lot of my most complex LLM interactions take place across multiple sessions - and in some cases I'll even move the project from Claude 3.5 Sonnet to OpenAI o1 (or vice versa) to help get out of a rut.

It's infuriatingly difficult to explain why I decide to do that though!


What kinds of things do you do with these LLMs?

I feel like I’m good at understanding context. I’ve been working in AI startups over the last 2 years. Currently at an AI search startup.

Managing context for info retrieval is the name of the game.

But for my personal use as a developer, they’ve caused me much headache.

Answers that are subtly wrong in such a way that it took me a week to realize my initial assumption based on the LLM response was totally bunk.

This happened twice. With the yjs library, it gave me half-incorrect information that led me to misimplement the sync protocol. Granted, it's a fairly new library.

And again with the web History API. It said that the history stack only exists until a page reload. The examples it gave me ran as it described, but that isn't how the History API works: session history persists across reloads.

I lost a week of time because of that assumption.

I’ve been hesitant to dive back in since then. I ask questions every now and again, but I jump off much faster now if I even think it may be wrong.


There is no substitute for cold hard facts. LLMs do not provide that unless it's literally the easiest thing for them to do, and even then not always.

In the case you were in I would go out of my way to feed the docs to the LLM and then use the LLM to interrogate the docs and then verify the understanding I got from the LLM with a personal reading of the docs that were relevant.

You might think it takes just as long, if not longer, to do it my way rather than just reading the docs myself. Sometimes it does. But as you get good at the workflow, you find that the time spent finding the relevant docs goes down, and you get an instant plausible interpretation of the docs on top. You can then very quickly produce application code right away, and then documentation for the code you write.
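The "feed the docs to the LLM, then interrogate them" workflow boils down to prompt assembly: paste the relevant documentation into the prompt so the model answers from the text in front of it rather than from memory. A minimal sketch, where `DOCS` stands in for text you copied from the real documentation (the snippet here is an illustrative stand-in, not quoted spec text):

```python
# Sketch of grounding an LLM question in pasted documentation.

DOCS = """pushState(state, unused, url): adds an entry to the browser's
session history stack. The entry survives page reloads."""

def grounded_prompt(docs, question):
    """Build a prompt that instructs the model to answer only from `docs`."""
    return (
        "Answer using ONLY the documentation below. "
        "If the documentation does not cover it, say so.\n\n"
        f"--- DOCUMENTATION ---\n{docs}\n--- END ---\n\n"
        f"Question: {question}"
    )

prompt = grounded_prompt(DOCS, "Does the history stack survive a reload?")
```

The final step in the workflow, verifying the model's interpretation against your own reading of the docs, is the part no code can automate.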


Here are a bunch of things I use LLMs for relating to code.

- Running micro-benchmarks (using Python in Code Interpreter) - if I have a question about which of two approaches is faster I often use this pattern: https://simonwillison.net/2023/Apr/12/code-interpreter/

- Building small ad-hoc one-off tools. Many of the examples in https://simonwillison.net/2024/Oct/21/claude-artifacts/ fit that bill, and I have a bunch more in my tools tag here: https://simonwillison.net/tags/tools/ - Geoffrey Litt wrote a great piece the other day about custom developer tools which matches how I think about this: https://www.geoffreylitt.com/2024/12/22/making-programming-m...

- Building front-end prototypes - I use Claude Artifacts for this all the time, if I have an idea for a UI I'll get Claude to spin up an almost instant demo so I can interact with it and see if it feels right. I'll often copy the code out and use it as the starting point for my production feature.

- DSLs like SQL, Bash scripts, jq, AppleScript, grep - I use these WAY more than I used to because 9/10 times Claude gives me exactly what I needed from a single prompt. I built a CLI tool for prompt-driven jq programs recently: https://simonwillison.net/2024/Oct/27/llm-jq/

- Ad-hoc sidequests. This is a pretty broad category, but it's effectively little coding projects which I shouldn't actually be working on at all but I'll let myself get distracted if an LLM can get me there in a few minutes: https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-cas...

- Writing C extensions for SQLite while I'm walking my dog on the beach. I am not a C programmer but I find it extremely entertaining that ChatGPT Code Interpreter, prompted from my phone, can write, compile and test C extension for SQLite for me: https://simonwillison.net/2024/Mar/23/building-c-extensions-...

- That's actually a good example of a general pattern: I use this stuff for exploratory prototyping outside of my usual (Python+JavaScript) stack all the time. Usually this leads nowhere, but occasionally it might turn into a real project (like this AppleScript example: https://til.simonwillison.net/gpt3/chatgpt-applescript )

- Actually writing code. Here's a Python/Django app I wrote almost entirely with Claude: https://simonwillison.net/2024/Aug/8/django-http-debug/ - again, this was something of a side-project - not something worth spending a full day on but worthwhile if I could get it done in a couple of hours.

- Mucking around with APIs. Having a web UI for exploring an API is really useful, and Claude can often knock those out from a single prompt. https://simonwillison.net/2024/Dec/17/openai-webrtc/ is a good example of that.

There's a TON more, but this probably represents the majority of my usage.
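The micro-benchmark pattern in the first bullet above amounts to something like the following, which is easy to run in Code Interpreter: time two candidate approaches with `timeit` and compare. The two string-building functions are just example subjects, not from the linked post.

```python
# Micro-benchmark sketch: which is faster, str.join or += concatenation?

import timeit

def with_join(n):
    return "".join(str(i) for i in range(n))

def with_concat(n):
    s = ""
    for i in range(n):
        s += str(i)
    return s

t_join = timeit.timeit(lambda: with_join(1000), number=200)
t_concat = timeit.timeit(lambda: with_concat(1000), number=200)
print(f"join:   {t_join:.4f}s")
print(f"concat: {t_concat:.4f}s")
```

The value of the LLM here is that it writes this boilerplate and runs it for you; the numbers still only mean anything on the machine that produced them.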


Thank you!

I’ll read through these and try again in the new year.


Not OP, but I've just gotten really used to verifying implementation details. Yup, those subtle ones really suck. It's pretty much just up to intuition whether something in the response (or your follow-ups) rings the `not quite right` bell for you.


I bought in early to TypingMind, a great web-based frontend. Good for editing context, and for switching from, say, Gemini to Claude. This is a very normal flow for me, and whatever tool you use should enable it

also nice to interact with an LLM in vim, as the context is the buffer

obviously simon’s llm tool rules. I’ve wrapped it for vim



