What to Do When AI Makes a Mistake

Reading Sean Goedecke’s post “Do not yell at the language model”, my first reaction was, “Don’t…what?”

I don’t yell at people. Why would I yell at a non-sentient tool? That’s like yelling at the sledgehammer you dropped on your foot.

There is a good bit of truth in his post:

So what should you do when an AI model makes a big mistake? I recommend correcting it as matter-of-factly as possible and trying to briskly move on. If you haven’t yet built up a lot of context in the conversation, it might be worth starting over entirely. The best approach - only offered by some AI tools - is to go back to the point in the conversation right before the mistake was made and head it off by updating your previous message.
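That "go back to the point right before the mistake" trick can be emulated with any chat API that takes a list of messages: truncate the history to just before the turn that triggered the mistake and substitute a clearer prompt. A minimal sketch, assuming the common role/content message shape; `rewind_and_retry` is a hypothetical helper, not part of any specific SDK:

```python
def rewind_and_retry(messages, bad_index, better_prompt):
    """Drop the model's bad reply (and everything after it), replace the
    user turn that led to it with a clearer prompt, and return a history
    that is ready to resend.

    messages:      list of {"role": ..., "content": ...} dicts
    bad_index:     index of the assistant message that went wrong
    better_prompt: replacement text for the user turn at bad_index - 1
    """
    # Keep everything up to (but not including) the user turn that
    # produced the mistake, assumed to sit at bad_index - 1.
    kept = messages[:bad_index - 1]
    kept.append({"role": "user", "content": better_prompt})
    return kept

history = [
    {"role": "user", "content": "Refactor the auth module."},
    {"role": "assistant", "content": "Rewrote it in a new framework..."},  # the mistake
]

retry = rewind_and_retry(
    history,
    bad_index=1,
    better_prompt="Refactor the auth module, keeping the current framework.",
)
```

The point is that the bad reply never enters the context at all, so it can't compound into later turns.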

Just start over! Treat mistakes as opportunities to set better guidelines. The quicker you are at catching mistakes, the more accurate the LLM will become.

Here are a few things I do when working with LLMs:

  • One small task per chat. I know some people build up extensive context in a long chat, but if I don’t catch a mistake early, the share of incorrect context compounds over time.
  • When the LLM or I get stuck, ask it what we’re stuck on and have it summarize possible solutions.
  • Try one solution at a time. Sometimes in the same chat, sometimes in a new one, depending on how long the chat has gone on.
  • Verify the proposed solutions and problem space by looking at the code. Sometimes you’ll catch something the LLM misunderstood. Other times, you’ll learn something you had misunderstood!
  • Find docs to help focus the LLM’s attention.
  • Write Docs. This is so important. Once you’ve solved something, write the documentation in a docs/ directory and reference it in later chats for similar problems.
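One way to make those docs pay off is to prepend the relevant file to the first message of a new chat, so the model starts with the lessons you already learned. A minimal sketch, assuming a docs/ directory of plain-text notes (the file name and helper here are made up for illustration):

```python
from pathlib import Path

def prompt_with_doc(doc_path, task):
    """Build a first message that leads with a previously written doc."""
    doc = Path(doc_path).read_text()
    return f"Project notes:\n{doc}\n\nTask: {task}"

# Write the doc once, after solving the problem...
docs = Path("docs")
docs.mkdir(exist_ok=True)
(docs / "migrations.md").write_text(
    "Always generate migrations with the CLI; never edit them by hand."
)

# ...then reuse it in later chats on similar problems.
prompt = prompt_with_doc("docs/migrations.md", "Add a created_at column to users.")
```

Most agent tools will also pick up a conventions file automatically, but pasting the doc in explicitly works with any chat interface.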

Agents don’t have intent. They don’t even have all the context. They can miss things. Treat an agent’s work as directionally correct, and verify the implementation details yourself.

To work quickly with LLMs, you need to understand the language, system, and tools you’re working with so that you can get the agent back on track when it inevitably drifts.

One last note about agents making mistakes. We all make mistakes! The best part of working with an LLM is that it can make mistakes quickly, and you can redirect those mistakes. The faster the LLM makes mistakes, the quicker you can refine the task and complete the work.

Anticipating errors is as much a part of working with humans as it is working with AI.