March 18, 2026

“Adding Typos Made Haiku’s Accuracy Go Up” by bira

6 minutes

We wanted to know whether large language models behave consistently when user prompts contain typos. To explore this, we ran a small experiment that injected typos into BigCodeBench and evaluated several Claude models under increasing noise levels. As the typo rate rose to 16%, Opus’ accuracy dropped by 9%. Surprisingly, Haiku's accuracy increased by 22%.

This post examines this unexpected “typo uplift” phenomenon and explores why noise appears to help certain models.
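
The episode description does not say how the typos were injected, so the following is only a minimal sketch of one way to corrupt a prompt at a chosen typo rate. The function name `inject_typos`, the three edit operations (substitute, delete, swap), and the per-character interpretation of "typo rate" are all assumptions made for illustration, not the author's method.

```python
import random
import string


def inject_typos(text: str, rate: float, seed: int = 0) -> str:
    """Return a copy of `text` in which each alphabetic character is
    corrupted with probability `rate`.

    Hypothetical helper: the operations and the per-character rate are
    assumptions, not taken from the original post.
    """
    rng = random.Random(seed)  # seeded so a given noise level is reproducible
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("substitute", "delete", "swap"))
            if op == "substitute":
                # Replace the character with a random lowercase letter.
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "delete":
                # Drop the character entirely.
                pass
            elif i + 1 < len(text):
                # Swap with the following character and skip past it.
                out.append(text[i + 1])
                out.append(c)
                i += 1
            else:
                # Nothing to swap with at the end of the string.
                out.append(c)
        else:
            # Digits, whitespace, and punctuation are left untouched.
            out.append(c)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    prompt = "Write a function that returns the sum of a list of integers."
    for rate in (0.0, 0.04, 0.16):
        print(rate, inject_typos(prompt, rate, seed=1))
```

Seeding the generator keeps each noise level reproducible, so every model can be evaluated on exactly the same corrupted prompts.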

Do Typos Make Haiku Try Harder?

We first hypothesized that Haiku's accuracy increased because harder-to-read text makes Haiku think harder. This would parallel findings in humans that hard-to-read fonts help students retain knowledge better, because they force the reader to expend more effort. As a proxy for effort, we plotted the number of output tokens generated by both models[1]. Contrary to our hypothesis, the number of output tokens decreased as the typo rate increased.

Typos don't make models think harder. As typo rates increase, the output lengths of Haiku and Opus go down.

The Anomaly is Haiku-Specific

We then tested whether other small models show this typo uplift anomaly. We found that both Haiku 3.5 and 4.5 gain accuracy as typos increase, while other smaller models from [...]

---

Outline:

(00:54) Do Typos Make Haiku Try Harder?

(01:34) The Anomaly is Haiku-Specific

(02:08) The Anomaly is Benchmark-Specific

(02:42) The Culprit

(04:02) Takeaways for the Eval Engineer

(04:06) Not all grading harnesses are created equal

(04:48) Scores are lower bounds

(05:15) Aligning the model to the eval

(05:43) Appendix

The original text contained 2 footnotes which were omitted from this narration.

---

First published:

March 16th, 2026

Source:

https://www.lesswrong.com/posts/tcic5c3BJuh3PybDZ/adding-typos-made-haiku-s-accuracy-go-up-1

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

LessWrong (30+ Karma)

By LessWrong
