Ivan Dostal

Computer Science student with a strong interest in AI safety and Interpretability

Tracing Typos in LLMs: My Attempt at Understanding How Models Correct Misspellings

Feb 2, 2025

This blog post was created as part of the AI Safety Fundamentals course by BlueDot Impact. All of the code can be found on my GitHub.

TLDR: I tried to uncover whether there are specific components in language models that enable typo correction. I identified a subword merging head in the first layer of Llama-3.2-1B that plays a crucial role in the process. In this blog post, I’ll walk through my thought process and findings.
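To make "subword merging" a bit more concrete, here is a toy sketch of why a typo matters at the token level. It is not taken from the post's GitHub code: the vocabulary `TOY_VOCAB` and the helper `greedy_tokenize` are hypothetical, and greedy longest-match splitting is only a stand-in for the learned BPE tokenizer that Llama-3.2-1B actually uses. The point is just that a correctly spelled word can map to a single token, while a misspelling tends to break into several fragments that the model then has to combine back into one word.

```python
# Hypothetical toy vocabulary (an assumption for illustration only);
# a real tokenizer has a learned vocabulary with many thousands of entries.
TOY_VOCAB = {"computer", "comp", "ut", "re", "er",
             "c", "o", "m", "p", "u", "t", "e", "r"}


def greedy_tokenize(word, vocab):
    """Split `word` into the longest matching vocabulary pieces, left to right.

    This is a simplified stand-in for BPE, not the real algorithm.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word!r} at position {i}")
    return tokens


print(greedy_tokenize("computer", TOY_VOCAB))  # ['computer']
print(greedy_tokenize("computre", TOY_VOCAB))  # ['comp', 'ut', 're']
```

In this toy setup, the misspelled "computre" arrives as three fragments instead of one token, which is the kind of input a subword merging component would need to handle.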

Motivation

Large language models are getting significantly more capable every month, but we still don’t understand how they work internally. If a model generates an incorrect answer, we currently have (almost) no way of explaining why it did so. Mechanistic...