Original research · 2026-06-02

The Most Confusable English Word Pairs (2026)

Name: The Most Confusable English Word Pairs (2026)
Creator: PlainSpell
Published: 2026-06-02
License: https://creativecommons.org/licenses/by-sa/3.0/

Which English word pairs are most easily confused? The ranking below reflects the current dataset. The top pairs are not obscure vocabulary; they are short, high-frequency words, mostly function words, in which an edit of just one or two characters produces a different word with a different grammatical role.

Compiled by PlainSpell Editorial, Language Reference Editorial Team · June 2, 2026

What makes a word pair "confusable"?

In the PlainSpell corpus, two words are classified as a confusable pair when they are separated by a small edit distance (typically 1-2 character operations: substitution, insertion, deletion, or transposition of adjacent characters), belong to the same language, and are both real dictionary entries in Wiktionary. The internal confusability score reflects the algorithmic proximity measure computed for each pair, and rank 1 is the pair that scores highest on this measure across the English vocabulary. Importantly, the score is an internal corpus metric, not a direct measure of how often human writers confuse the two words in practice; measuring that would require a separate empirical study of real writing errors.
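
As a rough illustration of the classification rule above, the sketch below checks two of its criteria: both words appear in a vocabulary, and they are within a small edit distance of each other. It is an assumption-laden stand-in, not PlainSpell's implementation: it uses plain Levenshtein distance (no transpositions), reads "typically 1-2 operations" as a fixed cutoff `max_distance=2`, and replaces the Wiktionary word list with a small hypothetical `VOCAB` set. The same-language condition is handled implicitly by drawing every word from one English vocabulary.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (no transpositions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_confusable(a: str, b: str, vocabulary: set, max_distance: int = 2) -> bool:
    """Hypothetical rule: two distinct dictionary words within max_distance edits."""
    if a == b or a not in vocabulary or b not in vocabulary:
        return False
    return levenshtein(a, b) <= max_distance


# Hypothetical stand-in for the Wiktionary-derived English vocabulary.
VOCAB = {"that", "this", "they", "than", "their", "there", "think", "xylophone"}

# The 15 pairs from the ranking, in rank order.
TOP_15 = [("that", "this"), ("that", "they"), ("they", "this"), ("will", "with"),
          ("their", "this"), ("which", "with"), ("that", "them"), ("them", "this"),
          ("than", "that"), ("their", "they"), ("than", "this"), ("there", "they"),
          ("what", "when"), ("that", "then"), ("think", "this")]

for a, b in TOP_15:
    print(f"{a} vs {b}: {levenshtein(a, b)}")

print(is_confusable("that", "this", VOCAB))       # True
print(is_confusable("that", "xylophone", VOCAB))  # False
```

Run on the 15 ranked pairs, the loop prints a distance of 2 for every pair except "than vs that", which is 1, so all of them fall inside the 1-2 operation band.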

PlainSpell's English dataset contains 529,999 pairs: every confusable pair identified algorithmically across the full PlainSpell English vocabulary. The ranking below shows only the 15 highest-ranked pairs. Understanding why these pairs cluster at the top reveals something fundamental about English orthography and cognitive load.

Top 15, ranked by confusability (rank 1 = most confusable)

1. that vs this
2. that vs they
3. they vs this
4. will vs with
5. their vs this
6. which vs with
7. that vs them
8. them vs this
9. than vs that
10. their vs they
11. than vs this
12. there vs they
13. what vs when
14. that vs then
15. think vs this

Pattern 1: High frequency amplifies small edit distances

The pairs at the top of the confusability ranking share a consistent characteristic: both words in each pair are extremely common in everyday English. Function words like "that", "this", "they", "the", "with", "their", and "which" are among the most frequent words in running English text. Every pair in the top 15 is separated by just one or two character operations, yet that small edit swaps one grammatical function for another. Swapping "that" for "this" changes a distal demonstrative to a proximal one; swapping "they" for "the" removes the subject, and with it the agent, from a sentence. The cognitive cost is not in the character count but in the grammatical pivot. Writers who type quickly and proofread lightly rarely catch these substitutions, precisely because both alternatives are valid English words that pass any spell-check.

Pattern 2: Short words have proportionally large error neighborhoods

Relative to its total size, the edit-distance-1 neighborhood of a 4-letter word contains far more real words than that of a 16-letter word. For a word like "that" (4 characters), edit-distance-1 substitutions alone generate 4 × 25 = 100 candidate strings, several of which ("chat", "what", "than", "thaw") are valid English words. For a 16-character word, the same operation generates more strings (16 × 25 = 400), but a far smaller fraction of them coincide with real words, so most misspellings of long words produce obvious non-words that spell-checkers catch immediately. Short, high-frequency function words sit in a dense real-word neighborhood where a comparatively large share of substitutions produce another real word. That density is reflected in the confusability ranking, and it explains why the top pairs are uniformly short.
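
The substitution arithmetic above can be checked directly. This sketch enumerates every edit-distance-1 substitution of a lowercase word (the alteration step in Norvig's spelling corrector, restricted here to letters that actually change) and counts how many land in a vocabulary. The `TOY_VOCAB` set is hypothetical, so the hit counts only illustrate the mechanics; measuring the real share would require the full word list.

```python
import string


def substitutions(word: str) -> set:
    """Every string reachable from `word` by replacing exactly one letter
    with a different lowercase ASCII letter."""
    neighbors = set()
    for i, original in enumerate(word):
        for letter in string.ascii_lowercase:
            if letter != original:
                neighbors.add(word[:i] + letter + word[i + 1:])
    return neighbors


def real_word_share(word: str, vocabulary: set) -> tuple:
    """Return (real-word neighbors, total substitution neighbors)."""
    neighbors = substitutions(word)
    return len(neighbors & vocabulary), len(neighbors)


# Hypothetical toy vocabulary; a real measurement would use the full word list.
TOY_VOCAB = {"that", "than", "what", "chat", "thaw", "this", "incomprehensible"}

print(real_word_share("that", TOY_VOCAB))              # (4, 100)
print(real_word_share("incomprehensible", TOY_VOCAB))  # (0, 400)
```

One of the four hits for "that" is "than", which also appears at rank #9 in the list above.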

Pattern 3: Grammatical interchangeability compounds the confusion

Many top-ranked pairs are not just phonetically or orthographically similar; they are also grammatically substitutable in certain sentence contexts. "With" and "wish" belong to different grammatical categories (preposition vs. verb), yet in a frame like "I _ you well" the slip to "I with you well" leaves every word correctly spelled, and a quick reader or a word-level checker can pass over it. "Their" and "there" are pronounced identically and are often unstressed in speech, so sound gives no cue to the error. "Which" and "witch" are just two character edits apart, but only one is a function word. Grammatical interchangeability acts as a multiplier on the basic edit-distance confusability score: a pair that is both orthographically close and grammatically substitutable represents a higher practical writing risk than an equally close pair where one word could never occupy the same syntactic slot as the other. Spell-checkers that perform contextual grammar checks (like Grammarly's context-aware suggestion engine) handle this dimension better than pure edit-distance engines, but even contextual checkers can miss such errors, particularly in long compound sentences.
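
The "with"/"wish" case shows why grammatical substitutability matters: a checker that only asks whether each token is a dictionary word has nothing to flag. A minimal sketch, assuming a tiny hypothetical vocabulary and a naive regex tokenizer:

```python
import re


def nonword_errors(sentence: str, vocabulary: set) -> list:
    """Return tokens missing from the vocabulary: the only errors a purely
    word-level spell-checker can detect."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [token for token in tokens if token not in vocabulary]


# Hypothetical vocabulary covering just this example.
VOCAB = {"i", "wish", "with", "you", "well"}

print(nonword_errors("I wish you well", VOCAB))  # []
print(nonword_errors("I with you well", VOCAB))  # [] (real-word error slips through)
print(nonword_errors("I wsih you well", VOCAB))  # ['wsih']
```

Catching the second sentence requires looking at the word in its context, for example by asking how likely "with" is immediately after "I".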

Implications for writer's tools and plain-language editing

The existence of a dense confusable-pair neighborhood around the most common English function words has direct implications for anyone building writer-assistance tools or working in plain-language editing. Any tool that ranks spell-check suggestions purely by edit distance tends to surface the wrong candidate for exactly these high-frequency pairs, because the closest candidate by distance is often a common word in the same grammatical class. A better heuristic weights suggestions by the conditional probability of the intended word given the surrounding context, a Bayesian approach that language models implement naturally. Plain-language editors working on legal, medical, or government documents should pay particular attention to this class of near-miss error: the stakes of confusing "with" and "wish", or "their" and "there", in a binding contract or clinical note are disproportionate to the tiny edit distance involved.
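
The context-weighted heuristic can be sketched as a toy noisy-channel model, the formulation behind Norvig's corrector (cited in Sources): choose the intended word c that maximizes P(c) · P(typed | c), with the prior P(c) conditioned on the previous word. Everything concrete here is hypothetical: the bigram counts, the `CONFUSION_SETS` table, and the assumed 0.9 probability that a writer typed the word they meant. A real tool would take these from a language model and an error corpus.

```python
from collections import Counter

# Hypothetical counts standing in for a real corpus or language model.
UNIGRAMS = Counter({"i": 500, "went": 100, "wish": 50, "with": 400})
BIGRAMS = Counter({("i", "wish"): 40, ("i", "with"): 1,
                   ("went", "with"): 30, ("went", "wish"): 1})

# Hypothetical confusion sets: each word plus its confusable partners.
CONFUSION_SETS = {"with": ["with", "wish"], "wish": ["wish", "with"]}

KEEP_PROB = 0.9  # assumed P(typed word | writer meant that same word)


def p_context(word: str, prev: str, alpha: float = 1.0) -> float:
    """Add-alpha smoothed bigram estimate of P(word | prev)."""
    return (BIGRAMS[(prev, word)] + alpha) / (UNIGRAMS[prev] + alpha * len(UNIGRAMS))


def p_channel(typed: str, intended: str) -> float:
    """Toy error model P(typed | intended): the leftover probability mass is
    split evenly across the intended word's confusable partners."""
    if typed == intended:
        return KEEP_PROB
    return (1 - KEEP_PROB) / (len(CONFUSION_SETS[intended]) - 1)


def best_candidate(prev: str, typed: str) -> str:
    """Noisy-channel choice: argmax over c of P(c | prev) * P(typed | c)."""
    candidates = CONFUSION_SETS.get(typed, [typed])
    return max(candidates, key=lambda c: p_context(c, prev) * p_channel(typed, c))


print(best_candidate("i", "with"))     # wish
print(best_candidate("went", "with"))  # with
```

The same typed word, "with", is corrected to "wish" after "I" but left alone after "went". A pure edit-distance ranking cannot make that distinction, because the two candidates are the same distance apart in both sentences.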

The full confusables methodology, including how PlainSpell handles multi-word confusables, homophone overlap with confusable pairs, and cross-language confusability, is documented on the PlainSpell methodology page.

Methodology

Confusable pairs are identified by computing the edit distance between every pair of English vocabulary entries, using a Levenshtein-based approach that accounts for character transpositions (Damerau-Levenshtein distance). Pairs with a distance at or below the confusability threshold for their combined length are included in this ranking, ordered by their internal confusability score. For the full algorithm specification, see the PlainSpell Methodology page.
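
The transposition-aware distance can be sketched with the optimal string alignment (OSA) variant of Damerau-Levenshtein, a common restricted form. Whether PlainSpell uses this variant or the unrestricted one is not stated here, so treat the choice as an assumption; the length-dependent threshold and the scoring step are not specified in this article and are not reproduced.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Optimal string alignment (OSA) distance, a restricted form of
    Damerau-Levenshtein: insertions, deletions, substitutions, and swaps of
    two adjacent characters, with no substring edited more than once.
    With transpositions=False this reduces to plain Levenshtein distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


print(edit_distance("form", "from"))                        # 1 (one swap)
print(edit_distance("form", "from", transpositions=False))  # 2 (two substitutions)
print(edit_distance("than", "that"))                        # 1
```

For the 15 pairs ranked above, enabling transpositions changes nothing: no pair differs by a swap of adjacent letters, so both settings give the same distances. Transpositions do matter for pairs such as "form" and "from", which a single swap connects.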

Limitations: The confusability score is a corpus-internal algorithmic metric, not an empirically validated measure of human error frequency. Pairs that score highly may not be equally confusing to all writers: expert writers may navigate high-ranked pairs effortlessly while struggling with lower-ranked, domain-specific pairs. The ranking reflects orthographic proximity weighted by our internal scoring parameters; different weighting schemes would produce a different ordering. The underlying score is an internal metric whose units are not directly comparable across PlainSpell's different ranking types.

Sources

Source: Wiktionary (English edition), JSONL dump via wiktextract, 2026. Open data under CC BY-SA 4.0.

Source: Damerau, F. J. (1964). "A Technique for Computer Detection and Correction of Spelling Errors." Communications of the ACM 7(3). Original formulation of the edit-distance approach underlying confusable-pair detection.

Source: Norvig, Peter (2007). "How to Write a Spelling Corrector." Canonical practical reference for probabilistic edit-distance spell-checking.