Original research · 2026-05-14

The Largest English Homophone Groups in Wiktionary (2026)

Name: The Largest English Homophone Groups in Wiktionary (2026)
Creator: PlainSpell
Published: 2026-05-14
License: https://creativecommons.org/licenses/by-sa/3.0/

Eight English words share the IPA pronunciation /juː/, including a tree, a female sheep, an ancient Chinese wine-bucket, and "you". The largest homophone groups in Wiktionary, ranked.

Compiled by PlainSpell Editorial, Language Reference Editorial Team · May 14, 2026


What is a homophone group?

A homophone group is a set of English words that share the same IPA (International Phonetic Alphabet) pronunciation but differ in spelling, meaning, or both. Classic examples are small sets like their / there / they're or two / too / to. But Wiktionary's full IPA dataset reveals much larger clusters once you include archaic forms, loanwords, dialect spellings, and single-letter alphabetic names.

PlainSpell's dataset, current as of 2026-05-14, sorts 2,182 English homophone entries into clusters by shared pronunciation and ranks the clusters by size.

Top 10, by group size

Rank	Primary word	IPA	Group size	Notes
1	you	/juː/	8	See full breakdown below.
2	done	/dʌn/	5	Includes archaic + loanword spellings.
3	t	/tiː/	5	Single-letter alphabetic name.
4	Kane	/keɪn/	5	Proper-noun cluster + common-word collisions.
5	to	/tuː/	4	Includes "two", "too".
6	by	/baɪ/	4	Includes "buy", "bye", "bi".
7	we	/wiː/	4	Includes "wee", "oui", letter-name.
8	an	/æn/	4	Includes proper nouns.
9	see	/siː/	4	Includes "sea", "C", "si".
10	right	/raɪt/	4	Includes "rite", "write", "wright".

This table groups English homophones by shared pronunciation and ranks by group size.
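The grouping-and-ranking step can be illustrated with a minimal Python sketch. It assumes entries have already been reduced to (word, normalized IPA) pairs. The ENTRIES list is a hypothetical excerpt, and the rank_groups name, its min_size threshold, and the alphabetical tie-break are our own assumptions. This is not PlainSpell's pipeline, and the table above does not document how it orders groups of equal size.

```python
from collections import defaultdict

# Hypothetical excerpt of (word, normalized IPA) pairs, not the full dataset.
ENTRIES = [
    ("you", "juː"), ("ewe", "juː"), ("yew", "juː"), ("u", "juː"),
    ("right", "raɪt"), ("rite", "raɪt"), ("write", "raɪt"), ("wright", "raɪt"),
    ("by", "baɪ"), ("buy", "baɪ"), ("bye", "baɪ"),
    ("cat", "kæt"),
]

def rank_groups(entries, min_size=2):
    """Cluster words by pronunciation and rank the clusters largest-first."""
    groups = defaultdict(list)
    for word, ipa in entries:
        groups[ipa].append(word)
    # A word with a unique pronunciation is not a homophone group.
    ranked = [(ipa, sorted(words)) for ipa, words in groups.items()
              if len(words) >= min_size]
    # Assumed tie-break: equal-sized groups are ordered alphabetically by IPA.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked

for rank, (ipa, words) in enumerate(rank_groups(ENTRIES), start=1):
    print(f"{rank}\t/{ipa}/\t{len(words)}\t{', '.join(words)}")
```

With this excerpt, /juː/ and /raɪt/ tie at four members and /baɪ/ follows with three. "cat" is dropped because it has no sound-alike.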

Deep dive: the eight words that sound like "you"

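The largest English homophone group sits at IPA /juː/, the long-vowel "you" sound. Eight Wiktionary entries share this pronunciation. They include native Old English words (ewe, yew, you), a letter name (u), a Chinese loanword (yu), an obsolete variant spelling (eau), and modern eye-dialect and internet spellings (yoo, j00):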

Word	IPA	Definition
eau	/juː/	Alternative form of ea.
ewe	/juː/	A female sheep, as opposed to a ram.
j00	/juː/	you (internet slang spelling)
u	/ˈjuː/	The 21st letter of the English alphabet.
yew	/juː/	A species of coniferous tree (Taxus baccata) with dark-green needle-like leaves.
yoo	/juː/	Eye dialect spelling of you.
you	/juː/	The people spoken or written to, as an object.
yu	/juː/	An ancient Chinese wine-bucket, often having a decorative cover.

Finding 1: Single-letter alphabetic names create dense clusters

Several of the top 10 groups include a single letter pronounced as its alphabetic name: "u" in the /juː/ group, "t" in the /tiː/ group (alongside words such as "tea"), and "C" in the /siː/ group (alongside "see" and "sea"). Letter names contribute disproportionately because every letter has a spoken name. That name is usually a single short syllable, and it often coincides phonetically with a common existing word.

Finding 2: Archaic, dialect, and nonstandard spellings inflate group size

Many top-ranked groups include words that most modern English speakers would not recognize. Examples are "j00" (an internet/leetspeak spelling of "you"), "yoo" (eye-dialect "you"), "yu" (a Chinese wine-bucket), and "eau" (an obsolete variant of "ea"). Without these four, the conventionally recognized "you / yew / ewe / u" cluster would still have 4 members. That is large by everyday standards, but only half the size of the full Wiktionary group.
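The 4-of-8 comparison can be checked with a few lines of Python. This minimal sketch uses the eight words from the deep-dive table. The True/False flag is our own hypothetical annotation marking the entries Finding 2 treats as unfamiliar. It is not a Wiktionary field.

```python
# The eight-word /juː/ group from the deep-dive table.
# Hypothetical flag: True = everyday word, False = unfamiliar entry per Finding 2.
JUU_GROUP = {
    "eau": False, "ewe": True, "j00": False, "u": True,
    "yew": True, "yoo": False, "you": True, "yu": False,
}

full_size = len(JUU_GROUP)
everyday = sorted(word for word, familiar in JUU_GROUP.items() if familiar)
share = len(everyday) / full_size  # fraction of the group that is everyday vocabulary

print(f"full group: {full_size}, everyday subset: {everyday}, share: {share:.0%}")
```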

Finding 3: Loanwords are over-represented in large groups

Roughly 30% of words in our top-10 groups are loanwords from other languages, for example "yu" (Chinese), "rite" (Latin), and "oui" (French). Not every unfamiliar spelling is borrowed, though. "Wright", for instance, is native Old English rather than a loanword. English has a long history of extensive lexical borrowing: first from Old Norse and Norman French, then Latin scientific and legal vocabulary, then words acquired through global trade and colonial contact. As a result, borrowed words frequently collide phonetically with native vocabulary.

Why this matters for language learners and writers

The conventional ESL teaching list of homophones tops out at 3-word groups (their/there/they're, two/too/to). Reality is messier. Even excluding archaic and dialect entries, English has groups of four or more words for many common pronunciations. Spell-checkers built on simple homophone-substitution rules can mis-correct in surprising ways when fed text containing members of these large clusters. Writers should be especially careful around /juː/, /raɪt/, /baɪ/, and /tuː/. In our corpus, each of these pairs a very common word with three or more sound-alikes.
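One way to avoid such mis-corrections is for a checker to flag members of large groups for review instead of substituting automatically. The following minimal Python sketch shows that idea. It is hypothetical throughout: HOMOPHONE_GROUPS is a hand-typed partial index drawn from groups named in this article, flag_homophones and its min_group_size threshold are illustrative names, and the tokenizer is a simple regular expression rather than a real spell-checker's.

```python
import re

# Hypothetical, partial homophone index typed from groups named in this article.
HOMOPHONE_GROUPS = [
    {"you", "ewe", "yew", "u"},
    {"right", "rite", "write", "wright"},
    {"by", "buy", "bye", "bi"},
    {"their", "there", "they're"},
]

# Map each word to its group for constant-time lookup while scanning text.
WORD_TO_GROUP = {word: group for group in HOMOPHONE_GROUPS for word in group}

def flag_homophones(text, min_group_size=4):
    """Return (token, alternatives) for tokens that belong to a large group.

    The word is flagged for human review, not replaced, because a
    substitution rule cannot know which group member the writer meant.
    """
    flags = []
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        group = WORD_TO_GROUP.get(token)
        if group is not None and len(group) >= min_group_size:
            flags.append((token, sorted(group - {token})))
    return flags

for token, alternatives in flag_homophones("They're going to write the rite by Friday."):
    print(f"{token}: check against {', '.join(alternatives)}")
```

With the default threshold of four, "write", "rite", and "by" are flagged, while "they're" (a 3-word group) is not. Lowering min_group_size widens the net at the cost of more warnings.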

Cross-language comparison: are large homophone groups unique to English?

English is unusually prone to large homophone groups, but it is not alone. The PlainSpell corpus captures 2,182 English homophone entries across 1,007 distinct phonetic groups, and French and Spanish show very different profiles. French spelling preserves many letters that are no longer pronounced, especially word-final consonants, so distinct written words often collapse to a single spoken form. For example, vers, verre, ver, and vert are all pronounced /vɛʁ/ in standard Parisian French. Spanish, by contrast, has a highly transparent orthography in which spelling reliably predicts pronunciation, resulting in far fewer true homophones relative to its vocabulary size. German sits between the two, generating some clusters through its compound-noun system while maintaining fairly consistent orthographic vowel-length distinctions.

The practical consequence for language technology is significant. Homophone disambiguation, a core problem for speech-to-text systems, is likely harder for English and French than for Spanish or German. Researchers and engineers building multilingual voice-input tools can consult the PlainSpell methodology page for details on how the IPA-clustering algorithm handles cross-edition differences in pronunciation transcription.

Teaching implications: rethinking the "homophone pair" as a unit

English-language pedagogy commonly frames homophones as pairs, and textbooks are organized accordingly. The Wiktionary evidence suggests this framing under-serves learners from the start. When a student encounters the word "right" in writing, they are actually navigating a four-word collision: right / rite / write / wright. Teaching these as isolated two-word pairs means learners never build a mental map of the full phonetic neighborhood, which is precisely the map that fluent reading requires. Educators who present full groups, even informally, give students a more accurate model of the language.

The data also highlights a subtler point. The three main sources of large groups (alphabetic letter names, archaic spellings, and loanwords) each demand different pedagogical treatment. Letter names are memorized alongside the alphabet; archaic spellings reward a brief note on etymology; loanwords benefit from context about the source language's sound system. A single "homophone list" that conflates all three types leaves learners without the explanatory framework that makes retention durable.

Methodology

Homophone groups are constructed by clustering Wiktionary IPA strings within each language. Two entries are grouped together if they have identical IPA, after stripping primary/secondary stress markers and Wiktionary's pronunciation-variant brackets. We use Wiktionary's IPA as the canonical phonetic representation rather than ad-hoc respelling systems (e.g. "uh-OH"), which vary widely across English dictionaries.
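The grouping rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PlainSpell's code. We treat the enclosing /.../ or [...] delimiters as the brackets being stripped and remove only the IPA stress marks ˈ and ˌ. Following the first-IPA-only rule described under Limitations, each word is keyed on its first listed pronunciation. The normalize_ipa and group_homophones names and the SAMPLE transcriptions are hypothetical.

```python
from collections import defaultdict

# IPA primary (U+02C8) and secondary (U+02CC) stress marks.
STRESS_MARKS = {"\u02c8", "\u02cc"}

def normalize_ipa(ipa: str) -> str:
    """Strip enclosing /.../ or [...] delimiters and every stress mark.

    Assumption: nothing else is normalized, so length marks (ː) and
    optional segments in parentheses are kept as-is.
    """
    ipa = ipa.strip()
    if len(ipa) >= 2 and (ipa[0], ipa[-1]) in {("/", "/"), ("[", "]")}:
        ipa = ipa[1:-1]
    return "".join(ch for ch in ipa if ch not in STRESS_MARKS)

def group_homophones(entries: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group words whose FIRST listed IPA string normalizes identically."""
    groups = defaultdict(list)
    for word, pronunciations in entries.items():
        if not pronunciations:
            continue  # no IPA recorded, so nothing to group on
        groups[normalize_ipa(pronunciations[0])].append(word)
    # A homophone group needs at least two members.
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}

# Hypothetical sample transcriptions, for illustration only.
SAMPLE = {
    "u": ["/ˈjuː/"],
    "yew": ["/juː/"],
    "ewe": ["/juː/"],
    "you": ["/juː/", "/jʊ/"],  # only the first pronunciation is used
    "right": ["/raɪt/"],
    "rite": ["[raɪt]"],
    "cat": ["/kæt/"],
}

print(group_homophones(SAMPLE))
```

Here "u" joins the /juː/ group only because its stress mark is stripped, and the second "you" transcription never affects grouping. Both behaviors match the rules stated in this section.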

Limitations: IPA strings in Wiktionary reflect General American or Received Pronunciation by default. Regional variation (Scottish, Irish, Indian, Caribbean, Southern US) can therefore split or merge homophone groups in ways our static IPA grouping misses. Wiktionary entries with multiple IPA pronunciations are assigned to their first IPA string only. Stress-pattern differences (primary vs secondary stress within otherwise identical IPA) are normalized away in our grouping, so strict phoneticians may consider some entries we group to be near-homophones rather than true homophones.

Sources

Source: Wiktionary (English edition), IPA pronunciation entries via wiktextract · 2026 · Open data under CC BY-SA 4.0.

Source: International Phonetic Alphabet (IPA) Reference, IPA Chart with Sounds · 2024 · International Phonetic Association; canonical phonetic standard.

Source: Crystal, David. The Cambridge Encyclopedia of the English Language, 2nd edition · 2003 · Reference work on English phonetics, dialect, and historical lexicography.