Original research · 2026-05-14
The Largest English Homophone Groups in Wiktionary (2026)
Eight English words share the IPA pronunciation /juː/, including a tree, a female sheep, an ancient Chinese wine-bucket, and "you". The largest homophone groups in Wiktionary, ranked.
What is a homophone group?
A homophone group is a set of English words that share the same IPA (International Phonetic Alphabet) pronunciation but differ in spelling, meaning, or both. Classic examples are small sets like their / there / they're or two / too / to. But Wiktionary's full IPA dataset reveals much larger clusters once you include archaic forms, loanwords, dialect spellings, and single-letter alphabetic names.
We queried the PlainSpell homophones table on 2026-05-14, grouped 2,182 English homophone entries by their group_id (assigned at ETL build time per shared IPA cluster), and ranked by group size.
Top 10, by group size
| Rank | Primary word | IPA | Group size | Notes |
|---|---|---|---|---|
| 1 | you | /juː/ | 8 | See full breakdown below. |
| 2 | done | /dʌn/ | 5 | Includes archaic + loanword spellings. |
| 3 | t | /tiː/ | 5 | Single-letter alphabetic name. |
| 4 | Kane | /keɪn/ | 5 | Proper-noun cluster + common-word collisions. |
| 5 | to | /tuː/ | 4 | Includes "two", "too". |
| 6 | by | /baɪ/ | 4 | Includes "buy", "bye", "bi". |
| 7 | we | /wiː/ | 4 | Includes "wee", "oui". |
| 8 | an | /æn/ | 4 | Includes proper nouns. |
| 9 | see | /siː/ | 4 | Includes "sea", "C", "si". |
| 10 | right | /raɪt/ | 4 | Includes "rite", "write", "wright". |
Reference query: SELECT group_id, COUNT(*) AS group_size FROM homophones WHERE lang = 'en' GROUP BY group_id ORDER BY group_size DESC LIMIT 10;
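To see the ranking logic in isolation, the sketch below runs a grouping query against a toy in-memory SQLite table. The column names (word, ipa, group_id, lang) and the rows are assumptions for illustration, not the actual PlainSpell schema or data. It selects group_id rather than a bare word column, because standard SQL requires every selected column to be grouped or aggregated, and it adds group_id as a tiebreaker so tied groups come back in a stable order.

```python
import sqlite3

# Toy in-memory table. The schema is an assumption modeled on the columns
# named in the text; it is not PlainSpell's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homophones (word TEXT, ipa TEXT, group_id INTEGER, lang TEXT)")
conn.executemany(
    "INSERT INTO homophones VALUES (?, ?, ?, ?)",
    [
        ("yew", "juː", 1, "en"), ("ewe", "juː", 1, "en"), ("u", "juː", 1, "en"),
        ("right", "raɪt", 2, "en"), ("rite", "raɪt", 2, "en"), ("write", "raɪt", 2, "en"),
        ("to", "tuː", 3, "en"), ("two", "tuː", 3, "en"),
    ],
)

# Every selected column is either in GROUP BY or aggregated, so the query is
# portable across SQL engines; group_id breaks ties deterministically.
rows = conn.execute(
    """
    SELECT group_id, COUNT(*) AS group_size
    FROM homophones
    WHERE lang = 'en'
    GROUP BY group_id
    ORDER BY group_size DESC, group_id
    LIMIT 10
    """
).fetchall()
print(rows)  # [(1, 3), (2, 3), (3, 2)]
```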
Deep dive: the eight words that sound like "you"
The largest English homophone group sits at IPA /juː/, the long-vowel "you" sound. Eight Wiktionary entries share this pronunciation, mixing common native words, a letter name, slang and eye-dialect spellings, an obsolete variant, and a Chinese loanword:
| Word | IPA | Definition |
|---|---|---|
| eau | /juː/ | Alternative form of ea. |
| ewe | /juː/ | A female sheep, as opposed to a ram. |
| j00 | /juː/ | Internet slang spelling of you. |
| u | /ˈjuː/ | The 21st letter of the English alphabet. |
| yew | /juː/ | A species of coniferous tree (Taxus baccata) with dark-green needle-like leaves. |
| yoo | /juː/ | Eye dialect spelling of you. |
| you | /ju/ | The people spoken or written to, as an object. |
| yu | /juː/ | An ancient Chinese wine-bucket, often having a decorative cover. |
Finding 1: Single-letter alphabetic names create dense clusters
At least three of the top 10 groups include a single letter pronounced as its alphabetic name: "u" in the /juː/ group, "t" in the /tiː/ group of 5 (alongside words such as "tea"), and "C" in the /siː/ group. Letter names contribute disproportionately because most of them are short, open syllables such as /juː/, /tiː/, and /siː/. Those are exactly the shapes that common English words already occupy, so a letter name often coincides with an existing word.
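A minimal sketch of the letter-name effect, assuming a small hand-built lexicon: it checks which letter names coincide with the pronunciation of an ordinary word. LETTER_NAMES, WORDS, and letter_name_collisions are hypothetical names for this illustration; the IPA values are standard dictionary pronunciations with stress marks removed.

```python
# Hypothetical, hand-picked subset of letter-name pronunciations (IPA, no stress marks).
LETTER_NAMES = {"c": "siː", "t": "tiː", "u": "juː"}

# Hypothetical mini-lexicon of ordinary words and their IPA.
WORDS = {"sea": "siː", "tea": "tiː", "yew": "juː", "ewe": "juː", "cat": "kæt"}

def letter_name_collisions(letters, words):
    """Map each letter to the ordinary words that share its spoken name."""
    collisions = {}
    for letter, ipa in letters.items():
        matches = sorted(w for w, w_ipa in words.items() if w_ipa == ipa)
        if matches:
            collisions[letter] = matches
    return collisions

print(letter_name_collisions(LETTER_NAMES, WORDS))
# {'c': ['sea'], 't': ['tea'], 'u': ['ewe', 'yew']}
```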
Finding 2: Archaic, nonstandard, and obscure entries inflate group size
Top-ranked groups also absorb entries that most modern English speakers would not recognize as standard words. The /juː/ group alone contains four: "j00" (internet/leetspeak spelling of "you"), "yoo" (eye-dialect "you"), "yu" (an ancient Chinese wine-bucket), and "eau" (an obsolete alternative form of "ea"). Without these, the conventionally recognized "you / yew / ewe / u" cluster would still have 4 words, large by everyday standards but only half of the 8-word Wiktionary group.
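The halving arithmetic can be reproduced with a tiny filter. The register tags below are hypothetical labels assigned for this sketch (Wiktionary's own tagging is richer and is not modeled here); the point is only that dropping the four nonstandard members leaves four of the eight.

```python
# Members of the /juː/ group from the table above. The register labels are
# hypothetical tags for this sketch, not fields in the PlainSpell data.
YOU_GROUP = {
    "eau": "obsolete",
    "ewe": "standard",
    "j00": "slang",
    "u": "standard",
    "yew": "standard",
    "yoo": "eye-dialect",
    "you": "standard",
    "yu": "loanword",
}

def core_members(group, keep=frozenset({"standard"})):
    """Return the members whose register tag is in `keep`, sorted alphabetically."""
    return sorted(word for word, tag in group.items() if tag in keep)

core = core_members(YOU_GROUP)
print(core, len(core), len(YOU_GROUP))
# ['ewe', 'u', 'yew', 'you'] 4 8
```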
Finding 3: Loanwords are over-represented in large groups
Roughly 30% of words in our top-10 groups are loanwords from other languages, for example "yu" (Chinese), "oui" (French), and "rite" (Latin). Not every unfamiliar spelling is borrowed, however: "wright" and "ea" (of which "eau" is a variant) are native Old English words. English's long history of extensive lexical borrowing, from Norman French, from Latin scientific and legal vocabulary, and later from global trade and colonial contact, makes phonetic collisions between borrowed and native vocabulary common.
Why this matters for language learners and writers
The conventional ESL teaching list of homophones tops out at 3-word groups (their/there/they're, two/too/to). Reality is much messier: even excluding archaic and dialect entries, English has 4+ word groups for many common pronunciations. Spell-checkers built on simple homophone-substitution rules can mis-correct in surprising ways when text contains members of these large clusters. Writers should take particular care around /juː/, /raɪt/, /baɪ/, and /tuː/: each is a high-frequency pronunciation with at least four spellings in our corpus, and /juː/ has the largest group of all.
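As an illustration of how a writing tool might use group sizes rather than fixed pairs, here is a hypothetical checker that flags any token belonging to a homophone group of four or more. The mapping is a small hand-built subset (its /tuː/ group, for instance, lists only three spellings), and the threshold is an arbitrary choice for the example, not a PlainSpell setting.

```python
import re
from collections import Counter

# Hypothetical word -> group_id mapping (a hand-built subset, not PlainSpell data).
WORD_TO_GROUP = {
    "you": 1, "yew": 1, "ewe": 1, "u": 1,
    "right": 2, "rite": 2, "write": 2, "wright": 2,
    "to": 3, "too": 3, "two": 3,
    "their": 4, "there": 4,
}
# Size of each group, derived from the mapping above.
GROUP_SIZES = Counter(WORD_TO_GROUP.values())

def flag_large_group_words(text, min_group_size=4):
    """Return (token, group_size) for tokens that belong to a large homophone group."""
    flagged = []
    for token in re.findall(r"[a-z']+", text.lower()):
        group = WORD_TO_GROUP.get(token)
        if group is not None and GROUP_SIZES[group] >= min_group_size:
            flagged.append((token, GROUP_SIZES[group]))
    return flagged

print(flag_large_group_words("Did you write the right rite?"))
# [('you', 4), ('write', 4), ('right', 4), ('rite', 4)]
```

Flagging by group size, rather than substituting a fixed partner word, leaves the final choice to the writer, which avoids the silent mis-corrections described above.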
Cross-language comparison: are large homophone groups unique to English?
English is unusually prone to large homophone groups, but it is not alone. The PlainSpell corpus captures 2,182 English homophone entries across roughly 1,950 distinct phonetic groups. French and Spanish show very different profiles. French spelling preserves many silent final letters and offers several spellings for the same sound, creating a large pool of written words that sound identical aloud: vers, verre, ver, and vert are all pronounced /vɛʁ/ in standard French. Spanish, by contrast, has a highly transparent orthography in which spelling reliably predicts pronunciation and, for most sounds, pronunciation predicts spelling; homophones arise mainly from silent h and the merger of b and v (hola/ola, tubo/tuvo), so true homophones are far rarer relative to vocabulary size. German sits between the two: final-obstruent devoicing produces pairs such as Rad ("wheel") and Rat ("advice"), both pronounced /ʁaːt/, while its vowel-length spellings remain fairly consistent. The practical consequence for language technology is significant: homophone disambiguation, a core problem for speech-to-text systems, is likely harder for English and French than for Spanish or German. Researchers and engineers building voice-input tools for multilingual applications can consult the PlainSpell methodology page for detail on how the IPA-clustering algorithm handles cross-edition differences in pronunciation transcription.
Teaching implications: rethinking the "homophone pair" as a unit
English-language pedagogy commonly frames homophones as pairs, and textbooks are organized accordingly. The Wiktionary evidence suggests this framing under-serves learners from the start. When a student hears "right" in speech, they are actually navigating a four-word collision: right / rite / write / wright. Teaching these as isolated two-word pairs means learners never build a mental map of the full phonetic neighborhood, which is precisely the map that fluent listening and spelling require. Educators who present full groups, even informally, give students a more accurate model of the language. The data also highlights a subtler point: the three factors that inflate group size (alphabetic letter names, archaic spellings, loanwords) each demand different pedagogical treatment. Letter names are memorized alongside the alphabet; archaic spellings reward a brief note on etymology; loanwords benefit from the context of the source language's sound system. A single "homophone list" that conflates all three leaves learners without the explanatory framework that makes retention durable.
Methodology
Homophone groups are constructed at ETL build time by clustering Wiktionary IPA strings within each language. Two entries share a group_id if they have identical IPA, after stripping primary/secondary stress markers and Wiktionary's pronunciation-variant brackets. We use Wiktionary's IPA as the canonical phonetic representation rather than ad-hoc respelling systems (e.g. "uh-OH"), which vary widely across English dictionaries.
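The grouping rule described above can be sketched in a few lines of Python. This is an illustration under one assumption, not the PlainSpell ETL code: "pronunciation-variant brackets" is read here as the enclosing slashes or square brackets around an IPA string. Stress is removed by deleting the primary (ˈ) and secondary (ˌ) stress characters, and only clusters with at least two words are kept.

```python
from collections import defaultdict

def normalize_ipa(ipa):
    """Strip enclosing slashes/square brackets and primary/secondary stress marks.

    Assumption: the enclosing delimiters are what "pronunciation-variant
    brackets" refers to; the real ETL step may treat them differently.
    """
    core = ipa.strip().strip("/[]")
    return core.replace("ˈ", "").replace("ˌ", "")

def build_groups(entries):
    """Group (word, ipa) pairs by normalized IPA; keep only clusters of 2+ words."""
    clusters = defaultdict(list)
    for word, ipa in entries:
        clusters[normalize_ipa(ipa)].append(word)
    return {key: sorted(words) for key, words in clusters.items() if len(words) > 1}

entries = [("ewe", "/juː/"), ("u", "/ˈjuː/"), ("yew", "/juː/"), ("cat", "/kæt/")]
print(build_groups(entries))  # {'juː': ['ewe', 'u', 'yew']}
```

Keying a dictionary on the normalized string makes grouping a single linear pass, and the same normalized key could serve directly as the basis for a group_id.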
Limitations: IPA strings in Wiktionary reflect General American or Received Pronunciation by default; regional dialect variation (Scottish, Irish, Indian, Caribbean, Southern US) can split or merge homophone groups in ways our static IPA grouping misses. Wiktionary entries with multiple IPA pronunciations are assigned to the first IPA string only. Stress-pattern differences (primary vs secondary stress within otherwise identical IPA) are normalized away in our grouping, so strict phoneticians may consider some entries we group to be only near-homophones.
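The first-IPA-only rule has a visible cost for words with more than one common pronunciation. In the toy data below, "route" is listed with both /ruːt/ and /raʊt/; the list order, and the function names, are assumptions for illustration. Under the stated rule it groups only with "root", whereas a looser any-pronunciation match (shown for comparison, not used by the article) would also catch "rout".

```python
# Toy entries with hypothetical IPA list order; no stress marks needed here.
ENTRIES = {
    "route": ["ruːt", "raʊt"],
    "root": ["ruːt"],
    "rout": ["raʊt"],
}

def homophones_first_only(word, entries):
    """Other words whose first IPA string equals `word`'s first IPA string (the stated rule)."""
    key = entries[word][0]
    return sorted(w for w, ipas in entries.items() if w != word and ipas[0] == key)

def homophones_any(word, entries):
    """Looser alternative, not used by the article: any shared IPA string counts."""
    mine = set(entries[word])
    return sorted(w for w, ipas in entries.items() if w != word and mine & set(ipas))

print(homophones_first_only("route", ENTRIES))  # ['root']
print(homophones_any("route", ENTRIES))         # ['root', 'rout']
```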
Sources
Source: Wiktionary (English edition), IPA pronunciation entries via wiktextract · 2026 · Open data under CC BY-SA 4.0.
Source: International Phonetic Association, IPA Chart with Sounds · 2024 · International Phonetic Alphabet (IPA) reference; canonical phonetic standard.
Source: Crystal, David. The Cambridge Encyclopedia of the English Language, 2nd edition · 2003 · Reference work on English phonetics, dialect, and historical lexicography.