PlainSpell data

The corpus in numbers

According to the PlainSpell database, the corpus holds 6,918,744 words and 3,375,992 confusable pairs across 5 languages, sourced from Wiktionary (kaikki.org, CC BY-SA) and an open word-frequency list, with a data vintage of May 2026. Every figure below is counted directly from the live database and is free to cite.

At a glance

PlainSpell indexes 6,918,744 words across 5 languages, with 3,375,992 confusable pairs, 27,821 homophone groups and 1,974,632 generated misspelling variants.

6.92M words indexed
3.38M confusable pairs
27,821 homophone groups
1.97M misspelling variants

Source: Wiktionary (kaikki.org, CC BY-SA) and an open word-frequency list. Data vintage: May 2026.

Words indexed by language

Language         Words        Confusable pairs   Homophone groups
🇫🇷 French        4,485,239    440,172            21,890
🇩🇪 German        1,077,739    2,006,359          2,859
🇪🇸 Spanish       770,428      323,831            812
🇺🇸 English       545,755      529,999            2,182
🇧🇷 Portuguese    39,583       75,631             78
All languages    6,918,744    3,375,992          27,821
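
As a quick consistency check, the per-language rows can be summed and compared with the "All languages" row. The sketch below is illustrative only: the figures are copied by hand from the table, and the names COUNTS, PUBLISHED_TOTALS, column_totals and word_share are assumptions made for this example, not part of any PlainSpell API.

```python
# Per-language counts copied from the table above:
# (words, confusable pairs, homophone groups).
COUNTS = {
    "French": (4_485_239, 440_172, 21_890),
    "German": (1_077_739, 2_006_359, 2_859),
    "Spanish": (770_428, 323_831, 812),
    "English": (545_755, 529_999, 2_182),
    "Portuguese": (39_583, 75_631, 78),
}

# The published "All languages" row.
PUBLISHED_TOTALS = (6_918_744, 3_375_992, 27_821)


def column_totals(counts):
    """Sum each column (words, pairs, groups) across all languages."""
    return tuple(sum(column) for column in zip(*counts.values()))


def word_share(counts):
    """Return each language's share of all indexed words, in percent."""
    total_words = sum(words for words, _, _ in counts.values())
    return {
        language: round(100 * words / total_words, 1)
        for language, (words, _, _) in counts.items()
    }


if __name__ == "__main__":
    totals = column_totals(COUNTS)
    print("Computed totals:", totals)
    print("Matches published row:", totals == PUBLISHED_TOTALS)
    print("Share of words by language (%):", word_share(COUNTS))
```

The same helper makes it easy to restate the table as proportions, for example when citing how much of the corpus a single language contributes.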

Counts are read live from the PlainSpell database (data vintage May 2026). Misspelling variants are generated from each headword by edit distance rather than drawn from observed corpus frequencies.
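
To illustrate what generating variants by edit distance can look like, here is a minimal sketch that enumerates every string one edit away from a headword. It is not PlainSpell's actual generator: the function name edit_distance_1, the lowercase a to z alphabet, the distance limit of 1 and the inclusion of adjacent transpositions (the Damerau variant of edit distance) are all assumptions made for this example.

```python
import string


def edit_distance_1(word, alphabet=string.ascii_lowercase):
    """Return all strings exactly one edit away from `word`.

    Edits considered (an assumption for this sketch): deleting one
    character, swapping two adjacent characters, substituting one
    character, and inserting one character from `alphabet`.
    """
    # Every way to split the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    substitutions = {
        left + ch + right[1:]
        for left, right in splits
        if right
        for ch in alphabet
    }
    inserts = {left + ch + right for left, right in splits for ch in alphabet}

    variants = deletes | transposes | substitutions | inserts
    # Substitutions and swaps of identical letters reproduce the headword.
    variants.discard(word)
    return variants


if __name__ == "__main__":
    candidates = edit_distance_1("cat")
    print(len(candidates), "variants, for example:", sorted(candidates)[:5])
```

Because the candidates are enumerated mechanically, many of them are unlikely ever to be typed by a real writer, which is why generated counts like these should not be read as frequencies of observed misspellings.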

Cite these statistics

These figures are free to reuse with attribution (CC BY-SA). Copy the citation:

PlainSpell, “PlainSpell Corpus Statistics” (May 2026). Derived from Wiktionary (kaikki.org, CC BY-SA) and an open word-frequency list. https://plainspell.com/statistics

Go deeper

From the totals to the individual records.

  • Explore any language in full, with definitions, IPA, etymology and misspellings: English A–Z
  • See the cross-language rankings (hardest to spell, most confusable, largest homophone groups): Rankings
  • Read how the corpus is built and refreshed: Methodology