The complexity of these rules, and the number of exceptions you have to learn on top of them, can be roughly estimated for any given language by training a language model on word <-> IPA correspondences for that language (using a subset of the vocabulary as the training set) and then checking how well it predicts the remaining words. You can run it in either direction, too, to measure separately how hard the language is to read (word -> IPA) and to write (IPA -> word).

This was actually done for a number of languages including English:

https://arxiv.org/abs/1912.13321

You can see how languages with truly phonemic spellings tend to be in the >90% range for both reading and writing, with Esperanto at 99%. Spanish and German are in the 60-80% range. English is dismal at ~30% for both, though: only French and Chinese are harder to write, and every other language tested is easier to read.
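To make the train-on-a-subset, score-the-rest protocol concrete, here is a minimal sketch. It is not the paper's method: instead of a trained language model it swaps in a much simpler predictor that picks, for each symbol, the output most often seen with that symbol and its two neighbours. It also assumes each spelling is aligned one-to-one with its transcription, and it runs on an invented toy orthography rather than real data. All names (`make_lexicon`, `train`, `predict`, `held_out_accuracy`) are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Invented toy orthography (assumption): each letter maps to exactly one IPA symbol.
LETTER_TO_IPA = {"a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
                 "k": "k", "t": "t", "s": "s", "c": "ʃ", "j": "ʒ"}


def make_lexicon(n_words, exception_rate=0.0, seed=0):
    """Generate distinct (spelling, IPA) pairs. With probability exception_rate
    a word becomes an arbitrary exception in which every 'a' is read as 'ə'."""
    rng = random.Random(seed)
    letters = list(LETTER_TO_IPA)
    lexicon = {}
    while len(lexicon) < n_words:
        word = "".join(rng.choice(letters) for _ in range(rng.randint(3, 7)))
        ipa = "".join(LETTER_TO_IPA[ch] for ch in word)
        if rng.random() < exception_rate:
            ipa = ipa.replace("a", "ə")
        lexicon[word] = ipa
    return list(lexicon.items())


def train(pairs):
    """Count output symbols per (previous, current, next) input window, plus a
    context-free fallback per input symbol. Assumes 1:1 alignment."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        padded = f"#{src}#"  # '#' marks word boundaries
        for i, out in enumerate(tgt):
            counts[padded[i:i + 3]][out] += 1  # 3-symbol context window
            counts[src[i]][out] += 1           # single-symbol fallback
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict(model, src):
    """Map each input symbol to its most frequent output, preferring the
    context window, then the bare symbol, then '?' if never seen."""
    padded = f"#{src}#"
    return "".join(model.get(padded[i:i + 3], model.get(ch, "?"))
                   for i, ch in enumerate(src))


def held_out_accuracy(pairs, train_frac=0.8, seed=0):
    """Train on a random subset of words and return the fraction of the
    remaining words whose whole output is predicted exactly."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_frac)
    model = train(pairs[:cut])
    test = pairs[cut:]
    return sum(predict(model, s) == t for s, t in test) / len(test)


if __name__ == "__main__":
    for rate in (0.0, 0.3):
        lex = make_lexicon(500, exception_rate=rate)
        read = held_out_accuracy(lex)                        # spelling -> IPA
        write = held_out_accuracy([(t, s) for s, t in lex])  # IPA -> spelling
        print(f"exceptions={rate:.0%}  read={read:.0%}  write={write:.0%}")
```

In this toy the regular lexicon scores 100% in both directions. Adding lexical exceptions lowers reading accuracy but leaves writing at 100%, because 'ə' is only ever spelled 'a'. That is the kind of read/write asymmetry that running the measurement in both directions is meant to expose.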



Nice!