FYI, word_info v0.2.0 has been published. Changelog:
Performance improvements.
In the first released version, the dictionary data was compiled directly into BEAM code. This could cause a freeze (~3s) when starting the application, which is a problem for an application that needs to boot as fast as possible.
In v0.2.0, the data is converted into an ETS table dump at compile time. This keeps the code tidy and gives a fast boot.
Fixed a typo in a return value.
PS. I want to thank everyone involved in this thread for the valuable discussion, which led me to a better approach:
I do not actually need this right now, but I am sure that if someone needs such a thing, it will also be needed for languages other than English.
That’s what I was going to say as well. I would like to be able to use such a library, but I would consider it only if it works for the different languages supported on my website.
It would be super nice to get the IPA pronunciation of words in many different languages as well. Arpabet seems to be restricted to English so that function would only work for that language I guess.
I think both pronunciation and syllables can be derived programmatically for quite a few languages (easy mode for Spanish/Italian/Korean), if no extensive resource exists on the net.
The French Wiktionary (fr.wiktionary.org), and probably others, usually has the IPA info.
What rules do you need aside from IPA and syllables? Most Latin languages have automatic syllabic rules (not random as in English) so you wouldn’t even need a dictionary to derive them. Korean letters are grouped by syllables as well so it’s even more straightforward (한글 : 2 syllables, 조선글 : 3 syllables).
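To illustrate the Korean case, here is a minimal Python sketch (not part of word_info; the function name `count_hangul_syllables` is made up for this example). It assumes that counting precomposed Hangul syllable blocks in the Unicode range U+AC00 to U+D7A3 is enough, and it normalizes to NFC first so that decomposed jamo are combined into blocks.

```python
import unicodedata

# Unicode range of precomposed Hangul syllable blocks ('가' to '힣').
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def count_hangul_syllables(word: str) -> int:
    """Count Hangul syllable blocks in `word` (hypothetical helper).

    Each precomposed block is one syllable, so no dictionary is needed.
    Non-Hangul characters are ignored.
    """
    # NFC composes conjoining jamo sequences into single syllable blocks.
    composed = unicodedata.normalize("NFC", word)
    return sum(1 for ch in composed if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST)


if __name__ == "__main__":
    for w in ("한글", "조선글"):
        print(w, count_hangul_syllables(w))
```

This only covers Korean; Spanish or Italian would need actual syllabification rules (diphthongs, hiatus, consonant clusters), which are not shown here.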
As for frequency of use, it’s more difficult for languages whose words change depending on their grammatical function, but an idea would be to use a selection of movies with good dialogues and derive the word frequencies per translation.
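A rough Python sketch of the counting step, assuming the dialogue for one translation is already available as plain-text lines (the function name `word_frequencies` and the sample lines are invented for illustration). It counts surface forms only, so inflected variants of the same word are counted separately; grouping them would need a lemmatizer for each language, which is not shown.

```python
import re
from collections import Counter
from typing import Iterable

# Runs of letters in any script; digits, underscores and punctuation split tokens.
# Simplification: apostrophes also split, so "l'ami" becomes "l" and "ami".
WORD_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(lines: Iterable[str]) -> Counter:
    """Count lowercase word occurrences across dialogue lines (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


if __name__ == "__main__":
    # Made-up sample dialogue, standing in for one translation of a subtitle file.
    sample = [
        "Where are you going?",
        "I am going home.",
        "Are you sure?",
    ]
    freq = word_frequencies(sample)
    for word, n in sorted(freq.items()):
        print(word, n)
```

Running the same function over each translation of the same films would give one frequency table per language.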
I read the instructions on French syllables. Unfortunately, I find that I lack the necessary knowledge to support other languages at the moment, so that will be left for others who are better suited to the job. Thank you for the heads-up on potential uses of this library; I’ll keep it in mind for future updates.