That’s interesting, but it must be a bit tricky in practice (caveat: this is not my area of expertise).
- A given language tag may map to more than one script:
iex> Cldr.Locale.language_data["ja"]
%{primary: %{scripts: [:Jpan], territories: [:JP]}}
iex> Cldr.Locale.language_data["zh"]
%{
primary: %{scripts: [:Hans, :Hant], territories: [:CN, :HK, :MO, :SG, :TW]},
secondary: %{scripts: [:Bopo, :Phag], territories: [:ID, :MY, :TH, :US, :VN]}
}
- A given script can map to a non-contiguous set of codepoints:
iex> Unicode.Script.scripts.hangul
[
{4352, 4607},
{12334, 12335},
{12593, 12686},
...
{65490, 65495},
{65498, 65500}
]
- Some languages, like Japanese, use multiple scripts. Japanese uses hiragana, katakana and kanji (Han in Unicode speak, which has 94,215 code points!):
iex> Unicode.Script.scripts.katakana
[
{12449, 12538},
...
{110592, 110592},
{110880, 110882},
{110948, 110951}
]
So mapping a language code to the right scripts, and then to the right code points, is not a trivial exercise. It is possible for sure, but it is hard to see this being a general-purpose solution.
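To make the last step concrete, here is a minimal sketch of checking a code point against a script's non-contiguous ranges. The module name `ScriptSketch` and the function `in_script?/2` are hypothetical, not part of any library. The range tables are hard-coded from only the entries visible in the output above, so they are deliberately incomplete (the elided `...` entries are left out).

```elixir
defmodule ScriptSketch do
  # Hypothetical helper for illustration only.
  # Ranges are copied from the visible parts of the output above;
  # the elided entries are NOT filled in, so these tables are incomplete.
  @script_ranges %{
    hangul: [
      {4352, 4607},
      {12334, 12335},
      {12593, 12686},
      {65490, 65495},
      {65498, 65500}
    ],
    katakana: [
      {12449, 12538},
      {110592, 110592},
      {110880, 110882},
      {110948, 110951}
    ]
  }

  # Returns true when the code point falls inside any range listed
  # for the script. Unknown scripts have no ranges, so they return false.
  def in_script?(codepoint, script) when is_integer(codepoint) do
    @script_ranges
    |> Map.get(script, [])
    |> Enum.any?(fn {first, last} -> codepoint >= first and codepoint <= last end)
  end
end
```

For example, `ScriptSketch.in_script?(4352, :hangul)` matches the first Hangul range, while the same code point checked against `:katakana` does not match. A real implementation would read the full tables rather than a hand-copied subset, and would still need the language-to-script step on top of this.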