Up next is unicode_unihan, a new library for introspecting the Unihan database.
If you thought Latin-1, with its 159 code points, majuscules, minuscules and diacritics, was tricky enough, wait until you see Unihan.
Unihan covers 98,000 code points (and counting) and is built upon the shared history, culture and politics of China, Japan, Korea and Vietnam. Han refers to the Han Chinese people; Unihan refers to the monumental task of producing a single set of code points that respects these historical, cultural and political contexts.
The resulting Unihan Database encapsulates a wealth of information, all of which can now be introspected with the unicode_unihan library.
Motivated and inspired by @jkwchui and the work he is doing at https://visual-fonts.com, this library is a collaborative work in progress. It’s early days, but anyone interested is very welcome to provide suggestions and feedback. Here’s a simple example:
```elixir
iex> Unicode.Unihan.unihan("人")
%{
  kUnihanCore2020: "GHJKMPT",
  kCNS1992: "1-4429",
  kFourCornerCode: ["8000.0"],
  kIRGHanyuDaZidian: ["10101.100"],
  kFrequency: "1",
  kGradeLevel: "1",
  codepoint: 20154,
  kCowles: ["5115.5", "5117"],
  kCNS1986: "1-4429",
  kHanyuPinlu: ["rén(16866)", "ren(280)"],
  kBigFive: "A448",
  kHanyuPinyin: ["10101.100:rén"],
  kTotalStrokes: %{"zh-Hans": 2, "zh-Hant": 2},
  kIICore: ["AGTJHKMP"],
  kSBGY: ["102.23"],
  kCihaiT: ["80.201"],
  kKPS0: ["FCC5"],
  kTGHZ2013: ["313.110:rén"],
  kIRGDaiKanwaZiten: ["00344"],
  kTaiwanTelegraph: ["0086"],
  kMorohashi: ["00344"],
  kDaeJaweon: "0190.010",
  kTGH: ["2013:10"],
  kKSC0: ["7649"],
  kMainlandTelegraph: ["0086"],
  kTang: ["*njin", "njin"],
  kHangul: [%{grapheme: "인", source: "0E"}],
  kIRG_TSource: "T1-4429",
  kJapaneseKun: ["HITO"],
  kIRGKangXi: ["0091.010"],
  kIRG_JSource: "J0-3F4D",
  kHanYu: ["10101.100"],
  kXerox: ["241:051"],
  kRSAdobe_Japan1_6: ["C+2579+9.2.0"],
  kIRG_KSource: "K0-6C51",
  kCangjie: "O",
  kFenn: ["429A"],
  kCantonese: ["jan4"],
  kVietnamese: ["nhân"],
  kLau: ["3328"],
  kGB1: "4043",
  kIRG_KPSource: "KP0-FCC5",
  kKoreanEducationHanja: ["2007"],
  kKorean: ["IN"],
  kJapaneseOn: ["JIN", "NIN"],
  kRSUnicode: ["9.0"],
  kGB0: "4043",
  kFennIndex: ["226.01"],
  kXHC1983: ["0959.010:rén"],
  kMatthews: [...],
  ...
}
```
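Because the result is a plain Elixir map, the usual `Map` functions and pattern matching apply to it. The sketch below deliberately does not call the library: it works on a hand-copied excerpt of the `"人"` output above, so it runs on plain Elixir. It shows that `codepoint` is the integer code point of the grapheme (U+4EBA), pulls out the readings in several languages, and reads the simplified-Chinese stroke count. Treat it as an illustration of working with the returned data, not as documentation of the library's API.

```elixir
# Hand-copied excerpt of the map returned by Unicode.Unihan.unihan("人") above.
# No library call is made here; this runs on plain Elixir.
entry = %{
  codepoint: 20154,
  kTotalStrokes: %{"zh-Hans": 2, "zh-Hant": 2},
  kCantonese: ["jan4"],
  kJapaneseOn: ["JIN", "NIN"],
  kJapaneseKun: ["HITO"],
  kVietnamese: ["nhân"],
  kHanyuPinyin: ["10101.100:rén"]
}

# The :codepoint field is the integer code point of the grapheme,
# which we can also recover by matching a single UTF-8 code point.
<<cp::utf8>> = "人"
true = cp == entry.codepoint

# Format it the way the Unicode code charts do, e.g. "U+4EBA".
label = "U+" <> String.pad_leading(Integer.to_string(entry.codepoint, 16), 4, "0")
IO.puts(label)

# Select the readings for several languages in one go.
readings = Map.take(entry, [:kCantonese, :kJapaneseOn, :kJapaneseKun, :kVietnamese])
IO.inspect(readings)

# kTotalStrokes is itself a map keyed by script variant.
strokes = entry.kTotalStrokes[:"zh-Hans"]
IO.puts("zh-Hans strokes: #{strokes}")
```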