The net result is a bunch of new libraries designed to make it easier to work with Unicode blocks, scripts, categories, properties and sets. These are:
- ex_unicode introspects a string or code point and tells you a lot more than you probably want to know. But it's a good building block for other libraries.
- unicode_set supports the Unicode Set syntax and provides the macro Unicode.Set.match?/2, which can be used to build clever guards that match on Unicode blocks, scripts, categories and properties.
- A companion library uses unicode_set to provide a set of prepackaged, Unicode-friendly guards.
- unicode_transform is a work in progress to implement the Unicode transform specification and to generate transformation modules.
- unicode_string will be the last part of this series and will provide functions to split and replace strings based upon Unicode sets. Work hasn't started yet, but it's going to be a fun project.
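To make the idea of a Unicode-friendly guard concrete, here is a hand-written sketch in plain Elixir with no dependencies. It is not code from any of the libraries above: the module names `ThaiGuards` and `Classifier` are hypothetical, and the ranges are typed in by hand from the Unicode Thai block (U+0E00..U+0E7F) and its digits (U+0E50..U+0E59). The point of unicode_set is to derive ranges like these from a set expression instead.

```elixir
defmodule ThaiGuards do
  # Hypothetical guards, for illustration only. Ranges are hand-copied
  # from the Unicode block chart: Thai block U+0E00..U+0E7F,
  # Thai digits U+0E50..U+0E59.
  defguard is_thai_block(cp) when is_integer(cp) and cp in 0x0E00..0x0E7F
  defguard is_thai_digit(cp) when is_integer(cp) and cp in 0x0E50..0x0E59
end

defmodule Classifier do
  import ThaiGuards

  # Clauses are tried top to bottom, so the narrower guard comes first.
  def classify(cp) when is_thai_digit(cp), do: :thai_digit
  def classify(cp) when is_thai_block(cp), do: :thai
  def classify(_cp), do: :other
end
```

Because a guard like this compiles down to integer comparisons, it can be used in function heads and `case` clauses, where a regex cannot.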
Unicode sets in particular allow some cool expressions. For example:
```elixir
require Unicode.Set

# Is a given code point a digit? This is the
# digit `3` in the Thai script
iex> Unicode.Set.match?(?๓, "[[:digit:]]")
true

# What if we want to match on digits, but not Thai digits?
# Use set difference!
iex> Unicode.Set.match?(?๓, "[[:digit:]-[:thai:]]")
false
```
Because Unicode.Set.match?/2 is a macro, all the work of parsing the set, extracting code points, doing set operations and generating the guard code is done at compile time. The resulting code runs about 3 to 8 times faster than an equivalent regex match (although, of course, regex has a much larger problem domain).
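The compile-time approach can be sketched without any dependencies. This is not how unicode_set is implemented internally; it is a minimal illustration, assuming code point sets are represented as lists of inclusive `{first, last}` tuples. The module names `CodePointSet`, `DigitGuards` and `Demo` are hypothetical, the digit ranges are a hand-picked subset of `[:digit:]` (ASCII, Arabic-Indic and Thai digits), and `[:thai:]` is approximated by the Thai block U+0E00..U+0E7F. The set difference runs once, when `DigitGuards` is compiled, so the guard that ends up in `Demo` contains only integer comparisons.

```elixir
defmodule CodePointSet do
  # Hypothetical helper (not part of unicode_set). A set is a list of
  # inclusive {first, last} code point tuples.

  # Remove every range in `excluded` from every range in `ranges`.
  def difference(ranges, excluded) do
    Enum.reduce(excluded, ranges, fn {ex_first, ex_last}, acc ->
      Enum.flat_map(acc, fn {first, last} ->
        if ex_last < first or ex_first > last do
          # No overlap: keep the range unchanged.
          [{first, last}]
        else
          # Keep whatever lies to the left and right of the excluded range.
          left = if ex_first > first, do: [{first, ex_first - 1}], else: []
          right = if ex_last < last, do: [{ex_last + 1, last}], else: []
          left ++ right
        end
      end)
    end)
  end

  # Build guard AST: is_integer(cp) and ((cp >= a and cp <= b) or ...)
  def to_guard(_cp, []), do: false

  def to_guard(cp, ranges) do
    range_checks =
      ranges
      |> Enum.map(fn {first, last} ->
        quote do: unquote(cp) >= unquote(first) and unquote(cp) <= unquote(last)
      end)
      |> Enum.reduce(fn check, acc -> quote(do: unquote(acc) or unquote(check)) end)

    quote do: is_integer(unquote(cp)) and unquote(range_checks)
  end
end

defmodule DigitGuards do
  # Hand-picked, hypothetical stand-ins for [:digit:] and [:thai:].
  @digits [{0x0030, 0x0039}, {0x0660, 0x0669}, {0x0E50, 0x0E59}]
  @thai_block [{0x0E00, 0x0E7F}]

  # Evaluated once, at compile time: [:digit:] minus [:thai:].
  @non_thai_digits CodePointSet.difference(@digits, @thai_block)

  # Expands to plain comparisons, so it is safe to use in guards.
  defmacro is_non_thai_digit(cp) do
    CodePointSet.to_guard(cp, @non_thai_digits)
  end
end

defmodule Demo do
  import DigitGuards

  def kind(cp) when is_non_thai_digit(cp), do: :digit
  def kind(_cp), do: :other
end
```

With this sketch, `Demo.kind(?0)` and `Demo.kind(0x0663)` return `:digit`, while `Demo.kind(0x0E53)` (the Thai digit ๓) returns `:other`, mirroring the set-difference example above.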