While working on an Elixir parser (for Handlebars syntax), I came across the need to determine whether or not a given character (a grapheme) is whitespace. I know that there are various String functions (like String.trim_leading), but I’m hoping to write a guard clause on a function along the lines of this:
defp custom_trim(<<h::utf8, _tail::binary>> = string) when h not in [?\s, ?\t, ?\n], do: string
But I don’t know how to specify all the other whitespace characters. Is there a function I could use to tell me whether a given character is whitespace? If I used a regular expression pattern like ~r/\s/, would that match all whitespace? If it did, I could check each character against that pattern instead.
Thank you for the info. If I wanted to reference one of the codepoints listed in the Wiki document, e.g. U+00A0 for a “No-break space”, how would I do that in Elixir? From https://stackoverflow.com/questions/54731429/convert-a-single-character-string-to-its-codepoint I can see how to go from a character to its codepoint using pattern matching or the String.to_charlist function, but given a list of codepoint numbers (as listed in the Wiki reference), I can’t see how to take the hexadecimal representation (or an integer) and convert it back to a string.
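A minimal sketch of going from a codepoint back to a string, using only standard Elixir (the variable names are just for illustration):

```elixir
# U+00A0 (no-break space) written as a hex integer literal; same value as 160.
cp = 0x00A0

s1 = <<cp::utf8>>            # encode one integer codepoint as a UTF-8 binary
s2 = "\u00A0"                # \u escape with four hex digits inside a string literal
s3 = List.to_string([cp])    # a list of codepoints (a charlist) converted to a string

# Starting from the "U+00A0" notation used in Unicode tables:
"U+" <> hex = "U+00A0"       # strip the prefix, leaving "00A0"
s4 = <<String.to_integer(hex, 16)::utf8>>   # parse as base 16, then encode

IO.inspect({s1 == s2, s2 == s3, s3 == s4})  # {true, true, true}

# And back again: pattern match a one-character string to get its codepoint.
<<back::utf8>> = s1
IO.inspect(back)  # 160
```

The `::utf8` segment type is the key piece: it works in both directions, encoding an integer into a string when building a binary and decoding the first codepoint when pattern matching.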
I can solve my immediate problem but I’m still not understanding the bigger picture here, so any clarifications would be appreciated!
I have a lib that defines some guards to help with this sort of thing, but specifically for whitespace you can use:
when codepoint in [0x20, 0xA0, 0x1680, 0x202F, 0x205F, 0x3000] or codepoint in 0x2000..0x200A
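One way the guard above could be packaged with `defguard` and reused in a trim function like the one in the question. This is a sketch: the module names `WhitespaceGuards` and `CustomTrim` are hypothetical, and adding tab, newline and the other ASCII control whitespace is an assumption about what the parser should treat as whitespace, since those characters are not in the Unicode Space_Separator (Zs) category that the guard's codepoints cover.

```elixir
defmodule WhitespaceGuards do
  # Hypothetical module. 0x20, 0xA0, 0x1680, 0x2000..0x200A, 0x202F, 0x205F
  # and 0x3000 are the Zs (Space_Separator) codepoints from the guard above.
  # ?\t, ?\n, ?\v, ?\f, ?\r are control characters, added here by assumption.
  defguard is_whitespace(cp)
           when is_integer(cp) and
                  (cp in [?\t, ?\n, ?\v, ?\f, ?\r, 0x20, 0xA0, 0x1680, 0x202F, 0x205F, 0x3000] or
                     cp in 0x2000..0x200A)
end

defmodule CustomTrim do
  import WhitespaceGuards

  # Take one UTF-8 codepoint at a time; binary-size(1) would take a single
  # byte and split multi-byte characters such as U+00A0 (0xC2 0xA0).
  def custom_trim(<<cp::utf8, rest::binary>>) when is_whitespace(cp), do: custom_trim(rest)

  # First character is not whitespace (or the string is empty): stop.
  def custom_trim(string) when is_binary(string), do: string
end

IO.inspect(CustomTrim.custom_trim("\u00A0\t  {{name}}"))  # "{{name}}"
```

Guards only accept a fixed set of expressions, so a regex cannot be called inside one; an explicit codepoint list is what lets the check live in the function head.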
Or, in a regex, you can match on Unicode character categories such as \p{Zs}.
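For example, a sketch using Elixir's `Regex` module. The `u` modifier switches the pattern into Unicode mode, so `\p{...}` categories and classes like `\s` operate on codepoints rather than bytes:

```elixir
# \p{Zs} is the Unicode Space_Separator category (the same set as the guard above).
space_separator = ~r/\p{Zs}/u
# In Unicode mode, \s covers Zs plus tab, newline and the other ASCII whitespace.
any_space = ~r/\s/u

IO.inspect(Regex.match?(space_separator, "\u00A0"))  # true: no-break space is Zs
IO.inspect(Regex.match?(space_separator, "\t"))      # false: tab is a control character
IO.inspect(Regex.match?(any_space, "\u00A0"))        # true

# Trimming leading whitespace with a regex (in a function body, not a guard):
IO.inspect(String.replace(" \u00A0 {{name}}", ~r/\A\s+/u, ""))  # "{{name}}"
```

This is handy for one-off checks in a function body, while the guard-based approach suits pattern-matching parsers.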