I’d like to split a String input on any of the characters below:
I have the following code snippet in an iex session:
I had to handle the underscore separately from \W (any “non-word” character), because the word class \w includes it (a-zA-Z0-9_), so \W alone never matches an underscore.
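To illustrate the underscore point, here is a minimal sketch. It is not the snippet from the question; the input string and both patterns are made up for the example.

```elixir
# Hypothetical input, chosen only for illustration.
input = "foo_bar baz"

# \W alone does not match "_", because "_" counts as a word character.
IO.inspect(String.split(input, ~r/\W/))
# ["foo_bar", "baz"]

# Adding "_" to the character class makes the underscore a separator too.
IO.inspect(String.split(input, ~r/[\W_]/))
# ["foo", "bar", "baz"]
```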
As you can see, there is still a problem with matching the hyphen in the word co-operation.
Whatever I try, as soon as I put - in the above regex, nothing works; it just breaks the cases that matched before.
If I understand your problem, I think you need to escape - in your character class. The reason is that - is used to define ranges of characters. For example, ~r/[a-z]/ matches any character from a to z, not a, -, and z. You can escape a character in a character class with \. So, for the previous example, to make it match a, -, or z, you’d write ~r/[a\-z]/.
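A small sketch of the difference, which you can paste into iex. The input string and the separator class `[\s_\-]` are assumptions for illustration, not the regex from the question.

```elixir
# Unescaped: "a-z" is a range, so "m" matches and a literal "-" does not.
IO.inspect(Regex.match?(~r/[a-z]/, "m"))   # true
IO.inspect(Regex.match?(~r/[a-z]/, "-"))   # false

# Escaped: the class now contains exactly "a", "-" and "z".
IO.inspect(Regex.match?(~r/[a\-z]/, "m"))  # false
IO.inspect(Regex.match?(~r/[a\-z]/, "-"))  # true

# Hypothetical split on whitespace, underscore, or hyphen.
IO.inspect(String.split("co-operation well_done", ~r/[\s_\-]/))
# ["co", "operation", "well", "done"]
```

Putting the hyphen first or last in the class, as in `~r/[\s_-]/`, also makes it literal, since it cannot form a range there.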
In most flavors that support Unicode, \w includes many characters from other scripts. There is a lot of inconsistency about which characters are actually included. Letters and digits from alphabetic scripts and ideographs are generally included. Connector punctuation other than the underscore and numeric symbols that aren’t digits may or may not be included.
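In Elixir specifically, \w only matches ASCII word characters unless the regex has the `u` (unicode) modifier. A quick check, using an arbitrary sample word:

```elixir
# Without the u modifier, \w is ASCII-only, so "ó" is not a word character.
IO.inspect(String.match?("józef", ~r/^\w+$/))
# false

# With the u modifier, \w also matches Unicode letters.
IO.inspect(String.match?("józef", ~r/^\w+$/u))
# true
```

So if the input may contain non-ASCII letters, a split on \W probably needs the `u` modifier to avoid breaking words apart at those letters.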