In this case, the negation option for a character class will do what you want: ~r([^A-Z]) will match any character that isn’t A-Z (indicated by the leading ^).
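To make the negated class a bit more concrete, here is a minimal sketch. The module and function names (`NegatedClassDemo`, `only_uppercase/1`, `letter_runs/1`) are made up for illustration and are not from the exercise; only `String.replace/3` and `String.split/3` are standard library calls.

```elixir
defmodule NegatedClassDemo do
  # Hypothetical helper: delete every character that is NOT in A-Z.
  def only_uppercase(text), do: String.replace(text, ~r([^A-Z]), "")

  # Hypothetical helper: split on runs of characters that are not ASCII letters.
  def letter_runs(text), do: String.split(text, ~r([^A-Za-z]+), trim: true)
end

IO.inspect(NegatedClassDemo.only_uppercase("Hello World"))
# => "HW"
IO.inspect(NegatedClassDemo.letter_runs("one, two; three"))
# => ["one", "two", "three"]
```

Note that this only covers ASCII letters, so it is probably too narrow for input with accents or other scripts.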
Thank you for your valuable responses. I found out that just using the pin operator can reverse the whole scenario. For example, if we have a test case:
```
Assertion with == failed
code:  assert WordCount.count("co-operative") == expected
left:  %{"co" => 1, "operative" => 1}
right: %{"co-operative" => 1}
stacktrace:
  test/word_count_test.exs:35: (test)
```
Here the word is co-operative, and when I use the pin operator, it drops the "-" and splits co-operative in two, although it should be read as a single word. What is the better approach in this case?
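For anyone reading along, the failure above can be reproduced in isolation. This is only a sketch under the assumption that the words are produced by splitting on a negated character class; `HyphenDemo` and its functions are hypothetical names, not the actual `WordCount` code, and the classes are deliberately limited to lowercase ASCII.

```elixir
defmodule HyphenDemo do
  # "-" is not in the class, so it counts as a separator.
  def split_strict(text), do: String.split(text, ~r/[^a-z]+/, trim: true)

  # "-" is listed last in the class, where it is a literal hyphen,
  # so it is no longer treated as a separator.
  def split_keep_hyphen(text), do: String.split(text, ~r/[^a-z-]+/, trim: true)
end

IO.inspect(HyphenDemo.split_strict("co-operative"))
# => ["co", "operative"]
IO.inspect(HyphenDemo.split_keep_hyphen("co-operative, really"))
# => ["co-operative", "really"]
```

Anything not listed inside the negated class is treated as a separator, so a character that should stay inside a word has to be listed there explicitly.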
Now the regex expression looks like
The regexes here aren’t going to work well with Unicode input, nor will they match digits outside the Indo-Arabic set 0..9. They will likely also have issues with signs and exponents. None of this may matter in your use case, of course. But if it does, my ex_cldr_numbers library can help: use Cldr.Number.Parser.scan/2 and then filter as you wish.
Examples

```elixir
# Scan a string in a locale-sensitive fashion and extract numbers
iex> Cldr.Number.Parser.scan "Hello, world ... 123 *** ^%&*()-72.5)^% %%: 123.00>"
["Hello, world ... ", 123, " *** ^%&*()", -72.5, ")^% %%: ", 123.0, ">"]

# Scan a string in a locale-sensitive fashion and extract numbers - in the "de" locale
# Note the use of the "," as the decimal separator
iex> Cldr.Number.Parser.scan "Hello, world ... 123 *** ^%&*()-72.5)^% %%: 123,00>", locale: "de"
["Hello, world ... ", 123, " *** ^%&*()", -72.5, ")^% %%: ", 123.0, ">"]
```
I don’t know all the edge cases, but this one should work for you. If you have more requirements or other questions about regular expressions, please create a separate topic and feel free to ping me.
Let me guess, you’re doing the Word Count exercise on Exercism, right? I’m a mentor there, so I recognized this immediately.
Well, then let me enlighten you: what you’re looking for, especially for the latter special-case tests, are Unicode categories. I could of course give you the working regex, but then why do the exercise at all?
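Without giving the exercise away, here is a generic sketch of what a Unicode category looks like in an Elixir regex. It assumes the `\p{L}` ("any letter") property together with the `u` modifier; `UnicodeCategoryDemo` is a hypothetical name, and this is not the regex the tests need.

```elixir
defmodule UnicodeCategoryDemo do
  # \p{L} matches any Unicode letter; the trailing `u` enables Unicode mode.
  def letters(text) do
    ~r/\p{L}+/u
    |> Regex.scan(text)
    |> List.flatten()
  end
end

IO.inspect(UnicodeCategoryDemo.letters("Straße, мир 42"))
# => ["Straße", "мир"]
```

Which categories to combine, and what to do with digits, apostrophes and hyphens, is the part left for the exercise.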
Not super important, but for clarity I’d like to point out that the caret character ^ is not the pin operator in this case; inside a regex character class, a leading ^ negates the class.
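A small side-by-side sketch of the three meanings of ^ may help. `CaretDemo` and its function names are invented for illustration.

```elixir
defmodule CaretDemo do
  # Pin operator (Elixir pattern matching): ^expected matches against the
  # existing value of `expected` instead of rebinding the variable.
  def pinned_match?(expected, value) do
    case value do
      ^expected -> true
      _ -> false
    end
  end

  # Regex, inside a character class: a leading ^ negates the class.
  # Every character that is not a digit is removed.
  def digits_only(text), do: String.replace(text, ~r/[^0-9]/, "")

  # Regex, outside a character class: ^ anchors the match to the start.
  def starts_with_digit?(text), do: Regex.match?(~r/^[0-9]/, text)
end

IO.inspect(CaretDemo.pinned_match?(1, 1))
# => true
IO.inspect(CaretDemo.digits_only("a1b2c3"))
# => "123"
IO.inspect(CaretDemo.starts_with_digit?("a1"))
# => false
```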