How to check if a Unicode codepoint represents a letter or an uppercase letter?

I’m working on the “Bob” exercise on the Elixir Track in Exercism.

I am testing for an uppercase letter with this simple check: c in ?A..?Z, but one of the test cases requires that Unicode uppercase letters be handled.
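For illustration, here is a minimal sketch of why the range check falls short. The name ascii_upper? is made up for this example; the range ?A..?Z only covers the 26 ASCII capitals, so any uppercase letter outside that range is rejected.

```elixir
# Hypothetical helper: true only for codepoints between ?A (65) and ?Z (90).
ascii_upper? = fn c -> c in ?A..?Z end

# ASCII capital: inside the range.
IO.inspect(ascii_upper?.(?Q))

# Cyrillic capital U (codepoint 1059): uppercase, but outside the range.
IO.inspect(ascii_upper?.(?У))
```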

I’ve Googled a little bit and couldn’t find anything useful. An answer on Stack Overflow suggested using a regular expression on the string. I wonder if there’s anything as simple as Unicode.is_upper? in the standard library.

Maybe you can check whether your letter is equal to String.upcase of the same letter. If it is, then you have an uppercase letter :)
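A rough sketch of that comparison, assuming each character is passed in as a one-grapheme string (the name unchanged_by_upcase? is made up here). Note that anything without a case, such as a digit, also passes this test, so it is not a letter check on its own.

```elixir
# Hypothetical helper: true when upcasing leaves the grapheme unchanged.
unchanged_by_upcase? = fn grapheme -> String.upcase(grapheme) == grapheme end

IO.inspect(unchanged_by_upcase?.("Ä"))  # already uppercase
IO.inspect(unchanged_by_upcase?.("ä"))  # lowercase, upcase changes it
IO.inspect(unchanged_by_upcase?.("7"))  # not a letter, but also unchanged
```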

I also need to check whether it’s a “letter” at all; if it’s not a letter, then I don’t check whether it’s uppercase.

Perhaps check if Unicode.category/1 returns :Lu.

EDIT: Just noticed that there is also Unicode.uppercase?/1 which does this exact check.

I was about to suggest ex_unicode as well, but given the nature of this being an exercise I doubt they expect people to use an external dependency.

Just a hint, I don’t think you need to check every codepoint individually, but instead try to work with the whole string input and just the String module.

What should I import to use Unicode?

I have to check for 2 conditions:

  1. Contains at least one letter
  2. All letters in the string are uppercase

Unicode is from the ex_unicode package.

Alternatively, if you want to stay with the standard library, check out the Regex character classes, which include alpha and upper.
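As a sketch of the regex route for the two conditions listed earlier: this version swaps in the Unicode property escapes \p{L} (any letter) and \p{Lu} (uppercase letter) instead of the alpha and upper classes, together with the u modifier. The module and function names are made up for illustration.

```elixir
defmodule ShoutCheck do
  # Condition 1: the string contains at least one letter.
  def has_letter?(input), do: Regex.match?(~r/\p{L}/u, input)

  # Condition 2: every letter found in the string is in category Lu.
  def all_letters_upper?(input) do
    ~r/\p{L}/u
    |> Regex.scan(input)
    |> List.flatten()
    |> Enum.all?(&Regex.match?(~r/^\p{Lu}$/u, &1))
  end

  def shouting?(input), do: has_letter?(input) and all_letters_upper?(input)
end

IO.inspect(ShoutCheck.shouting?("УХОДИ"))
IO.inspect(ShoutCheck.shouting?("1, 2, 3"))
```

Under this definition, a letter that is neither upper nor lower case (category Lo, for example) counts as "not uppercase", which may or may not be what the exercise wants.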

Thanks.

When I first solved Bob, I did so entirely with the Regex module. One of my mentor’s comments was that, although Bob can be solved in many ways, the intent of this exercise (on the Elixir track) is to get you familiar with the String module. I was encouraged to return to the String module and solve the exercise without using any regexes at all. The solution I came up with turned out to be much more elegant than the regex solution, and indeed required only functions from String.

Here is a hint to get you started: Under what conditions would the uppercase version of a string be the same as the original? Are there other relationships like this you can leverage?

Guess my mentor is the same one! I finally solved the problem using only functions in the String module. It works, although it is very inefficient.

Does your solution handle Unicode lowercase letters that don’t have an uppercase version, for which String.upcase will return the same lowercase letter?

From Character Properties, Case Mappings & Names FAQ:

Q: Does uppercasing of a string eliminate all of the lowercase letters in it?
A: No. Some letters (notably those in the IPA block) have no matching case equivalent. As a result, uppercasing a string may not eliminate all of the lowercase letters in it.

I checked String.upcase(input) == input and String.downcase(input) != input. It’s slow, but it passes all the tests.
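Spelled out as a function, that check might look roughly like this. The module and function names are placeholders, not the exercise’s required API.

```elixir
defmodule ShoutSketch do
  # True when upcasing changes nothing (no letter that can still be upcased)
  # and downcasing changes something (at least one cased letter is present).
  def shouting?(input) do
    String.upcase(input) == input and String.downcase(input) != input
  end
end

IO.inspect(ShoutSketch.shouting?("WATCH OUT!"))
IO.inspect(ShoutSketch.shouting?("1, 2, 3"))
```

The second comparison is what rules out strings with no letters at all, since digits and punctuation are unchanged by both String.upcase and String.downcase.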

That doesn’t work for all Unicode characters. IMO the exercise is badly designed if the solution is intended to be Unicode-unsafe.
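A small counterexample, as a sketch: U+0255 (LATIN SMALL LETTER C WITH CURL) is a lowercase letter from the IPA block that, as far as I know, has no uppercase mapping. The name naive_shouting? is made up and simply restates the upcase/downcase comparison.

```elixir
# Hypothetical restatement of the upcase/downcase comparison.
naive_shouting? = fn input ->
  String.upcase(input) == input and String.downcase(input) != input
end

# "A" followed by U+0255, a lowercase letter with no uppercase equivalent.
mixed = "A\u0255"

# Upcasing leaves the string unchanged and downcasing changes the "A",
# so the check reports shouting even though a lowercase letter is present.
IO.inspect(String.upcase(mixed) == mixed)
IO.inspect(naive_shouting?.(mixed))
```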