Problem with failing test

Hey, I have a problem in my project and I have created a minimal reproduction repository. Can you please check it?

$ git clone
Cloning into 'reproduce'...
remote: Counting objects: 12, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 12 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (12/12), done.
$ cd reproduce/
$ mix test
Compiling 1 file (.ex)
Generated reproduce app

  1) test fails (ReproduceTest)
     Expected truthy, got false
     code: assert Reproduce.example("TeŚcIk", "teścik")
       test/reproduce_test.exs:5: (test)

Finished in 0.03 seconds
1 test, 1 failure

Randomized with seed 471704

Maybe you can downcase both strings in

def example(first, second), do: String.downcase(first) == String.downcase(second)

or change your test to

  test "fails" do
    str = "TeŚcIk"
    str_downcased = String.downcase(str)
    assert Reproduce.example(str, str_downcased)
  end

Actually, this might be a very good example of how to use property-based testing.

Here’s an example using alfert/propcheck:

  describe "TeŚcIk using propcheck" do
    use PropCheck

    property "downcase lowercases" do
      forall {lowercased_str, original_str} <- bin_for_downcase() do
        Reproduce.example(original_str, lowercased_str)
      end
    end

    # Generates {downcased, original} pairs from arbitrary binaries.
    defp bin_for_downcase do
      let bin <- binary() do
        {String.downcase(bin), bin}
      end
    end
  end

It doesn’t really test anything, though, since it uses the same function String.downcase both in the test and in the data generation. You would need to find some other “downcase” implementation to test against. Or generate the data differently.

@idi527: Thanks for the quick response. Unfortunately, it does not solve the real problem, i.e. why the test is failing.

Quick guess without testing: normalisation of composed glyphs.

Try String.equivalent?/2 instead; it normalizes both strings first, but is slower.
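
To illustrate what String.equivalent?/2 is meant to handle, here is a minimal sketch. The two strings are my own illustration (they are not taken from the repository) and are written with escapes so the difference is visible in the source:

```elixir
# Both strings render as "teścik" but encode "ś" differently.
composed = "te\u015Bcik"     # "ś" as the single code point U+015B
decomposed = "tes\u0301cik"  # "s" followed by the combining acute accent U+0301

IO.inspect(composed == decomposed)                    # false: the bytes differ
IO.inspect(String.equivalent?(composed, decomposed))  # true: equal after NFD normalization
```

So it only helps if the two strings differ in how the same glyph is composed.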

Not sure why, but when I copy/paste the second “lowercased” string into my console it shows up with an extra space: "teś​ cik".

What does String.codepoints/1 give you for each string?
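
A rough sketch of that kind of check, in case it helps. `suspect` is a hypothetical string with an invisible character embedded via an escape; it is an assumption for illustration, not the literal from the test file:

```elixir
clean = "te\u015Bcik"
suspect = "te\u015B\u200Bcik"  # assumption: hides a zero-width space (U+200B)

# One list element per code point, so an invisible character gets its own slot.
IO.inspect(String.codepoints(clean))
IO.inspect(String.codepoints(suspect))

# Integer code points are easier to read than invisible glyphs.
IO.inspect(String.to_charlist(suspect), charlists: :as_lists)
# [116, 101, 347, 8203, 99, 105, 107]
```

Comparing the lengths of the two lists is usually enough to spot that something extra is in there.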

I was able to take a look at the sources now, and will post the relevant parts from my hex viewer:

00000050: 6475 6365 2e65 7861 6d70 6c65 2822 5465  duce.example("Te
00000060: c59a 6349 6b22 2c20 2274 65c5 9be2 808b  ..cIk", "te.....
00000070: 6369 6b22 290a 2020 656e 640a 656e 640a  cik").  end.end.

This means the mixed-case string "TeŚcIk" is raw <<84, 101, 197, 154, 99, 73, 107>> and gets lowercased to (raw) <<116, 101, 197, 155, 99, 105, 107>>, while the lowercased literal ("teścik") is raw <<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>.

So both lowercased versions are in different canonical forms, though both are valid. And as you can see, they are in fact not ==:

  t    e    ś                        c   i    k
<<116, 101, 197, 155,                99, 105, 107>> 
<<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>
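
The same comparison can be reproduced in IEx without a hex viewer. This is only a sketch: the literal is rebuilt here from the bytes in the dump, using escapes instead of the original file contents:

```elixir
downcased = String.downcase("Te\u015AcIk")
literal = <<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>  # bytes from the hex dump

# binaries: :as_binaries forces the raw byte view instead of the printed string.
IO.inspect(downcased, binaries: :as_binaries)
# <<116, 101, 197, 155, 99, 105, 107>>
IO.inspect(literal, binaries: :as_binaries)
# <<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>

IO.inspect(byte_size(literal) - byte_size(downcased))  # 3
IO.inspect(downcased == literal)                       # false
```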

It seems, though, that String.equivalent?/2 doesn’t help either.

The first string gets normalized (:nfd) to <<116, 101, 115, 204, 129, 99, 105, 107>> while the second one isn’t changed.

So it seems that either the composer/decomposer in Elixir is wrong, or the composer/decomposer in our font engines is…
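
For anyone who wants to check this locally, a small sketch. The second string is rebuilt from the bytes in the hex dump using an escape for the three extra bytes, which is my reading of the dump rather than the file itself:

```elixir
clean = "te\u015Bcik"
with_extra = "te\u015B" <> <<226, 128, 139>> <> "cik"  # extra bytes taken from the hex dump

# NFD splits "ś" into "s" plus a combining accent (bytes 115, 204, 129).
IO.inspect(String.normalize(clean, :nfd), binaries: :as_binaries)
# <<116, 101, 115, 204, 129, 99, 105, 107>>

# The three extra bytes survive normalization, so the strings stay different.
normalized = String.normalize(with_extra, :nfd)
IO.inspect(String.contains?(normalized, <<226, 128, 139>>))  # true
IO.inspect(String.equivalent?(clean, with_extra))            # false
```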

Please tell me, what is the name of your anti-troll console? :smile:

Really, really close, but …

nooooooo - so close! :smiley:
One of them is not valid.

Nope, both of them work.

Thank you all for trying to find a solution. @NobbZ was definitely really close to the solution, but he got lost in his summary.

First, if you look at the mix.exs file, you will find this version: 0.0.1-first-april-rc.0.

Second, @NobbZ was right about the difference between the lowercase versions, but he did not find out why.

I have used a zero-width space (U+200B, the bytes 226, 128, 139).

It’s not visible even if you compare the two strings directly in the assert. Editors do not display it either, and it’s also not visible in the commit diff. Perfect 1st April troll. :077:

Almost perfect, because @jfeng’s anti-troll console gave him a really important tip.


I should have opened the test files in terminal emacs :wink: There (but sadly only there) Zero-Width-Characters are displayed as underscores…
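
For editors that do not show them, a small helper can do the looking. This is a minimal sketch; the module name `ZeroWidth` and the list of code points are my own choices, not something from the repository or from a library:

```elixir
defmodule ZeroWidth do
  # Hypothetical helper. Covers a few common invisible code points only:
  # zero-width space, non-joiner, joiner, word joiner, and the BOM.
  @zero_width [0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF]

  # Returns {code_point, index} pairs for every zero-width character found.
  def find(string) do
    string
    |> String.to_charlist()
    |> Enum.with_index()
    |> Enum.filter(fn {cp, _index} -> cp in @zero_width end)
  end

  # Returns the string with those characters removed.
  def strip(string) do
    string
    |> String.to_charlist()
    |> Enum.reject(fn cp -> cp in @zero_width end)
    |> List.to_string()
  end
end

trolled = "te\u015B\u200Bcik"
IO.inspect(ZeroWidth.find(trolled))  # [{8203, 3}]
IO.inspect(ZeroWidth.strip(trolled) == String.downcase("Te\u015AcIk"))  # true
```

Running `find` over the test file contents (for example via `File.read!/1`) would point at the offending position.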

Anyway, I was considering looking at the actual codepoints, but I was interrupted by my family, moved away from that idea, and simply posted what I had so far…