Problem with failing test

Eiji · April 1, 2018, 5:51pm

Hey, I have a problem in my project and I have created a minimal reproduction repository. Can you please check it?

$ git clone git@gitlab.com:ex-open-source/reproduce.git
Cloning into 'reproduce'...
remote: Counting objects: 12, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 12 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (12/12), done.
$ cd reproduce/
$ mix test
Compiling 1 file (.ex)
Generated reproduce app


  1) test fails (ReproduceTest)
     test/reproduce_test.exs:4
     Expected truthy, got false
     code: assert Reproduce.example("TeŚcIk", "teścik")
     stacktrace:
       test/reproduce_test.exs:5: (test)



Finished in 0.03 seconds
1 test, 1 failure

Randomized with seed 471704

idi527 · April 1, 2018, 5:53pm

Maybe you can downcase both strings in lib/reproduce.ex?

def example(first, second), do: String.downcase(first) == String.downcase(second)

or change your test to

  test "fails" do
    str = "TeŚcIk"
    str_downcased = String.downcase(str)
    assert Reproduce.example(str, str_downcased)
  end

Actually, this might be a very good example of how to use property-based testing.

Here’s an example using alfert/propcheck

  describe "TeŚcIk using propcheck" do
    use PropCheck

    property "downcase lowercases" do
      forall {lowercased_str, original_str} <- bin_for_downcase() do
        Reproduce.example(original_str, lowercased_str)
      end
    end

    defp bin_for_downcase do
      let bin <- binary() do
        {String.downcase(bin), bin}
      end
    end
  end

It doesn’t really test anything, though, since it uses the same function String.downcase both in the test and in the data generation. You would need to find some other “downcase” implementation to test against. Or generate the data differently.

Eiji · April 1, 2018, 5:54pm

@idi527: Thanks for the quick response. Unfortunately, it does not solve the real problem, i.e. why the test is failing.

NobbZ · April 1, 2018, 8:56pm

Quick guess without testing: normalisation of composed glyphs.

Try String.equivalent?/2 instead. It normalizes both strings first, but it is slower.

jfeng · April 2, 2018, 7:15am

Not sure why, but when I copy/paste the second “lowercased” string into my console it shows up with an extra space: "teś cik".

What does String.codepoints/1 give you for each string?

NobbZ · April 2, 2018, 7:36am

I was able to take a look into the sources now and will post the relevant parts from my hex-view:

00000050: 6475 6365 2e65 7861 6d70 6c65 2822 5465  duce.example("Te
00000060: c59a 6349 6b22 2c20 2274 65c5 9be2 808b  ..cIk", "te.....
00000070: 6369 6b22 290a 2020 656e 640a 656e 640a  cik").  end.end.

This means the mixed-case string "TeŚcIk" is raw <<84, 101, 197, 154, 99, 73, 107>> and gets lowercased to (raw) <<116, 101, 197, 155, 99, 105, 107>>, while the lowercased literal ("teścik") is raw <<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>.

So both lowercased versions are in different canonical forms, though both are valid. And as you can see, they are in fact not ==:

  t    e    ś                        c   i    k
<<116, 101, 197, 155,                99, 105, 107>> 
<<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>
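
The byte comparison above can also be done at the code-point level, which names the extra character directly. A minimal sketch, assuming only the raw bytes quoted in this post (the variable names are made up for illustration): the three extra bytes 226, 128, 139 are the UTF-8 encoding of a single code point, U+200B.

```elixir
# Raw bytes copied from the hex view above.
mixed = <<84, 101, 197, 154, 99, 73, 107>>
literal = <<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>

lowercased = String.downcase(mixed)

# Code points as plain integers, so invisible characters cannot hide.
lowercased_points = String.to_charlist(lowercased)
literal_points = String.to_charlist(literal)

IO.inspect(lowercased_points, charlists: :as_lists)
IO.inspect(literal_points, charlists: :as_lists)

# List difference leaves only the code point the literal has in addition.
extra = literal_points -- lowercased_points
IO.inspect(extra, charlists: :as_lists)
```

Here `extra` is `[8203]`, that is `0x200B`.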

NobbZ · April 2, 2018, 7:49am

It seems, though, that String.equivalent?/2 doesn’t help either.

The first string gets normalized (:nfd) to <<116, 101, 115, 204, 129, 99, 105, 107>> while the second one isn’t changed.
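
A short sketch of why normalization cannot reconcile these two binaries, using only the bytes quoted earlier in the thread (variable names are illustrative). String.equivalent?/2 treats canonically equivalent sequences as equal, for example a precomposed "ś" and "s" followed by a combining acute accent, but it does not drop an additional code point.

```elixir
lowercased = <<116, 101, 197, 155, 99, 105, 107>>
literal = <<116, 101, 197, 155, 226, 128, 139, 99, 105, 107>>

# Precomposed "ś" versus "s" + combining acute accent (U+0301).
composed = <<197, 155>>
decomposed = <<115, 204, 129>>

# Canonically equivalent forms compare as equivalent...
IO.inspect(String.equivalent?(composed, decomposed))

# ...but the literal carries three extra bytes that normalization keeps.
IO.inspect(String.equivalent?(lowercased, literal))
```

The first call returns true and the second returns false, so the mismatch is not a question of composed versus decomposed forms.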

So it seems that either the composer/decomposer in Elixir is wrong, or the composer/decomposer in our font engines…

Eiji · April 2, 2018, 12:44pm

Please tell me, what is the name of your anti-troll console?

Really, really close, but …

nooooooo - so close!
One of them is not valid.

Nope, both of them work.

Eiji · April 2, 2018, 12:50pm

Thank you all for trying to find a solution. @NobbZ was definitely really close to the solution, but he got lost in his summary.

First, if you look at the mix.exs file, you will find this version: 0.0.1-first-april-rc.0.

Second, @NobbZ was right about the difference between the lowercased versions, but he did not find out why.

I have used a zero-width space (U+200B).

It’s not visible even when you compare the two strings directly in the assert. Editors do not display it either, and it’s also not visible in the commit diff. A perfect 1st of April troll.
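
For anyone who wants to guard against this kind of invisible character, here is a minimal sketch. It assumes that the extra bytes 226, 128, 139 from the hex dump are the only kind of culprit, plus a few common relatives; `ZeroWidth` is a hypothetical helper module and not part of the reproduce repository, and the list of characters is an illustrative choice, not an exhaustive one.

```elixir
defmodule ZeroWidth do
  # Hypothetical helper, not part of the reproduce repository.
  # Zero-width space, non-joiner, joiner, and byte order mark.
  @zero_width ["\u200B", "\u200C", "\u200D", "\uFEFF"]

  # True if the string contains any of the listed invisible characters.
  def present?(string), do: String.contains?(string, @zero_width)

  # Removes every listed invisible character from the string.
  def strip(string), do: String.replace(string, @zero_width, "")
end

# The literal from the failing test, with the hidden character written out.
literal = "te\u015B\u200Bcik"

IO.inspect(ZeroWidth.present?(literal))
IO.inspect(ZeroWidth.strip(literal) == String.downcase("Te\u015AcIk"))
```

Both calls print true: the literal contains a hidden character, and once it is stripped the comparison from the test succeeds.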

Almost perfect, because @jfeng’s anti-troll console gave him a really important tip.

NobbZ · April 2, 2018, 2:37pm

I should have opened the test files in terminal Emacs. There (but sadly only there) zero-width characters are displayed as underscores…

Anyway, I was considering looking at the actual codepoints, but was interrupted by my family and moved away from that idea and simply posted what I’ve got so far…