Ex_confusables - a Unicode TR39 confusable detection and skeleton implementation

ColaCheng · March 31, 2021, 8:19am

Hi all,

I have made a Unicode TR39 confusable detection and skeleton implementation, ex_confusables.

This library can check whether two strings are visually confusable, as described in Unicode® Technical Standard #39: both strings are transformed into a skeleton before they are compared. The skeleton is generated by normalizing the string (NFD), replacing confusable characters, and normalizing the string again.
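To make the three skeleton steps concrete, here is a minimal sketch in Python (not the library's Elixir code). The `CONFUSABLES` table and the `skeleton` and `confusable` function names are assumptions made up for illustration; a real implementation would load the full mapping from the Unicode `confusables.txt` data file rather than use this five-entry subset.

```python
import unicodedata

# Tiny illustrative subset of confusable mappings (assumption: the real
# data comes from Unicode's confusables.txt, which has thousands of entries).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(text: str) -> str:
    """Normalize (NFD), replace confusable characters, normalize again."""
    decomposed = unicodedata.normalize("NFD", text)
    replaced = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", replaced)


def confusable(a: str, b: str) -> bool:
    """Two strings are treated as confusable when their skeletons match."""
    return skeleton(a) == skeleton(b)
```

With this subset, `confusable("paypal", "\u0440\u0430ypal")` holds because the Cyrillic letters map to their Latin look-alikes, while `confusable("paypal", "paypa1")` does not, since the digit `1` is not in the sample table.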

I also did some profiling of the char list and binary string implementations. Interestingly, the char list version is faster than the binary one. The result is here.

Feel free to try it and open an issue in the GitHub repo for any problems you encounter!

Thank you!

Exadra37 · March 31, 2021, 9:13am

Thanks for sharing this library with us.

Why did you prefix the library name with ex_?

If I am not mistaken, this prefix is normally used when a library is extending/wrapping Elixir core functionality, but I may be wrong.

ColaCheng · March 31, 2021, 10:17am

I didn’t know this rule. I just came up with this name because it is implemented in Elixir. Good to know this convention! Thank you!

John-Goff · March 31, 2021, 1:34pm

Where did you get this idea from? A quick search on hex.pm shows that if this is a convention, it’s poorly followed: Packages | Hex

Exadra37 · March 31, 2021, 1:51pm

I read it several times in this forum, but cannot find a reference to it now.

Maybe @AstonJ can help here.

Yes, there is a lot of misuse of it.

John-Goff · March 31, 2021, 2:13pm

I think my point is that a convention is only a convention if it is followed. In this case it does not appear to be, so I would say there is no naming convention that is widely used by the community.

AstonJ · April 1, 2021, 10:44pm

I am not aware of such a convention (and the Hex or Elixir core teams may be better people to ask); however, I do have some ideas about naming (related to another topic). Hopefully I will get a chance to post about it at some point.

Exadra37 · April 1, 2021, 11:05pm

I have not found a reference for my assumption in the forum yet, but I know that I read something about the use of “ex_” for library names, and I think it was from Jose Valim himself. I am probably mistaken, both about it coming from him and about the claim itself, because I found this article:

https://medium.com/@toddresudek/hex-power-user-deb608e60935

When porting a library from another ecosystem, the tradition is to use the existing name, and prepend it with “ex_”, or less commonly, append “_ex”.

Sorry for my confusion in saying: