Why was the levenshtein distance removed from Elixir?

Back in 2015 the String.levenshtein_distance/2 function was removed, and replaced with the String.jaro_distance/2 we have now.

This stackoverflow answer explains the difference between the two algorithims.

My question is, why was levenshtein distance removed, when it has certain use cases that Jaro does not satisfy. To go deeper, why are any of these string comparisons still in STDlib? I do find them immensely useful, but it does seem odd to have them as an incomplete set.

Interesting enough, both the Jaro implementation we have in STDlib right now and the older levenshtein implementation both have “alternates” that aim to improve their ability to differentiate string distance. Jaro has the winkler modification, which gives favorability to strings that share longer prefixes, and levenshtein has the damerau modification, which allows it to recognize transpositions, and the Boehmer & Rees modification, which allows for arbitrary length transpositions.

The last file in the linked PR is the key: mix uses String.jaro_distance to suggest alternatives for typo’d commands.

The linked StackOverflow answer mentions that Jaro is preferable for short strings, which the average mix task name qualifies as.

Finally, it’s in stdlib for the same reason that tsort is in the Ruby stdlib; tooling that either runs the package manager or IS the package manager needs the functionality to exist before packages can be relied on. For instance, running mix desp.get (typo) on a clean Elixir install needs to be able to reply with:

** (Mix) The task "desp.get" could not be found. Did you mean "deps.get"?
9 Likes