I am facing a problem in iterating over a list of string and finding the one matches the most with the input string. As I see, any assignment to any variable inside the Enum.each does not stay once we are out of Enum.each.
Suppose I have a string, name = “A123”.
List of String for searching the closest, list = [“AAAA”,“A129”,“A451”,“A134”,“B321”,“A522”]
For our string “A123”, this should return me “A129” (Matches 3 characters from beginning)
If we have a string, name = “A732”, we should get the string “AAAA” (Matches 1 characters from beginning).
For a string, name = “A472”, we should get the string “A451” (Matches maximum 2 characters from beginning).
How can I implement this in Elixir?
This might give you some ideas:
defmodule Match do
@list ["AAAA","A129","A451","A134","B321","A522"]
def nearest(key, list \\ @list) do
Enum.reduce list, {"", 0}, fn item, {nearest, distance} ->
if (maybe_nearer = String.jaro_distance(key, item)) > distance do
{item, maybe_nearer}
else
{nearest, distance}
end
end
end
def longest(key, list \\ @list) do
Enum.reduce list, {"", 0}, fn item, {nearest, length} ->
match =
key
|> String.myers_difference(item)
|> Keyword.get(:eq)
if (len = String.length(match || "")) > length do
{item, len}
else
{nearest, length}
end
end
end
def leading(key, list \\ @list) do
Enum.reduce list, {"", 0}, fn item, {nearest, distance} ->
match_length = String.length(item) - String.length(String.trim_leading(item, key))
if match_length > distance do
{item, match_length}
else
{nearest, distance}
end
end
end
end
I’m using String.jaro_distance/2
here to define “nearest” and String.myers_difference/2
to define “longest” but of course you can substitute whatever “nearest” or “longest” means to you. [Update] I added Match.leading/2
to match the longest leading prefix but its not at all efficient.
Examples
iex> Match.longest "A1"
{"A129", 2}
iex> Match.nearest "A1"
{"A129", 0.8333333333333334}
iex> Match.leading "A1"
{"A129", 2}
iex> Match.leading "Z"
{"", 0}
Match.longest("A234") ** (FunctionClauseError) no function clause matching in String.Unicode.length/1 The following arguments were given to String.Unicode.length/1: # 1 nil Attempted function clauses (showing 1 out of 1): def length(string) when is_binary(string) (elixir) lib/elixir/unicode/unicode.ex:259: String.Unicode.length/1 match.ex:38: anonymous fn/3 in Match.longest/2 (elixir) lib/enum.ex:1948: Enum."-reduce/3-lists^foldl/2-0-"/3
And your question is?
I was able to get the jaro_distance and finding the closest match. However, in that case “2111” was getting matched with “1111” instead of “2234”. The second process might help me, however, I got a run time error which I posted above. I am a novice in Elixir. Sorry
Thanks, thats much more helpful. This is a really good community, but just posting exceptions isn’t likely to encourage much engagement or support.
The exception is raised because when there is no substring matching match
will be nil. I have updated the code above to handle this outcome.
Its great to ask more questions, but please let us know what you’ve tried as well.
Also I doubt String.myers_difference/2
will match the max leading substring either. I think thats something you’ll have to implement. I’m just trying to give you some approaches to take using the standard library.
Thanks @kip
I was implementing it using a cond with String.at(0), String.at(1) and String.at(2) depending on that I would update the state of a genserver having the count of matched characters. When a new string has more number of matches, will update the state to that.
Was thinking if there was a easier way to implement this.
I’ve added a Match.leading/2
that I think does what you want - match the longest leading substring. But its not very efficient. Probably more efficient that using String.at/2
though. Its an interesting and fun problem.
Many thanks.
The Match.leading/2 do help while matching a lesser length string than what we have in the list. Will try to make this work for same length as well.
@kip thank you for the help. I just sliced the input string.
key = String.slice(key, 0…-2)
then I ran the Match.leading, can get the closes match from beginning.