Finding maximum match (contiguously from begining) string from a list of strings for a string

I am facing a problem in iterating over a list of string and finding the one matches the most with the input string. As I see, any assignment to any variable inside the Enum.each does not stay once we are out of Enum.each.
Suppose I have a string, name = “A123”.
List of String for searching the closest, list = [“AAAA”,“A129”,“A451”,“A134”,“B321”,“A522”]
For our string “A123”, this should return me “A129” (Matches 3 characters from beginning)
If we have a string, name = “A732”, we should get the string “AAAA” (Matches 1 characters from beginning).
For a string, name = “A472”, we should get the string “A451” (Matches maximum 2 characters from beginning).
How can I implement this in Elixir?

This might give you some ideas:

defmodule Match do
  @list ["AAAA","A129","A451","A134","B321","A522"]

  def nearest(key, list \\ @list) do
    Enum.reduce list, {"", 0}, fn item, {nearest, distance} ->
      if (maybe_nearer = String.jaro_distance(key, item)) > distance do
        {item, maybe_nearer}
      else
        {nearest, distance}
      end
    end
  end

  def longest(key, list \\ @list) do
    Enum.reduce list, {"", 0}, fn item, {nearest, length} ->
      match =
        key
        |> String.myers_difference(item)
        |> Keyword.get(:eq)

      if (len = String.length(match || "")) > length do
        {item, len}
      else
        {nearest, length}
      end
    end
  end

  def leading(key, list \\ @list) do
    Enum.reduce list, {"", 0}, fn item, {nearest, distance} ->
      match_length = String.length(item) - String.length(String.trim_leading(item, key))
      if match_length > distance do
        {item, match_length}
      else
        {nearest, distance}
      end
    end
  end
end

I’m using String.jaro_distance/2 here to define “nearest” and String.myers_difference/2 to define “longest” but of course you can substitute whatever “nearest” or “longest” means to you. [Update] I added Match.leading/2 to match the longest leading prefix but its not at all efficient.

Examples

iex> Match.longest "A1"                  
{"A129", 2}
iex> Match.nearest "A1"
{"A129", 0.8333333333333334}
iex> Match.leading "A1"
{"A129", 2}
iex> Match.leading "Z" 
{"", 0}
3 Likes
 Match.longest("A234")
** (FunctionClauseError) no function clause matching in String.Unicode.length/1    
    
    The following arguments were given to String.Unicode.length/1:
    
        # 1
        nil
    
    Attempted function clauses (showing 1 out of 1):
    
        def length(string) when is_binary(string)
    
    (elixir) lib/elixir/unicode/unicode.ex:259: String.Unicode.length/1
    match.ex:38: anonymous fn/3 in Match.longest/2
    (elixir) lib/enum.ex:1948: Enum."-reduce/3-lists^foldl/2-0-"/3

And your question is?

1 Like

I was able to get the jaro_distance and finding the closest match. However, in that case “2111” was getting matched with “1111” instead of “2234”. The second process might help me, however, I got a run time error which I posted above. I am a novice in Elixir. Sorry :slight_smile:

Thanks, thats much more helpful. This is a really good community, but just posting exceptions isn’t likely to encourage much engagement or support.

The exception is raised because when there is no substring matching match will be nil. I have updated the code above to handle this outcome.

Its great to ask more questions, but please let us know what you’ve tried as well.

1 Like

Also I doubt String.myers_difference/2 will match the max leading substring either. I think thats something you’ll have to implement. I’m just trying to give you some approaches to take using the standard library.

1 Like

Thanks @kip
I was implementing it using a cond with String.at(0), String.at(1) and String.at(2) depending on that I would update the state of a genserver having the count of matched characters. When a new string has more number of matches, will update the state to that.

Was thinking if there was a easier way to implement this.

I’ve added a Match.leading/2 that I think does what you want - match the longest leading substring. But its not very efficient. Probably more efficient that using String.at/2 though. Its an interesting and fun problem.

Many thanks.
The Match.leading/2 do help while matching a lesser length string than what we have in the list. Will try to make this work for same length as well.

@kip thank you for the help. I just sliced the input string.

key = String.slice(key, 0…-2)

then I ran the Match.leading, can get the closes match from beginning.