I’m writing a chat bot which needs to reply to user input.
My current code is something like this:
def reply(msg) do
  cond do
    String.match?(msg, ~r/hi/) -> "Hello user!"
    String.match?(msg, ~r/bye/) -> "Bye user!"
    String.match?(msg, ~r/name/) -> "My name is Chatbot"
    true -> "Sorry, I didn't understand you"
  end
end
You can see that this scales poorly.
I’d like to create multiple reply(msg) functions with a kind of guard that matches the regex.
def reply(msg) when String.match?(msg, ~r/hi/) do
  "Hello user!"
end

def reply(msg) when String.match?(msg, ~r/bye/) do
  "Bye user!"
end
...
I know this can’t be done, since only a limited set of functions is allowed in guards. In Python I can do this by decorating the function. I assume something similar could be done in Elixir with macros, but I don’t know which is the most Elixir-y way to solve this.
Right now it works with the giant cond, but as the bot’s actions get more complex I could end up with a function hundreds of lines long; that’s why I’d like to split it.
I love this solution! It’s simple and the overhead is minimal, just another function.
I only see one drawback: it requires storing all message types twice, both in @message_types and in each do_reply() clause. It’s not a big deal, and I guess maybe the language will grow to support regexes in guards.
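(For reference, the suggested solution isn’t quoted here; it presumably looked roughly like the sketch below, with a @message_types list of {regex, tag} pairs and one do_reply clause per tag. Module and tag names are illustrative; the duplication is the same tags appearing both in the list and in the do_reply heads.)

defmodule Bot do
  @message_types [
    {~r/hi/, :hi},
    {~r/bye/, :bye},
    {~r/name/, :name}
  ]

  def reply(msg) do
    # Tag the message with the first matching regex, defaulting to :unknown.
    {_, tag} =
      Enum.find(@message_types, {nil, :unknown}, fn {reg, _} ->
        String.match?(msg, reg)
      end)

    do_reply(tag)
  end

  defp do_reply(:hi), do: "Hello user!"
  defp do_reply(:bye), do: "Bye user!"
  defp do_reply(:name), do: "My name is Chatbot"
  defp do_reply(:unknown), do: "Sorry, I didn't understand you"
end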
One thing you could do to eliminate the duplication is to store the function for each action in a map with the tag as key.
Something like
%{:hi => fn msg -> "Hello user" end}
You’d search over the keys of the map and then apply the function corresponding to the key. You could just use the strings as the map key rather than special atoms. Kind of depends how complicated you want the parsing to be.
If you wanted to get super fancy, you could use the regexp as the map key.
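For example, something roughly like this, using the regexps themselves as keys and building the map at runtime (the module name is illustrative; as the next reply points out, you can’t simply reference the functions in a module attribute at compile time):

defmodule MapBot do
  # Build the regex => function map at runtime; anonymous functions
  # cannot be stored in a module attribute.
  defp replies do
    %{
      ~r/hi/ => fn msg -> "Hello! you said " <> msg end,
      ~r/bye/ => fn msg -> "Bye! you said " <> msg end
    }
  end

  def reply(msg) do
    # Find the first entry whose regex matches and apply its function.
    case Enum.find(replies(), fn {reg, _fun} -> String.match?(msg, reg) end) do
      {_reg, fun} -> fun.(msg)
      nil -> "Sorry, I didn't understand you"
    end
  end
end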
Compilation happens per file. You cannot call functions defined inside the module itself at compile time.
And yes, by specifying them like that, you attempt to call them inside the map definition (i.e. Elixir thinks you want to store the result of fn_hi in the map).
What you can do instead is refer to the functions as atoms, and then use Kernel.apply/3 to call them when the regexp matches.
Thanks! Now it looks great, succinct and very maintainable, as there is no need to keep the atoms synced between message_types and the functions.
Here’s the final code with the correct syntax, using Kernel.apply/3 as suggested by @Qqwy
defmodule Test do
  @message_types [
    {~r/hi/, :fn_hi},
    {~r/bye/, :fn_bye}
  ]

  def reply(msg) do
    {_, func} =
      Enum.find(@message_types, {nil, :fn_unknown}, fn {reg, _} ->
        String.match?(msg, reg)
      end)

    Kernel.apply(Test, func, [msg])
  end

  def fn_hi(msg), do: "Hello! you said " <> msg
  def fn_bye(msg), do: "Bye! you said " <> msg
  def fn_unknown(msg), do: "I didn't understand you. You said: " <> msg
end
To test it:
$ iex -r test.ex
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Interactive Elixir (1.2.5) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Test.reply("hi")
"Hello! you said hi"
iex(2)> Test.reply("ok bye!")
"Bye! you said ok bye!"
iex(3)> Test.reply("other")
"I didn't understand you. You said: other"
IMO this solution is harder to reason about. My thinking was that, over time, your code will get more complex: @message_types and Enum.find may become a function or a module that uses a more complicated way of ‘tagging’ messages, like advanced regexes. If you pass a tagged message around, that evolution is easier, because you would only need to change the reply function. As it is now, you would need to change reply and all your function names, because you don’t want to couple two separate modules.
Example:
defmodule UberParser do
  def parse(msg) do
    # whatever the implementation is, it should return a {type, msg} tuple
    {type, msg}
  end
end

defmodule Example do
  def reply(msg) do
    msg |> UberParser.parse |> do_reply
  end

  defp do_reply({:hi, _msg}), do: "Hello user!"
  defp do_reply({:bye, _msg}), do: "Bye user!"
  defp do_reply({:name, _msg}), do: "My name is Chatbot"
  defp do_reply({_, _msg}), do: "Sorry, I didn't understand you"
end
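One possible (purely illustrative) way to fill in that parse/1 placeholder is to move the original cond behind the tagging boundary, so only this module changes when the parsing gets smarter:

defmodule UberParser do
  # Illustrative only: tag the message by matching the same regexes
  # as the original cond, returning a {type, msg} tuple.
  def parse(msg) do
    cond do
      String.match?(msg, ~r/hi/) -> {:hi, msg}
      String.match?(msg, ~r/bye/) -> {:bye, msg}
      String.match?(msg, ~r/name/) -> {:name, msg}
      true -> {:unknown, msg}
    end
  end
end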
This is the only way if you want it really fast, especially with a large set of regexes. The other methods all entail trying each regex sequentially. An existing tool for this is leex, which compiles a set of regexes into an efficient DFA. Two problems though: the regexes are by necessity more limited, and the definition file is in Erlang.
You could easily write a tool which generates the leex definition file from a set of regexes.
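For illustration, a rough sketch of what such a generating tool might look like in Elixir (the module, function and file names are made up, and the patterns have to be written in leex’s restricted regexp syntax rather than PCRE):

defmodule ScannerGen do
  # Illustrative sketch: write a leex definition file from {pattern, tag}
  # pairs, then compile and load the generated scanner module.
  @rules [
    {"hi", :hi},
    {"bye", :bye}
  ]

  def build(name \\ "bot_scanner") do
    rules =
      Enum.map_join(@rules, "\n", fn {pattern, tag} ->
        "#{pattern} : {token, {#{tag}, TokenLine}}."
      end)

    xrl = """
    Definitions.

    WS = [\\s\\t\\n]

    Rules.

    #{rules}
    {WS}+ : skip_token.
    . : skip_token.

    Erlang code.
    """

    File.write!(name <> ".xrl", xrl)

    # Generate the .erl scanner, compile it to a binary and load it.
    {:ok, _erl_file} = :leex.file(String.to_charlist(name <> ".xrl"))
    {:ok, module, binary} = :compile.file(String.to_charlist(name), [:binary])
    {:module, ^module} = :code.load_binary(module, String.to_charlist(name <> ".erl"), binary)
    module
  end
end

# The generated scanner makes a single pass over the input and returns tags:
# iex> scanner = ScannerGen.build()
# iex> scanner.string('ok bye!')
# {:ok, [{:bye, 1}], 1}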
That’s a very interesting idea. There are lots of “benchmarks” that pretty much consist of applying regexps to some set of strings. Do you have any idea where the crossover point is for a leex DFA versus mapping across a list of regexps?
I apologize in advance if I’m putting words in someone else’s mouth, but if I understand rvirding’s post correctly, he’s talking about making the next step in building a parser.
A standard way you end up writing a compiler is that you build a “toy” language for your command processor in your program. You start out by using regexps to map commands to functions, and eventually that gets really complex and slow, so you turn to tools like leex to build a parser that turns input strings into tokens.
This approach is kind of an intermediate step along the way: you’re not defining a complete language, but you’re using the parsing tool to get a faster “tokenizing” of your input strings. Since leex uses a subset of regexp, it can do this in a fairly straightforward way.
Yes, compiling the regexps with a tool like leex[*] will generally produce a much faster program for doing this type of thing because:
It will only make one pass over the string you are testing, irrespective of how many regexps you are testing it against. The other alternatives here will test your string against each possible regexp one at a time, irrespective of whether they are hard-wired in a cond or defined in some nicer way.
That the leex version can do this partly depends on the fact that the regexps it allows are more restricted; amongst other things, they never need backtracking, which Perl and PCRE regexps may need.
For this type of usage we don’t have to actually generate a “token” as such, just some tag indicating what we found.
It is honestly quite easy to write a tool which generates an input file for leex from a set of regexps and return values. It is compile time, but it is not that difficult to arrange things so you could change the regexp set dynamically and recompile your “scanner”, though you wouldn’t want to do it too often.[**]
Robert
[*] Leex is based on the same principles as other scanner generating tools like lex and flex. Where did we get the name from? It leaks tokens.
[**] There are programs which handle configuration data in this way, they dynamically compile a new config module containing the config data instead of keeping it in a database. Quite cool actually.