I’m writing a chat bot which needs to reply to user input.
My current code is something like this:
def reply(msg) do
  cond do
    String.match?(msg, ~r/hi/) -> "Hello user!"
    String.match?(msg, ~r/bye/) -> "Bye user!"
    String.match?(msg, ~r/name/) -> "My name is Chatbot"
    true -> "Sorry, I didn't understand you"
  end
end
You can see that this scales poorly.
I’d like to create multiple reply(msg) functions with a kind of guard that matches the regex.
def reply(msg) when String.match?(msg, ~r/hi/) do
  "Hello user!"
end

def reply(msg) when String.match?(msg, ~r/bye/) do
  "Bye user!"
end
...
I know this can’t be done, since only a limited set of functions is allowed in guards. In Python I can do this by decorating the function. I assume something similar could be done in Elixir with macros, but I don’t know which is the most Elixir-y way to solve this.
Right now it works with the giant cond, but as the bot’s actions get more complex I could end up with a function hundreds of lines long; that’s why I’d like to split it.
I love this solution! It’s simple and the overhead is minimal, just another function.
I only see one drawback: it requires storing all message types twice, both in @message_types and in each do_reply() clause. It’s not a big deal, and I guess maybe the language will grow to support regexes in guards.
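(For reference, the suggested solution isn’t quoted here; it presumably looked roughly like the sketch below, with a @message_types list of {regex, tag} pairs and one do_reply clause per tag. Module and tag names are illustrative; the duplication is the same tags appearing both in the list and in the do_reply heads.)

defmodule Bot do
  @message_types [
    {~r/hi/, :hi},
    {~r/bye/, :bye},
    {~r/name/, :name}
  ]

  def reply(msg) do
    # Tag the message with the first matching regex, defaulting to :unknown.
    {_, tag} =
      Enum.find(@message_types, {nil, :unknown}, fn {reg, _} ->
        String.match?(msg, reg)
      end)

    do_reply(tag)
  end

  defp do_reply(:hi), do: "Hello user!"
  defp do_reply(:bye), do: "Bye user!"
  defp do_reply(:name), do: "My name is Chatbot"
  defp do_reply(:unknown), do: "Sorry, I didn't understand you"
end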
One thing you could do to eliminate the duplication is to store the function for each action in a map with the tag as key.
Something like
%{:hi => fn msg -> "Hello user" end}
You’d search over the keys of the map and then apply the function corresponding to the key. You could just use the strings as the map key rather than special atoms. Kind of depends how complicated you want the parsing to be.
If you wanted to get super fancy, you could use the regexp as the map key.
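For example, something roughly like this, using the regexps themselves as keys and building the map at runtime (the module name is illustrative; as the next reply points out, you can’t simply reference the functions in a module attribute at compile time):

defmodule MapBot do
  # Build the regex => function map at runtime; anonymous functions
  # cannot be stored in a module attribute.
  defp replies do
    %{
      ~r/hi/ => fn msg -> "Hello! you said " <> msg end,
      ~r/bye/ => fn msg -> "Bye! you said " <> msg end
    }
  end

  def reply(msg) do
    # Find the first entry whose regex matches and apply its function.
    case Enum.find(replies(), fn {reg, _fun} -> String.match?(msg, reg) end) do
      {_reg, fun} -> fun.(msg)
      nil -> "Sorry, I didn't understand you"
    end
  end
end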
Compilation happens per file. You cannot call functions defined inside the module itself at compile time.
And yes, by specifying them like that, you attempt to call them inside the map definition (i.e. Elixir thinks you want to store the result of fn_hi in the map).
What you can do instead is refer to the functions as atoms, and then use Kernel.apply/3 to call them when the regexp matches.
Thanks! Now it looks great, succinct and very maintainable, as there is no need to keep the atoms synced between message_types and the functions.
Here’s the final code with the correct syntax, using Kernel.apply/3 as suggested by @Qqwy
defmodule Test do
  @message_types [
    {~r/hi/, :fn_hi},
    {~r/bye/, :fn_bye}
  ]

  def reply(msg) do
    {_, func} =
      Enum.find(@message_types, {nil, :fn_unknown}, fn {reg, _} ->
        String.match?(msg, reg)
      end)

    Kernel.apply(Test, func, [msg])
  end

  def fn_hi(msg), do: "Hello! you said " <> msg
  def fn_bye(msg), do: "Bye! you said " <> msg
  def fn_unknown(msg), do: "I didn't understand you. You said: " <> msg
end
To test it:
$ iex -r test.ex
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Interactive Elixir (1.2.5) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Test.reply("hi")
"Hello! you said hi"
iex(2)> Test.reply("ok bye!")
"Bye! you said ok bye!"
iex(3)> Test.reply("other")
"I didn't understand you. You said: other"
IMO this solution is harder to reason about. My thinking was that, over time, your code will get more complex: @message_types and Enum.find may become a function or a module that uses a more complicated way of ‘tagging’ messages, like advanced regexes. If you pass a tagged message around, that evolution is easier, because you would only need to change the reply function. As it is now, you would need to change reply and all your function names, because you don’t want to couple two separate modules.
Example:
defmodule UberParser do
  def parse(msg) do
    # whatever the implementation is, it should return a {type, msg} tuple
    {type, msg}
  end
end

defmodule Example do
  def reply(msg) do
    msg |> UberParser.parse |> do_reply
  end

  defp do_reply({:hi, _msg}), do: "Hello user!"
  defp do_reply({:bye, _msg}), do: "Bye user!"
  defp do_reply({:name, _msg}), do: "My name is Chatbot"
  defp do_reply({_, _msg}), do: "Sorry, I didn't understand you"
end
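One possible (purely illustrative) way to fill in that parse/1 placeholder is to move the original cond behind the tagging boundary, so only this module changes when the parsing gets smarter:

defmodule UberParser do
  # Illustrative only: tag the message by matching the same regexes
  # as the original cond, returning a {type, msg} tuple.
  def parse(msg) do
    cond do
      String.match?(msg, ~r/hi/) -> {:hi, msg}
      String.match?(msg, ~r/bye/) -> {:bye, msg}
      String.match?(msg, ~r/name/) -> {:name, msg}
      true -> {:unknown, msg}
    end
  end
end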
This is the only way if you want it really fast, especially with a large set of regexes. The other methods all entail trying each regex sequentially. An existing tool for this is leex, which compiles a set of regexes into an efficient DFA. Two problems though: the regexes are by necessity more limited, and the definition file is in Erlang.
You could easily write a tool which generates the leex definition file from a set of regexes.
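For illustration, a rough sketch of what such a generating tool might look like in Elixir (the module, function and file names are made up, and the patterns have to be written in leex’s restricted regexp syntax rather than PCRE):

defmodule ScannerGen do
  # Illustrative sketch: write a leex definition file from {pattern, tag}
  # pairs, then compile and load the generated scanner module.
  @rules [
    {"hi", :hi},
    {"bye", :bye}
  ]

  def build(name \\ "bot_scanner") do
    rules =
      Enum.map_join(@rules, "\n", fn {pattern, tag} ->
        "#{pattern} : {token, {#{tag}, TokenLine}}."
      end)

    xrl = """
    Definitions.

    WS = [\\s\\t\\n]

    Rules.

    #{rules}
    {WS}+ : skip_token.
    . : skip_token.

    Erlang code.
    """

    File.write!(name <> ".xrl", xrl)

    # Generate the .erl scanner, compile it to a binary and load it.
    {:ok, _erl_file} = :leex.file(String.to_charlist(name <> ".xrl"))
    {:ok, module, binary} = :compile.file(String.to_charlist(name), [:binary])
    {:module, ^module} = :code.load_binary(module, String.to_charlist(name <> ".erl"), binary)
    module
  end
end

# The generated scanner makes a single pass over the input and returns tags:
# iex> scanner = ScannerGen.build()
# iex> scanner.string('ok bye!')
# {:ok, [{:bye, 1}], 1}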
That’s a very interesting idea. There are lots of “benchmarks” that pretty much consist of applying regexps to some set of strings. Do you have any idea where the crossover point is for a leex DFA versus mapping across a list of regexps?
I apologize in advance if I’m putting words in someone else’s mouth, but if I understand rvirding’s post correctly, he’s talking about making the next step in building a parser.
A standard way you end up writing a compiler is that you build a “toy” language for your command processor in your program. You start out by using regexps to map commands to functions, and eventually that gets really complex and slow, so you turn to tools like leex to build a parser that turns input strings into tokens.
This approach is kind of an intermediate step along the way: you’re not defining a complete language, but you’re using the parsing tool to get a faster “tokenizing” of your input strings. Since leex uses a subset of regexp, it can do this in a fairly straightforward way.
Yes, compiling the regexps with a tool like leex[*] will generally produce a much faster program for doing this type of thing because:
It will only make one pass over the string you are testing, irrespective of how many regexps you are testing it against. The other alternatives here will test your string against each possible regexp one at a time, irrespective of whether they are hard-wired in a cond or defined in some nicer way.
That the leex version can do this partly depends on the fact that the regexps it allows are more restricted; amongst other things, they never need backtracking, which Perl and PCRE regexps may need.
For this type of usage we don’t have to actually generate a “token” as such, just some tag indicating what we found.
It is honestly quite easy to write a tool which generates an input file for leex from a set of regexps and return values. It is compile time, but it is not that difficult to arrange things so you could change the regexp set dynamically and recompile your “scanner”, though you wouldn’t want to do it too often.[**]
Robert
[*] Leex is based on the same principles as other scanner generating tools like lex and flex. Where did we get the name from? It leaks tokens.
[**] There are programs which handle configuration data in this way, they dynamically compile a new config module containing the config data instead of keeping it in a database. Quite cool actually.