Implementing Autocomplete in Elixir using ETS?

Harrisonl · September 15, 2017, 1:18am

Basically I’m trying to implement a very minimal autocomplete using channels and ets. Not to keen on pulling in Elasticsearch as the amount of records is about 1000. Not exactly sure how to go about this.

Some ideas I had were:

Insert them using sorted charlists e.g. (“Elixir” would be inserted as :ets.insert(:cache, {"key", 'eiilrx', "Elixir}) and then doing an ets match spec -> :ets.fun2ms(fn({_, ['eii' | _], v}) -> v end. The only problem here is that it doesn’t work because ['eii' | _] tries to pattern match with [[101,105,105] | _] which obviously won’t match [101,105,105] because of the nested list. So my question with this option is, is there a way to pattern match a charlist like I am trying above without it being nested?
Create keys using word partials. E.g. if I had elixir and phoenix stored:

iex> :ets.insert(:test, {'e', ["elixir", "phoenix"]})
iex> :ets.insert(:test, {'ei', ["elixir"]})
iex> :ets.insert(:test, {'eil', ["elixir"]})
iex> :ets.insert(:test, {'eh', ["phoenix"]})
iex> :ets.insert(:test, {'ehi', ["phoenix"]})
etc.

So if someone typed in “eli” it would get sorted to 'eil' and then would perform a lookup on :ets.lookup(:test, 'eil') which would return ["Elixir"]

Obviously this approach is going to use much more memory, but considering the size of the dataset should be pretty negligible.

Which approach would be best? (Well currently only 2 works, but if I could somehow get 1 to work which would be best?)

WolfDan · September 15, 2017, 2:27am

Oh I came up with the same question a few months ago, I don’t think ets is the best solution to your problem, I think is much better use solutions like Tries, this are blazing light fast and easy to implement

A Trie implementation in Elixir

An example from the repo Readme:

Retrieval.prefix(trie, "a")
["above", "ape", "apple", "apply"]

Retrieval.prefix(trie, "app")
["apple", "apply"]

Retrieval.prefix(trie, "be")
["bed", "betray", "between"]

Retrieval.prefix(trie, "th")
[]

The drawback with this trie is you can’t make any typo is your searching (And based on you post you need correct users typos), but you can use this trie to implement another cool things like Levenshtein automata, here a few cool articles:

Of Levenshtein Automata implementations
Fast and Easy Levenshtein distance using a Trie This implementation is a learning path to make dictionaries for autocorrectors
Levenshtein automata can be simple and fast

Hope can help ^^

Harrisonl · September 15, 2017, 3:50am

Hey thanks that actually looks really interesting! My only concern with this is, is that it would require loading the entire the trie into memory inorder to traverse it - which might not as efficient? Unless i’m missing something here

NobbZ · September 15, 2017, 5:31am

You’d have to keep the ets in memory as well. Of course you can serialize each subtrie to disk. Let’s say one per starting letter. Or maybe you can think of some nested stuff. But you should do that only if you experience problems with keeping it in memory. Or you can tell in advance that it won’t fit.

Also matching on your prefix should be possible like this prefix = 'eii'; match([prefix|_]).

Harrisonl · September 15, 2017, 6:34am

oops yeah I meant ets is in memory as well, but my thought was (I could be wrong here) was that with the trie you would have to bring the WHOLE object into memory before performing some operation on it, whereas using ets, you would only be copying the one object from memory.

Also are you able to provide an example for that match because I can’t seem to get it to work:

iex(1)> :ets.new(:test, [:named_table])
:test
iex(2)> :ets.insert(:test, {"key", 'alv', "val"})
true
iex(3)> prefix = 'al'
'al'
iex(4)> ms = :ets.fun2ms(fn({_, [^prefix | _], v}) -> v end)
[{{:_, [:"$1" | :_], :"$2"}, [{:"=:=", {:const, 'al'}, :"$1"}], [:"$2"]}]
iex(5)> :ets.match(:test, ms)
[]

NobbZ · September 15, 2017, 6:39am

Fun2ms does rely on a parse transform, therefore I’m not sure if can be used from elixir at all.

Harrisonl · September 15, 2017, 7:16am

Oh right, is there way to do it without :ets.fun2ms? OR are you saying this can’t be done at al. basically I just need a way to turn this ['abc' | _] which is really [[97,98,99] | _] into [97,98,99 | _ ]

sasajuric · September 15, 2017, 8:32am

With such a small set, perhaps it would make more sense to ship all the entries to the browser and do autocomplete there, so you could avoid constant round trips?

It can. See here for an example.

NobbZ · September 15, 2017, 10:31am

Which browser? I can’t find anything in the context of the question that would imply the existence of one.

Just to play around with the possibilty, can you also provide an example where the actual value to match on is dynamic as in the question?

Disclaimer: I’m still asking and trying to solve this out of curiosity. There are much more efficient ways to solve this, as the already mentioned Trie (where we find all matches in a time that is proportianal to the given length of prefix). In an ETS for the same result we can only get all matches if we scan the complete table, everytime, therefore response time is proportianal to the size of the table.

sasajuric · September 15, 2017, 11:39am

The OP mentioned channels, and also mentioned that a user is typing. Based on that I presumed that UI likely done in the browser.

Not exactly sure what you mean. Something like this?

defmodule Spec do
  @compile {:parse_transform, :ms_transform}

  def keys_for_value(desired_value), do:
    :ets.fun2ms(fn({key, value} = el) when value == desired_value -> key end)
end

table = :ets.new(:my_tab, [])

Enum.each(1..10, &:ets.insert(table, {&1, Integer.mod(&1, 3)}))

:ets.select(table, Spec.keys_for_value(1))
# [1, 7, 10, 4]

peerreynders · September 15, 2017, 11:48am

The minimalist solution is HTML5 datalist and call it a day - if you aren’t JavaScript averse take your pick - for example Pixabay autoComplete can search local data provided in an array.

NobbZ · September 15, 2017, 11:54am

Oh, I missed that one, then you are right, doing it in the client might be best, but sometimes one wants to keep the list secretely.

I’ve not asked for implementations, I’ve asked for a clue that there is a browser involved at all.

Harrisonl · September 15, 2017, 1:09pm

Yeah I was thinking about that, but the size of the actual data may be quite a bit larger then that (not 100% sure at the moment) but what I’ve come up with currently is this:

I’ve modified the Trie example from @wolfdan above and added in the ability to store an ID at the end and it seems to be working fairly well (~20(2000 items) -> 10,000(250,000 items) microseconds) depending on the number objects in there and the number of results for the search.

So basically the the trie has a structure of {term, id}.

For a first cut, it seems like it works - once I have a real idea for numbers etc. I’ll probably make a decision on which direction I need to go, so if it doesn’t look like there is going to be a heap of objects then i will most likely just send it to the browser.

NobbZ · September 15, 2017, 2:25pm

Wait? You want to tell me that finding completions to a given prefix rises with the number of entries? Either you have implemented it wrong, or you are not using the prefixes as keys but something else and just do a linear scan.

If you use the prefix as key, the number of elements should not matter but only the size of the key.

If you just do a linear scan, you can simply use a list…

OvermindDL1 · September 15, 2017, 3:59pm

Hah, I have such an efficient prefix lookup in my ExSpirit project. ^.^

In that you could do the example as (from the docs):

  iex> import ExSpirit.Tests.Parser
  iex> alias ExSpirit.TreeMap, as: TreeMap
  iex> symbol_TreeMap = TreeMap.new() |> TreeMap.add_text("int", &uint(&1)) |> TreeMap.add_text("char", &char(&1))
  iex> context = parse("int42", symbols(symbol_TreeMap))
  iex> {context.error, context.result, context.rest}
  {nil, 42, ""}
  iex> context = parse("charT", symbols(symbol_TreeMap))
  iex> {context.error, context.result, context.rest}
  {nil, ?T, ""}
  iex> context = parse("in", symbols(symbol_TreeMap))
  iex> {String.starts_with?(context.error.message, "Tried matching out symbols and got to `i` but failed"), context.result, context.rest}
  {true, nil, "in"}
  iex> context = parse("", symbols(symbol_TreeMap))
  iex> {String.starts_with?(context.error.message, "Tried matching out symbols and got the end of the line but failed to find a value"), context.result, context.rest}
  {true, nil, ""}

  iex> import ExSpirit.Tests.Parser
  iex> alias ExSpirit.TreeMap, as: TreeMap
  iex> symbol_TreeMap = TreeMap.new() |> TreeMap.add_text("let", 1) |> TreeMap.add_text("letmap", 2) |> TreeMap.add_text("", 0)
  iex> context = parse("", symbols(symbol_TreeMap))
  iex> {context.error, context.result, context.rest}
  {nil, 0, ""}
  iex> context = parse("A", symbols(symbol_TreeMap))
  iex> {context.error, context.result, context.rest}
  {nil, 0, "A"}
  iex> context = parse("let", symbols(symbol_TreeMap))
  iex> {context.error, context.result, context.rest}
  {nil, 1, ""}
  iex> context = parse("letmap", symbols(symbol_TreeMap))
  iex> {context.error, context.result, context.rest}
  {nil, 2, ""}
  iex> context = parse("letma", symbols(symbol_TreeMap))
  iex> {context.error, context.result, context.rest}
  {nil, 1, "ma"}

The symbol treemap is even safe by design to encode into the beam file itself by unquoting it into position to make it even faster if it is compile-time static!

You could of course just use my TreeMap straight too, it is not tied to the parser parts at all, but they do make it easy to parse out things safely and easily.

You can always encode them into a database as well.

[unquote_splicing('abc') | _] if it is known at compile-time at least, otherwise there are other workarounds that are situation dependent if an entire example is given. And manually building matchspecs is pretty easy too. :ets.fun2ms is entirely unneeded.[quote=“NobbZ, post:14, topic:8651”]
Wait? You want to tell me that finding completions to a given prefix rises with the number of entries? Either you have implemented it wrong, or you are not using the prefixes as keys but something else and just do a linear scan.
[/quote]

Yeah precisely. If you want an actual trie then try out my ExSpirit’s TreeMap as shown above, it is quite well tested and used.

WolfDan · September 15, 2017, 10:08pm

I’ll definitely take a look at your library when I have time!