Efficient way to perform vector dot product

Hi, I recently had the requirement to perform a dot product on two vectors in Elixir, and I was wondering what the most efficient way would be. Imagine the following scenario:

Say I have ~1M vectors stored in memory, each with the following format: [{"foo", 0.123}, {"bar", 0.777}, .........]. I now have a query vector with the same format and I need to compute the dot product of the query vector and each of the ~1M stored vectors. By dot product I mean: for every string key that appears in both vectors, multiply the two float values, and sum all those products up.
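
For example (made-up numbers):

a = [{"foo", 0.123}, {"bar", 0.777}]
b = [{"foo", 2.0}, {"baz", 1.0}]
# Only "foo" appears in both vectors, so the dot product is
# 0.123 * 2.0 = 0.246; "bar" and "baz" contribute nothing.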

Now my most naive/straightforward solution is something like this: instead of storing them as arrays of tuples, I convert them into Maps and store those in memory instead, so each vector becomes a hash table like %{"foo" => 0.123, "bar" => 0.777}, and then:

Enum.filter(stored_vectors, fn vector ->
  # Since we only need to process the "overlapping" keys
  query = Map.take(query_vector, Map.keys(vector))

  Enum.map(query, fn {k, v} ->
    v * vector[k]
  end)
  |> Enum.sum()
  |> Kernel.>(0.5)
end)

Now my issue is that Map.take is extremely slow when I am querying against 1M vectors. But if I remove that call and just do the multiplication anyway, like this:

Enum.filter(stored_vectors, fn vector ->
  Enum.map(query_vector, fn {k, v} ->
    v * (vector[k] || 0)
  end)
  |> Enum.sum()
  |> Kernel.>(0.5)
end)

It's even slower. I am running out of options and was wondering how I can optimise my code.

P.S.: I also tried one more thing, like this, but to no avail:

keys1 = query_vector |> Map.keys() |> MapSet.new()

Enum.filter(stored_vectors, fn vector ->
  overlapped_keys = MapSet.intersection(keys1, vector |> Map.keys() |> MapSet.new())

  Enum.map(overlapped_keys, fn k ->
    query_vector[k] * vector[k]
  end)
  |> Enum.sum()
  |> Kernel.>(0.5)
end)

Did you try ETS for storage? :ets.lookup(table, key) is documented to be constant-time regardless of table size.

Yes I did, but apparently I cannot do such an advanced lookup (dot product) in :ets. The idea is to return the list of vectors that have a dot product > 0.5 with the query. I mean, if ETS supported something like this, my life would be much easier:

fun = :ets.fun2ms(fn {id, vector} when dot_product(query_vector, vector) > 0.5 -> id end)

:ets.select(:stored_vectors, fun)

Where dot_product would be a routine that computes the dot product. (As far as I can tell, that's not possible: :ets.fun2ms only accepts guard expressions, so it can't call an arbitrary function like this.)

It sounded like Map.take was your problem and :ets.lookup would be a faster alternative. I wasn't suggesting that ETS should do the calculation, but rather the storage.


Hmm… I am not sure if I understand your suggestion. You mean to store all the Map keys in ETS?

Not just the keys, but the keys and their values. I’m not sure how dynamic your data is, but once data is in ets it should be retrievable quite quickly.

Well, the data can be quite different from vector to vector, and I am still not sure what you meant by using ETS. Something like this?

Enum.each(stored_vectors, fn vector ->
  :ets.insert(:stored_vectors, {"some_randomly_generated_id", vector})
end)

Then how do I retrieve it?

EDIT: Ah, I see there might be some confusion, my bad. Each vector represents an array of tuples, so in my example there are 1M arrays of tuples (and each array can contain up to hundreds of tuples). Unless I use a Map, in which case each array of tuples becomes a Map with, again, hundreds of KV pairs.

So for each map you currently have, you'd create an ETS table where you'd store a row for each key/value pair: :ets.insert(table_vector_1, {key, float}). I'm not sure how quick inserts are in ETS compared to maps; that's something you'd need to try out.

Once you have that, you can do your calculation just like you did with maps, but instead of using Map.take(query_vector, Map.keys(vector)) or vector[k], you'd use :ets.first/:ets.next and :ets.lookup.

Also as you’re doing sums here I’d not “filter” keys before iterating, but simply iterate over all keys and just return 0 if the key is not present in the compared vector.

Hmm, that would mean… 1M tables in ETS? Would that actually be a problem?

Actually, in my second solution I didn't do any key filtering; I just compute the dot product directly. Say
query = %{"foo" => 0.5, "bar" => 0.25} and one of the vectors in stored_vectors is %{"foo" => 0.125, "baz" => 0.8}. The iteration would be:

query["foo"] * (stored_vector["foo"] || 0) = 0.5 * 0.125
query["bar"] * (stored_vector["bar"] || 0) = 0.25 * 0

But it's much slower than filtering the keys beforehand and just doing

query["foo"] * stored_vector["foo"] = 0.5 * 0.125

If I understand @steven7 correctly, he has 1M inputs which are also huge. Converting each of those inputs to ETS might blow up table space.

Using tables just as an ephemeral, throwaway conversion might cost too much time converting from one representation to the other.

As I’ve already mentioned in the chat yesterday, I’m still under the assumption that this should be one of the fastest possibilities (in the chat it sounded as if maps as input were given and unchangeable):

m1 = %{…}
m2 = %{…}

Enum.reduce(m1, 0, fn {k, v}, sum -> sum + v * (m2[k] || 0) end)

This iterates over m1, so its cost scales with the size of m1; if m2 is the smaller map, just swap them around.
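
For instance, a small helper could do that swap automatically (just a sketch of the same idea; map_size/1 is allowed in guards):

defmodule Dot do
  # Always iterate over the smaller of the two maps.
  def dot(m1, m2) when map_size(m1) > map_size(m2), do: dot(m2, m1)
  def dot(m1, m2), do: Enum.reduce(m1, 0, fn {k, v}, sum -> sum + v * (m2[k] || 0) end)
end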

And I’ll stick to my opinion until I’m proven otherwise by benchmarks with data sets of realistic sizes.

And if the input type is not actually fixed to maps but can be changed, and insertion/building time does not matter that much (or the data is already sorted anyway), then a pre-sorted, proplist-like list might actually be the way to go: use the usual algorithm for intersecting sorted lists, but calculate the “product” in the accumulator instead of building the intersection.
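
A sketch of that merge, assuming both vectors are lists of {key, value} tuples pre-sorted by key (module and function names are made up):

defmodule SortedDot do
  # Walk both sorted lists once (O(n + m)): multiply values whenever the
  # heads share a key, otherwise advance the list with the smaller key.
  # Tail-recursive; no intermediate intersection is built.
  def dot(l1, l2), do: dot(l1, l2, 0)

  defp dot([], _, acc), do: acc
  defp dot(_, [], acc), do: acc
  defp dot([{k, v1} | t1], [{k, v2} | t2], acc), do: dot(t1, t2, acc + v1 * v2)

  defp dot([{k1, _} | t1] = l1, [{k2, _} | t2] = l2, acc) do
    if k1 < k2, do: dot(t1, l2, acc), else: dot(l1, t2, acc)
  end
end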

Is there a limit to ETS? I mean, the data seems to fit into memory already, and ETS table references should also not break a sweat at that number. But I haven’t had to deal with such amounts of data myself.


Well, based on my test, where the query vector is a Map with ~175 KV pairs, running against 1M other similar Maps with differing numbers of KV pairs, the Enum.reduce solution takes about ~20s on my machine. If I filter the keys beforehand (through Map.take) I get about a ~5s improvement.

And I have also noticed the slowness is not coming from Enum.map |> Enum.sum: with or without Enum.sum the difference is quite negligible. Hence my focus now is on optimising the multiplication part.

Interesting thoughts on using a proplist… I probably need to read up on the topic first.

Earlier today I found a web page stating the limit was 1400 tables; now I can’t find that page again… It was the efficiency guide for ERTS 5.something.

In the current efficiency guide this limit is not listed anymore.

I know OTP switched from “something” to using references to identify ETS tables in one of the more recent versions. Maybe that change removed the limitation.

Computing the dot product for that many rows without native support will always be pretty slow. Have you looked at Matrex for computation?


Yes I did, but the “slow” part isn’t the multiplication itself, it’s more the “pre-processing” part, because in my case my input is an array of vectors:

query_matrex = Matrex.new([Map.values(query_vector)])

Enum.filter(stored_vectors, fn vector ->
  # This is slow since I have another O(n) step here. I have to build the
  # "target" vector at runtime because I can't prebuild and store the target
  # matrex: it depends on the length and the keys of the query vector.
  target_matrex = Matrex.new([Enum.map(query_vector, fn {k, _v} -> vector[k] || 0 end)])

  query_matrex
  |> Matrex.dot_nt(target_matrex)
  |> Matrex.scalar()
  |> Kernel.>(0.5)
end)

I believe the 1400 ETS tables limitation has been lifted in later Erlang versions:

http://erlang.org/doc/man/ets.html

The number of tables stored at one Erlang node used to be limited. This is no longer the case (except by memory usage). The previous default limit was about 1400 tables and could be increased by setting the environment variable ERL_MAX_ETS_TABLES or the command line option +e before starting the Erlang runtime system. This hard limit has been removed, but it is currently useful to set the ERL_MAX_ETS_TABLES anyway. It should be set to an approximate of the maximum amount of tables used. This since an internal table for named tables is sized using this value. If large amounts of named tables are used and ERL_MAX_ETS_TABLES hasn’t been increased, the performance of named table lookup will degrade.


My first question is where do these stored_vectors come from? Or more precisely, do you even need to store them in memory?

If you could instead process them directly as you read them (whether from a file, db, user input, or what not), you might be able to get some interesting savings. In particular, by not storing vectors into memory, and also by not creating intermediate maps and lists, you can reduce GC pressure significantly, and also stabilize the memory usage.

This approach might still take long, but since you’re processing while you’re reading the input, it can be significantly shorter, because you’ll replace time_to_load_a_vector + time_to_process_it with a single pass over the original input. In the worst case scenario, it would be similar to the original time_to_load_a_vector, so you could possibly halve the total execution time.

This approach makes sense only if you don’t need stored vectors for anything else. If you do need to keep them around for other purposes, you can try iterating over the map with :maps.iterator. The idea is basically the same, you want to compute the result in a single pass through the input vector, without allocating a lot of memory.

Notice that you still need to keep the query_vector in memory (whether as a map or in ETS), since you need to repeatedly read from it.

In all these approaches (and even in your own attempts), I’d recommend using plain recursion. You do a lot of iterations with relatively simple processing, so Enum & friends might add a significant overhead.
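
For illustration, a single pass with plain recursion over :maps.iterator/:maps.next could look roughly like this (a sketch; the names are made up):

defmodule DotIter do
  # Recurse over the vector map with :maps.next/1, reading the query map
  # in place; no intermediate lists or maps are allocated along the way.
  def dot(vector_map, query_map) do
    do_dot(:maps.next(:maps.iterator(vector_map)), query_map, 0)
  end

  defp do_dot(:none, _query, acc), do: acc

  defp do_dot({k, v, iter}, query, acc) do
    do_dot(:maps.next(iter), query, acc + v * Map.get(query, k, 0))
  end
end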

The proposed approach also paves the way to handling things in multiple processes. You could have one process which produces k-v pairs (by iterating the original source or the in-memory map), and another (or more of them) to look up the query vector and return the product. Not sure if this would help for such a simple computation, but it’s worth trying out. If you go for this approach, I’d suggest avoiding Flow/GenStage. I presume you don’t particularly care about backpressure, and since processing of a single element is so simple (a lookup and a product of two floats), any overhead of such abstractions might bring more harm than good.
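
A simple data-parallel variant of that idea (not the exact producer/consumer split described above; the chunk size and the dot/2 helper are assumptions) could look like:

# Scan the stored vectors on all cores; each task filters one chunk.
# dot/2 stands for whichever single-pass dot product you end up with.
matches =
  stored_vectors
  |> Stream.chunk_every(10_000)
  |> Task.async_stream(
    fn chunk -> Enum.filter(chunk, &(dot(&1, query_vector) > 0.5)) end,
    ordered: false,
    timeout: :infinity
  )
  |> Enum.flat_map(fn {:ok, found} -> found end)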

Finally, if all else fails, you might consider implementing this piece of logic in another language (e.g. Rust or C). You do a lot of intensive CPU processing, and Elixir/Erlang don’t exactly shine here. IME (and I did my share of intensive processing), one can most often get acceptable results with a combination of proper algorithms and technical optimizations (such as the ones proposed here), but if it’s still not enough, then I don’t see other options.

Best of luck!


Also as a small addendum, you can try processing each vector in its own process. That change alone might give you a significant throughput improvement, even though processing of one vector would still take the same amount of time.

Thanks for the insightful response! The use case: each stored vector actually represents one document/blog post/news article, lemmatised and run through term-frequency (TF) L2-norm normalisation. Hence each array is a vector of features and their TF scores. I can’t process them as I read them, as the collection keeps growing: imagine I have 1M arrays now; a new “document” is ingested, and I need to know which of the 1M existing documents are “similar” to this new one (through vector dot product, which equals cosine similarity here since the vectors are L2-normalised), and then I append the new document to stored_vectors. With that being said, it needs to be fast, as I am ingesting quite a lot of data every second.

To answer your question: I don’t really need to store them in memory, I could’ve stored them in a physical file, but wouldn’t that be slower?

Trying to do this in Elixir is more of a… thought exercise/prototype around some of Elixir’s limitations. I mean, it’d be great if it had worked out with :ets, but it didn’t, hence it derailed into storing everything as Maps in memory instead. On the side I am actually prototyping the application on Amazon Redshift and/or plain ol’ Postgres with Apache MADlib.

BTW, interesting approach on using plain recursion. Correct me if I am wrong, but based on my use case mentioned above, is the approach still viable? Since I need to know which of the 1M vectors are “similar”, does using plain recursion (I am thinking tail-call optimised) mean traversing the 1M vectors through :maps.iterator and passing the result along?