Hi I recently have the requirement to perform dot-product on two vectors in Elixir and I was wondering what would be the most efficient way. Imagine the following scenario,
Say I have ~1M vectors stored in memory, each with the following format [{"foo", 0.123}, {"bar", 0.777}, .........]
. I now have a query vector with the same format and I need to compute the dot product of the query vector and each of the ~1M stored vectors. By dot-product I mean, if both vectors contains the same string key, multiple their float value and sum it all up.
Now my most naive/straightforward solutions is something like, instead of storing them as array of tuples, I convert them into Map
and store them in memory as before, so my vector would become something like a hash-table %{"foo" => 0.123, "bar" => 0.777}
and then:
Enum.filter(stored_vectors, fn -> vector
# Since we only need to process the "overlapped keys"
query = Map.take(query_vector, Map.keys(vector))
Enum.map(query, fn {k, v} ->
v * vector[k]
end)
|> Enum.sum
|> Kernel.>(0.5)
end)
Now my issue is that Map.take
is extremely slow when I am querying against 1M vectors. But if I remove that code and just do multiplication anyway like this:
Enum.filter(stored_vectors, fn -> vector
Enum.map(query_vector, fn {k, v} ->
v * (vector[k] || 0)
end)
|> Enum.sum
|> Kernel.>(0.5)
end)
Its even slower. I am running out of options and was wondering how can I optimise my code.
P/S: I also tried one more thing like this but to no avail:
keys1 = query_vector |> Map.keys |> MapSet.new()
Enum.filter(stored_vectors, fn -> vector
overlapped_keys = MapSet.intersection(keys1, vector |> Map.keys |> MapSet.new())
Enum.map(overlapped_keys, fn k->
query_vector[k] * vector[k]
end)
|> Enum.sum
|> Kernel.>(0.5)
end)