I’m trying to port some code from Python to Elixir. The Python code generates tensors from BERT embeddings, then does some form of similarity comparison between them. From what I’ve found online, it looks like cosine similarity is the calculation I’m looking for, but I don’t understand it well enough to implement it in Nx.
The formula is listed as (A ⋅ B) / (||A|| ||B||). I have two tensors with the shape #Nx.Tensor<f32[1][6][119547]...>. So far this is all I’ve come up with:
for i <- 0..5, j <- 0..5 do
  t1 = tensor1[0][i]
  t2 = tensor2[0][j]
  Nx.dot(t1, t2) / ???
end
I found another formula on the Wikipedia page, written as NumPy code:
np.sum(a*b)/(np.sqrt(np.sum(a**2)) * np.sqrt(np.sum(b**2)))
I think that, converted to Nx, that’s:
defmodule CosSim do
  import Nx.Defn

  defn cosine_similarity(a, b) do
    left = Nx.sqrt(Nx.sum(a ** 2))
    right = Nx.sqrt(Nx.sum(b ** 2))
    Nx.sum(a * b) / (left * right)
  end
end
If I do a quick test:
a = Nx.tensor([1, 2, 3])
b = Nx.tensor([4, 5, 6])
CosSim.cosine_similarity(a, b)
#Nx.Tensor<
  f32
  EXLA.Backend<host:0, 0.528063503.4042653716.111775>
  0.9746317863464355
>
If I try to validate it in Python:
>>> a = np.matrix([1,2,3])
>>> b = np.matrix([4,5,6])
>>> np.sum(a*b)/(np.sqrt(np.sum(a**2)) * np.sqrt(np.sum(b**2)))
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
So something is off. Admittedly I’m very new to all of this ML/Nx stuff, so maybe I’m way off, or maybe I’m close. Any tips?
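A note on the Python check: the ValueError most likely comes from np.matrix rather than from the formula. On np.matrix objects, * is matrix multiplication and ** is matrix power, so a (1,3) by (1,3) product cannot be formed. With np.array, both operators are element-wise and the one-liner should run as written. As a dependency-free cross-check, here is a minimal sketch of the same formula in plain Python. The names cosine_similarity and pairwise_cosine are hypothetical helpers made up for this illustration, not part of NumPy or Nx.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length 1-D sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))      # A . B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    return dot / (norm_a * norm_b)


def pairwise_cosine(rows_a, rows_b):
    """Hypothetical helper: similarity of every row of rows_a with every row of rows_b."""
    return [[cosine_similarity(ra, rb) for rb in rows_b] for ra in rows_a]


a = [1, 2, 3]
b = [4, 5, 6]
sim = cosine_similarity(a, b)
print(sim)  # roughly 0.9746, in line with the f32 value Nx printed

# Two small sets of rows stand in for the six rows of each embedding tensor.
matrix = pairwise_cosine([a, b], [a, b])
print(matrix)
```

If this agrees with the Nx output, the defn version is probably fine, and the nested loop over i and j only needs to call it on each pair of rows.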