Is cosine distance wrong?

woohaaha · October 11, 2023, 11:44am

I’m using embeddings with pgvector, but when I use cosine distance the results don’t match my expectations. So I decided to check whether my test example would work with Scholar, but the results are off in Scholar as well. That suggests I may have the wrong expectations, and yet… after looking it up, I don’t think I’m misunderstanding.

Simplest example:

Scholar.Metrics.Distance.cosine(Nx.tensor([1,1]), Nx.tensor([1,1]))

#Nx.Tensor<
  f32
  5.960464477539063e-8
>

If two tensors are “pointing” in the same direction, the cosine distance should be 1, but this is pretty much 0, implying that the tensors are “orthogonal” (not similar at all).

What am I doing wrong?

Thank you

Update 1

Do I need to subtract from 1?

Nx.subtract(Nx.tensor(1), Scholar.Metrics.Distance.cosine(Nx.tensor([1,2]), Nx.tensor([1,2])))

#Nx.Tensor<
  f32
  0.9999999403953552
>
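For reference, here is the arithmetic for the vectors in Update 1. It assumes Scholar defines cosine distance as one minus the cosine of the angle between the vectors, which matches the "1 - distance" relationship given in the reply. Identical vectors have a cosine of 1, so their distance is 0 and their similarity is 1. The tiny deviations in the f32 outputs above (5.96e-8 and 0.99999994) are consistent with single-precision rounding error.

$$
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}, \qquad d_{\cos}(a, b) = 1 - \cos\theta
$$

$$
a = b = (1, 2): \quad \cos\theta = \frac{1 \cdot 1 + 2 \cdot 2}{\sqrt{5}\,\sqrt{5}} = \frac{5}{5} = 1, \qquad d_{\cos} = 1 - 1 = 0
$$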

polvalente · October 11, 2023, 12:02pm

Scholar uses the cosine distance. You want the cosine similarity, which is 1 - distance.
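As a dependency-free illustration of that relationship, here is a minimal sketch in plain Elixir. It works on ordinary lists rather than Nx tensors. The module `CosineSketch` and its functions are hypothetical names for illustration only and are not part of Scholar or Nx. It computes similarity as the dot product divided by the product of the norms, then defines distance as 1 minus that value.

```elixir
# Hypothetical illustration module (not part of Scholar/Nx): plain lists, 64-bit floats.
defmodule CosineSketch do
  # Dot product of two equal-length lists of numbers.
  def dot(a, b), do: Enum.zip_with(a, b, &(&1 * &2)) |> Enum.sum()

  # Euclidean (L2) norm of a list of numbers.
  def norm(a), do: :math.sqrt(dot(a, a))

  # Cosine similarity: about 1.0 for the same direction,
  # 0.0 for orthogonal vectors, -1.0 for opposite directions.
  def similarity(a, b), do: dot(a, b) / (norm(a) * norm(b))

  # Cosine distance: 1 - similarity, so about 0.0 means "same direction".
  def distance(a, b), do: 1 - similarity(a, b)
end
```

With this sketch, `CosineSketch.distance([1, 1], [1, 1])` comes out at (or within floating-point rounding of) 0.0, and `CosineSketch.distance([1, 0], [0, 1])` is 1.0 for orthogonal vectors. The same convention applies on the database side: pgvector's `<=>` operator also returns cosine distance, so a similarity score there is likewise `1 - (a <=> b)`.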

woohaaha · October 12, 2023, 3:37am

Thank you