I have used the following formula for calculating cosine similarity in production:
(list_a and list_b are both the same length)
cosine_similarity(list_a, list_b) * :math.sqrt(length(list_a))
This is good because the cosine part ignores the scale of the attributes, while the
:math.sqrt(...) part takes into account the number of common attributes. (First you need to extract the common attributes from each pair of elements, in the same order for both.)
This worked out so well that I was actually surprised by how good it was.
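To make the formula concrete, here is a minimal sketch in plain Elixir. It does not use the library's API: the module name ScaledCosineSketch and the functions cosine/2, common_values/2 and scaled_similarity/2 are hypothetical names I made up for illustration, and it assumes each element is a map of attribute => numeric weight.

```elixir
defmodule ScaledCosineSketch do
  # Hypothetical helper module for illustration, not part of the linked library.

  # Plain cosine similarity of two equal-length lists of numbers.
  # Assumes neither list is all zeros (otherwise the division raises).
  def cosine(list_a, list_b) when length(list_a) == length(list_b) do
    dot =
      list_a
      |> Enum.zip(list_b)
      |> Enum.reduce(0, fn {a, b}, acc -> acc + a * b end)

    norm_a = :math.sqrt(Enum.reduce(list_a, 0, fn a, acc -> acc + a * a end))
    norm_b = :math.sqrt(Enum.reduce(list_b, 0, fn b, acc -> acc + b * b end))

    dot / (norm_a * norm_b)
  end

  # Keep only the attributes present in both maps and return their values
  # as two lists in the same (sorted key) order.
  def common_values(map_a, map_b) do
    keys =
      map_a
      |> Map.keys()
      |> Enum.filter(&Map.has_key?(map_b, &1))
      |> Enum.sort()

    {Enum.map(keys, &Map.fetch!(map_a, &1)), Enum.map(keys, &Map.fetch!(map_b, &1))}
  end

  # Cosine similarity over the common attributes, scaled by the square root
  # of how many attributes the two elements share.
  def scaled_similarity(map_a, map_b) do
    {list_a, list_b} = common_values(map_a, map_b)

    case list_a do
      # No shared attributes: treat as zero similarity (an assumption of this sketch).
      [] -> 0.0
      _ -> cosine(list_a, list_b) * :math.sqrt(length(list_a))
    end
  end
end
```

For example, ScaledCosineSketch.scaled_similarity(%{cat: 1, dog: 2}, %{cat: 2, dog: 4, tree: 9}) uses only the shared cat and dog attributes. Their cosine is about 1.0 and there are two common attributes, so the result is roughly 1.414.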
The library: https://github.com/preciz/similarity
If you have more than 10k lists with length > 200 and you need to calculate this frequently between all of them, then this might not cut it. (I would probably just check how the library does it and use the DB to calculate it.)
If you don't know what this is for, here is an
example use case in pseudocode:
people |> map(pictures) |> map(image_labels) |> calculate_similarities |> save_to_db |> power_suggestions_based_on_similarities
Thanks for checking it out! (I wrote this quickly today, so any corrections are welcome.)