I think DEEP should be Euclidean distance? #574

Open
mageirakos opened this issue Mar 2, 2025 · 0 comments

Thank you for providing a common format and benchmark suite for many standard datasets.

Issue:

I believe the original DEEP dataset uses Euclidean distance, not Angular as it is configured here.
Since the vectors are L2-normalized, the two distances are highly correlated but not the same, so the difference might not be immediately noticeable in the QPS-recall plots.
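
To illustrate why the two metrics are so hard to tell apart on normalized data, here is a minimal sketch using synthetic random vectors (not the actual DEEP data). It assumes cosine distance, 1 - cos θ, as a stand-in for the Angular metric; the dimension, sample size, and helper names are arbitrary choices for illustration. For unit vectors, ‖a - b‖² = 2 - 2 cos θ, so the distance values differ while the neighbor ordering does not.

```python
import math
import random

def normalize(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # For unit vectors the dot product equals cos(theta).
    return 1.0 - sum(x * y for x, y in zip(a, b))

rng = random.Random(0)
dim, n = 16, 200  # arbitrary sizes, chosen only for this illustration

query = normalize([rng.gauss(0, 1) for _ in range(dim)])
base = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]

d_euclid = [euclidean(query, b) for b in base]
d_cos = [cosine_distance(query, b) for b in base]

# Neighbor ordering under each metric.
rank_euclid = sorted(range(n), key=lambda i: d_euclid[i])
rank_cos = sorted(range(n), key=lambda i: d_cos[i])

# Largest deviation from the identity ||a - b||^2 = 2 * (1 - cos(theta)).
max_err = max(abs(e * e - 2.0 * c) for e, c in zip(d_euclid, d_cos))

print("same neighbor ordering:", rank_euclid == rank_cos)
print("max identity error:", max_err)
```

If the stored vectors are not exactly unit norm, the identity no longer holds and the two orderings can diverge, which is when the metric label starts to affect ground truth and recall.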

The only reason I am not certain, and why I put a question mark in the title, is that, based on #145, your download source is different from, and in another format than, the following sources (.fvecs vs .ibin).

Sources:

I'm looking at big-ann-benchmarks regarding this issue, since the author of the original DEEP paper (Artem Babenko) is listed as one of the organizers of the original '21 challenge. I've also consistently seen DEEP used with Euclidean distance in research papers, which makes sense since, to the best of my knowledge, Euclidean distance is more common for image data, while IP/angular is more common for text data.
