Framework for evaluating ANNS algorithms on billion scale datasets.

Last update: Dec 24, 2022

Overview

Billion-Scale ANN

Install

The only prerequisite is Python (tested with 3.6) and Docker. Works with newer versions of Python as well but probably requires an updated requirements.txt on the host. (Suggestion: copy requirements.txt to requirements${PYTHON_VERSION}.txt and remove all fixed versions. requirements.txt has to be kept for the docker containers.)

Clone the repo.
Run pip install -r requirements.txt (Use requirements_py38.txt if you have Python 3.8.)
Install docker by following instructions here. You might also want to follow the post-install steps for running docker in non-root user mode.
Run python install.py to build all the libraries inside Docker containers.

Storing Data

The framework assumes that all data is stored in data/. Please use a symlink if your datasets and indices are supposed to be stored somewhere else. The location of the linked folder matters a great deal for SSD-based search performance in T2. A local SSD such as the one found on Azure Ls-series VMs is better than remote disks, even premium ones. See T1/T2 for more details.

Data sets

See http://big-ann-benchmarks.com/ for details on the different datasets.

Dataset Preparation

Before running experiments, datasets have to be downloaded. All preparation can be carried out by calling

python create_dataset.py --dataset [bigann-1B | deep-1B | text2image-1B | ssnpp-1B | msturing-1B | msspacev-1B]

Note that downloading the datasets can potentially take many hours.

For local testing, there exist smaller random datasets random-xs and random-range-xs. Furthermore, most datasets have 1M, 10M and 100M versions, run python create_dataset -h to get an overview.

Running the benchmark

Run python run.py --dataset $DS --algorithm $ALGO where DS is the dataset you are running on, and ALGO is the name of the algorithm. (Use python run.py --list-algorithms) to get an overview. python run.py -h provides you with further options.

The parameters used by the implementation to build and query the index can be found in algos.yaml.

Running the track 1 baseline

After running the installation, we can evaluate the baseline as follows.

for DS in bigann-1B  deep-1B  text2image-1B  ssnpp-1B  msturing-1B  msspacev-1B;
do
    python run.py --dataset $DS --algorithm faiss-t1;
done

On a 28-core Xeon E5-2690 v4 that provided 100MB/s downloads, carrying out the baseline experiments took roughly 7 days.

To evaluate the results, run

sudo chmod -R 777 results/
python data_export.py --output res.csv
python3.8 eval/show_operating_points.py --algorithm faiss-t1 --threshold 10000

Including your algorithm and Evaluating the Results

See Track T1/T2 for more details on evaluation for Tracks T1 and T2.

See Track T3 for more details on evaluation for Track T3.

Credits

This project is a version of ann-benchmarks by Erik Bernhardsson and contributors targetting billion-scale datasets.

Framework for evaluating ANNS algorithms on billion scale datasets.

Related tags

Overview

Billion-Scale ANN

Install

Storing Data

Data sets

Dataset Preparation

Running the benchmark

Running the track 1 baseline

Including your algorithm and Evaluating the Results

Credits

Owner

Harsha Vardhan Simhadri

Some code of the implements of Geological Modeling Using 3D Pixel-Adaptive and Deformable Convolutional Neural Network

PyTorch implementation of Pointnet2/Pointnet++

PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection".

SUPERVISED-CONTRASTIVE-LEARNING-FOR-PRE-TRAINED-LANGUAGE-MODEL-FINE-TUNING - The Facebook paper about fine tuning RoBERTa with contrastive loss

Solving Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge

Data, model training, and evaluation code for "PubTables-1M: Towards a universal dataset and metrics for training and evaluating table extraction models".

Code for the paper "SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness" (NeurIPS 2021)

🔥 Cannlytics-powered artificial intelligence 🤖

FindFunc is an IDA PRO plugin to find code functions that contain a certain assembly or byte pattern, reference a certain name or string, or conform to various other constraints.

A flexible submap-based framework towards spatio-temporally consistent volumetric mapping and scene understanding.

An extremely simple, intuitive, hardware-friendly, and well-performing network structure for LiDAR semantic segmentation on 2D range image. IROS21

ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

In the case of your data having only 1 channel while want to use timm models

This repository implements variational graph auto encoder by Thomas Kipf.

A robotic arm that mimics hand movement through MediaPipe tracking.

Deeper DCGAN with AE stabilization

中文语音识别系列，读者可以借助它快速训练属于自己的中文语音识别模型，或直接使用预训练模型测试效果。

Pytorch implementation of MalConv

Official implementation of Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models at NeurIPS 2021

Automatically measure the facial Width-To-Height ratio and get facial analysis results provided by Microsoft Azure