Framework for evaluating ANNS algorithms on billion scale datasets.

Overview

Billion-Scale ANN

http://big-ann-benchmarks.com/

Install

The only prerequisite is Python (tested with 3.6) and Docker. Works with newer versions of Python as well but probably requires an updated requirements.txt on the host. (Suggestion: copy requirements.txt to requirements${PYTHON_VERSION}.txt and remove all fixed versions. requirements.txt has to be kept for the docker containers.)

  1. Clone the repo.
  2. Run pip install -r requirements.txt (Use requirements_py38.txt if you have Python 3.8.)
  3. Install docker by following instructions here. You might also want to follow the post-install steps for running docker in non-root user mode.
  4. Run python install.py to build all the libraries inside Docker containers.

Storing Data

The framework assumes that all data is stored in data/. Please use a symlink if your datasets and indices are supposed to be stored somewhere else. The location of the linked folder matters a great deal for SSD-based search performance in T2. A local SSD such as the one found on Azure Ls-series VMs is better than remote disks, even premium ones. See T1/T2 for more details.

Data sets

See http://big-ann-benchmarks.com/ for details on the different datasets.

Dataset Preparation

Before running experiments, datasets have to be downloaded. All preparation can be carried out by calling

python create_dataset.py --dataset [bigann-1B | deep-1B | text2image-1B | ssnpp-1B | msturing-1B | msspacev-1B]

Note that downloading the datasets can potentially take many hours.

For local testing, there exist smaller random datasets random-xs and random-range-xs. Furthermore, most datasets have 1M, 10M and 100M versions, run python create_dataset -h to get an overview.

Running the benchmark

Run python run.py --dataset $DS --algorithm $ALGO where DS is the dataset you are running on, and ALGO is the name of the algorithm. (Use python run.py --list-algorithms) to get an overview. python run.py -h provides you with further options.

The parameters used by the implementation to build and query the index can be found in algos.yaml.

Running the track 1 baseline

After running the installation, we can evaluate the baseline as follows.

for DS in bigann-1B  deep-1B  text2image-1B  ssnpp-1B  msturing-1B  msspacev-1B;
do
    python run.py --dataset $DS --algorithm faiss-t1;
done

On a 28-core Xeon E5-2690 v4 that provided 100MB/s downloads, carrying out the baseline experiments took roughly 7 days.

To evaluate the results, run

sudo chmod -R 777 results/
python data_export.py --output res.csv
python3.8 eval/show_operating_points.py --algorithm faiss-t1 --threshold 10000

Including your algorithm and Evaluating the Results

See Track T1/T2 for more details on evaluation for Tracks T1 and T2.

See Track T3 for more details on evaluation for Track T3.

Credits

This project is a version of ann-benchmarks by Erik Bernhardsson and contributors targetting billion-scale datasets.

Source Code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

Description The source code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chin

Zhengxiang Wang 3 Jun 28, 2022
A TensorFlow implementation of SOFA, the Simulator for OFfline LeArning and evaluation.

SOFA This repository is the implementation of SOFA, the Simulator for OFfline leArning and evaluation. Keeping Dataset Biases out of the Simulation: A

22 Nov 23, 2022
AutoML library for deep learning

Official Website: autokeras.com AutoKeras: An AutoML system based on Keras. It is developed by DATA Lab at Texas A&M University. The goal of AutoKeras

Keras 8.7k Jan 08, 2023
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CVPR 2021)

Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CAC) Xin Lai*, Zhuotao Tian*, Li Jiang, Shu Liu, Hengshuang Zhao, Li

Jia Research Lab 137 Dec 14, 2022
details on efforts to dump the Watermelon Games Paprium cart

Reminder, if you like these repos, fork them so they don't disappear https://github.com/ArcadeHustle/WatermelonPapriumDump/fork Big thanks to Fonzie f

Hustle Arcade 29 Dec 11, 2022
Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning This is the Github repository of our paper, "Common S

INK Lab @ USC 19 Nov 30, 2022
[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

Seokeon Choi 35 Oct 26, 2022
A PyTorch-based library for fast prototyping and sharing of deep neural network models.

A PyTorch-based library for fast prototyping and sharing of deep neural network models.

78 Jan 03, 2023
Improving Compound Activity Classification via Deep Transfer and Representation Learning

Improving Compound Activity Classification via Deep Transfer and Representation Learning This repository is the official implementation of Improving C

NingLab 2 Nov 24, 2021
[ICLR'19] Trellis Networks for Sequence Modeling

TrellisNet for Sequence Modeling This repository contains the experiments done in paper Trellis Networks for Sequence Modeling by Shaojie Bai, J. Zico

CMU Locus Lab 460 Oct 13, 2022
Visualizing Yolov5's layers using GradCam

YOLO-V5 GRADCAM I constantly desired to know to which part of an object the object-detection models pay more attention. So I searched for it, but I di

Pooya Mohammadi Kazaj 200 Jan 01, 2023
TigerLily: Finding drug interactions in silico with the Graph.

Drug Interaction Prediction with Tigerlily Documentation | Example Notebook | Youtube Video | Project Report Tigerlily is a TigerGraph based system de

Benedek Rozemberczki 91 Dec 30, 2022
This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging To appear on KDD'21...[pdf] This project provides an unsupervised framework for mining and

Xiaotao Gu 146 Dec 22, 2022
Practical tutorials and labs for TensorFlow used by Nvidia, FFN, CNN, RNN, Kaggle, AE

TensorFlow Tutorial - used by Nvidia Learn TensorFlow from scratch by examples and visualizations with interactive jupyter notebooks. Learn to compete

Alexander R Johansen 1.9k Dec 19, 2022
Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification

Fine-grainedImageClassification Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification We trained model here: lin

ZhenchaoTang 14 Oct 21, 2022
Deep Learning Training Scripts With Python

Deep Learning Training Scripts DNN Frameworks Caffe PyTorch Tensorflow CNN Models VGG ResNet DenseNet Inception Language Modeling GatedCNN-LM Attentio

Multicore Computing Research Lab 16 Dec 15, 2022
pytorch implementation of fast-neural-style

fast-neural-style 🌇 🚀 NOTICE: This codebase is no longer maintained, please use the codebase from pytorch examples repository available at pytorch/e

Abhishek Kadian 405 Dec 15, 2022
A Python 3 package for state-of-the-art statistical dimension reduction methods

direpack: a Python 3 library for state-of-the-art statistical dimension reduction techniques This package delivers a scikit-learn compatible Python 3

Sven Serneels 32 Dec 14, 2022
Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

Density-aware Chamfer Distance This repository contains the official PyTorch implementation of our paper: Density-aware Chamfer Distance as a Comprehe

Tong WU 93 Dec 15, 2022
Code examples and benchmarks from the paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective"

Code For the Paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective" Author: Robert Bamler Date: 22 D

4 Nov 02, 2022