10x faster matrix and vector operations

Overview

Bolt

Bolt is an algorithm for compressing vectors of real-valued data and running mathematical operations directly on the compressed representations.

If you have a large collection of mostly-dense vectors and can tolerate lossy compression, Bolt can probably save you 10-200x space and compute time.

Bolt also has theoretical guarantees bounding the errors in its approximations.

EDIT: this repo now also features the source code for MADDNESS, our shiny new algorithm for approximate matrix multiplication. MADDNESS has no Python wrapper yet, and is referred to as "mithral" in the source code. Name changed because apparently I'm the only who gets Lord of the Rings references. MADDNESS runs ridiculously fast and, under reasonable assumptions, requires zero multiply-adds. Realistically, it'll be most useful for speeding up neural net inference on CPUs, but it'll take another couple papers to get it there; we need to generalize it to convolution and write the CUDA kernels to allow GPU training.

NOTE: All below code refers to the Python wrapper for Bolt and has nothing to do with MADDNESS. It also seems to be no longer building for many people. If you want to use MADDNESS, see the Python Implementation driven by amm_main.py or C++ implementation. All code is ugly, but Python code should be pretty easy to add new AMM methods/variations to.

Installing

Python

  $ brew install swig  # for wrapping C++; use apt-get, yum, etc, if not OS X
  $ pip install numpy  # bolt installation needs numpy already present
  $ git clone https://github.com/dblalock/bolt.git
  $ cd bolt && python setup.py install
  $ pytest tests/  # optionally, run the tests

If you run into any problems, please don't hesitate to mention it in the Python build problems issue.

C++

Install Bazel, Google's open-source build system. Then

  $ git clone https://github.com/dblalock/bolt.git
  $ cd bolt/cpp && bazel run :main

The bazel run command will build the project and run the tests and benchmarks.

If you want to integrate Bolt with another C++ project, include cpp/src/include/public.hpp and add the remaining files under cpp/src to your builds. You should let me know if you're interested in doing such an integration because I'm hoping to see Bolt become part of many libraries and thus would be happy to help you.

Notes

Bolt currently only supports machines with AVX2 instructions, which basically means x86 machines from fall 2013 or later. Contributions for ARM support are welcome. Also note that the Bolt Python wrapper is currently configured to require Clang, since GCC apparently runs into issues.

How does it work?

Bolt is based on vector quantization. For details, see the Bolt paper or slides.

Benchmarks

Bolt includes a thorough set of speed and accuracy benchmarks. See the experiments/ directory. This is also what you want if you want to reproduce the results in the paper.

Note that all of the timing results use the raw C++ implementation. At present, the Python wrapper is slightly slower due to Python overhead. If you're interested in having a full-speed wrapper, let me know and I'll allocate time to making this happen.

Basic usage

X, queries = some N x D array, some iterable of length D arrays

# these are approximately equal (though the latter are shifted and scaled)
enc = bolt.Encoder(reduction='dot').fit(X)
[np.dot(X, q) for q in queries]
[enc.transform(q) for q in queries]

# same for these
enc = bolt.Encoder(reduction='l2').fit(X)
[np.sum((X - q) * (X - q), axis=1) for q in queries]
[enc.transform(q) for q in queries]

# but enc.transform() is 10x faster or more

Example: Matrix-vector multiplies

import bolt
import numpy as np
from scipy.stats import pearsonr as corr
from sklearn.datasets import load_digits
import timeit

# for simplicity, use the sklearn digits dataset; we'll split
# it into a matrix X and a set of queries Q
X, _ = load_digits(return_X_y=True)
nqueries = 20
X, Q = X[:-nqueries], X[-nqueries:]

enc = bolt.Encoder(reduction='dot', accuracy='lowest') # can tweak acc vs speed
enc.fit(X)

dot_corrs = np.empty(nqueries)
for i, q in enumerate(Q):
    dots_true = np.dot(X, q)
    dots_bolt = enc.transform(q)
    dot_corrs[i] = corr(dots_true, dots_bolt)[0]

# dot products closely preserved despite compression
print "dot product correlation: {} +/- {}".format(
    np.mean(dot_corrs), np.std(dot_corrs))  # > .97

# massive space savings
print(X.nbytes)  # 1777 rows * 64 cols * 8B = 909KB
print(enc.nbytes)  # 1777 * 2B = 3.55KB

# massive time savings (~10x here, but often >100x on larger
# datasets with less Python overhead; see the paper)
t_np = timeit.Timer(
    lambda: [np.dot(X, q) for q in Q]).timeit(5)        # ~9ms
t_bolt = timeit.Timer(
    lambda: [enc.transform(q) for q in Q]).timeit(5)    # ~800us
print "Numpy / BLAS time, Bolt time: {:.3f}ms, {:.3f}ms".format(
    t_np * 1000, t_bolt * 1000)

# can get output without offset/scaling if needed
dots_bolt = [enc.transform(q, unquantize=True) for q in Q]

Example: K-Nearest Neighbor / Maximum Inner Product Search

# search using squared Euclidean distances
# (still using the Digits dataset from above)
enc = bolt.Encoder('l2', accuracy='high').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q]  # knn for each query

# search using dot product (maximum inner product search)
enc = bolt.Encoder('dot', accuracy='medium').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q]  # knn for each query

Miscellaneous

Bolt stands for "Based On Lookup Tables". Feel free to use this exciting fact at parties.

Owner
Machine learning PhD Student at MIT. I build fast machine learning algorithms.
Code for EMNLP'21 paper "Types of Out-of-Distribution Texts and How to Detect Them"

ood-text-emnlp Code for EMNLP'21 paper "Types of Out-of-Distribution Texts and How to Detect Them" Files fine_tune.py is used to finetune the GPT-2 mo

Udit Arora 19 Oct 28, 2022
Tool for installing and updating MiSTer cores and other files

MiSTer Downloader This tool installs and updates all the cores and other extra files for your MiSTer. It also updates the menu core, the MiSTer firmwa

72 Dec 24, 2022
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

416 Jan 09, 2023
Implementation of U-Net and SegNet for building segmentation

Specialized project Created by Katrine Nguyen and Martin Wangen-Eriksen as a part of our specialized project at Norwegian University of Science and Te

Martin.w-e 3 Dec 07, 2022
Graduation Project

Gesture-Detection-and-Depth-Estimation This is my graduation project. (1) In this project, I use the YOLOv3 object detection model to detect gesture i

ChaosAT 1 Nov 23, 2021
A Large Scale Benchmark for Individual Treatment Effect Prediction and Uplift Modeling

large-scale-ITE-UM-benchmark This repository contains code and data to reproduce the results of the paper "A Large Scale Benchmark for Individual Trea

10 Nov 19, 2022
Codeflare - Scale complex AI/ML pipelines anywhere

Scale complex AI/ML pipelines anywhere CodeFlare is a framework to simplify the integration, scaling and acceleration of complex multi-step analytics

CodeFlare 169 Nov 29, 2022
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings

Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings Results on STS Tasks Model STS12 STS13 STS14 STS15 STS16 STSb SICK-R Avg. unsup-prompt-be

196 Jan 08, 2023
Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

TuSimple 385 Dec 28, 2022
[ICML 2021] Towards Understanding and Mitigating Social Biases in Language Models

Towards Understanding and Mitigating Social Biases in Language Models This repo contains code and data for evaluating and mitigating bias from generat

Paul Liang 42 Jan 03, 2023
This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

Quinn Herden 1 Feb 04, 2022
Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization This repository contains the source code for the paper (link wi

Rakuten Group, Inc. 0 Nov 19, 2021
A Real-World Benchmark for Reinforcement Learning based Recommender System

RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System RL4RS is a real-world deep reinforcement learning recommender system

121 Dec 01, 2022
LSTM model trained on a small dataset of 3000 names written in PyTorch

LSTM model trained on a small dataset of 3000 names. Model generates names from model by selecting one out of top 3 letters suggested by model at a time until an EOS (End Of Sentence) character is no

Sahil Lamba 1 Dec 20, 2021
Paaster is a secure by default end-to-end encrypted pastebin built with the objective of simplicity.

Follow the development of our desktop client here Paaster Paaster is a secure by default end-to-end encrypted pastebin built with the objective of sim

Ward 211 Dec 25, 2022
Classification models 1D Zoo - Keras and TF.Keras

Classification models 1D Zoo - Keras and TF.Keras This repository contains 1D variants of popular CNN models for classification like ResNets, DenseNet

Roman Solovyev 12 Jan 06, 2023
Convert onnx models to pytorch.

onnx2torch onnx2torch is an ONNX to PyTorch converter. Our converter: Is easy to use – Convert the ONNX model with the function call convert; Is easy

ENOT 264 Dec 30, 2022
Implementing yolov4 target detection and tracking based on nao robot

Implementing yolov4 target detection and tracking based on nao robot

6 Apr 19, 2022
Forecasting directional movements of stock prices for intraday trading using LSTM and random forest

Forecasting directional movements of stock-prices for intraday trading using LSTM and random-forest https://arxiv.org/abs/2004.10178 Pushpendu Ghosh,

Pushpendu Ghosh 270 Dec 24, 2022