CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Last update: Jan 04, 2023

Andreas Fürst^{* 1}, Elisabeth Rumetshofer^{* 1}, Viet Tran¹, Hubert Ramsauer¹, Fei Tang³, Johannes Lehner¹, David Kreil², Michael Kopp², Günter Klambauer¹, Angela Bitto-Nemling¹, Sepp Hochreiter^{1 2}

¹ ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
² Institute of Advanced Research in Artificial Intelligence (IARAI)
³ HERE Technologies
^* Equal contribution

Detailed blog post on this paper at this link.

The full paper is available here.

Implementation of CLOOB

This repository contains the implementation of CLOOB used to obtain the results reported in the paper. The implementation is based on OpenCLIP, an open-source implementation of OpenAI's CLIP.

Setup

We provide an 'environment.yml' file to set up a conda environment with all required packages. Run the following commands to clone the repository and create the environment.

# Clone the repository and switch into the directory
git clone https://github.com/ml-jku/cloob
cd cloob

# Create the environment and activate it
conda env create --file environment.yml
conda activate cloob

# Additionally, install webdataset from its Git repository (needed for pre-training on YFCC)
pip install git+https://github.com/tmbdev/webdataset.git

# Add the directory to the PYTHONPATH environment variable
export PYTHONPATH="$PYTHONPATH:$PWD/src"

Data

For pre-training we use the two datasets supported by OpenCLIP, namely Conceptual Captions and YFCC.

Conceptual Captions

OpenCLIP already provides a script to download and prepare the Conceptual Captions dataset, which contains 2.89M training images and 13k validation images. First, download the Conceptual Captions URLs and then run the script gather_cc.py.

python3 src/data/gather_cc.py path/to/Train_GCC-training.tsv path/to/Validation_GCC-1.1.0-Validation.tsv
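Before running the script, it can help to sanity-check the downloaded TSV files. The following is a minimal sketch, assuming each line holds a caption and an image URL separated by a tab, with no header row. `count_tsv_rows` is a hypothetical helper and not part of this repository; the sample rows are made up.

```python
import csv
import io


def count_tsv_rows(handle):
    """Count rows that have exactly two non-empty tab-separated fields.

    Assumption: each row is (caption, image URL). Returns (valid, malformed).
    """
    valid, malformed = 0, 0
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 2 and row[0].strip() and row[1].strip():
            valid += 1
        else:
            malformed += 1
    return valid, malformed


# In-memory stand-in for a file such as Train_GCC-training.tsv (made-up rows).
sample = io.StringIO(
    "a dog on a beach\thttp://example.com/dog.jpg\n"
    "broken line without url\n"
)
print(count_tsv_rows(sample))
```

To check a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to the helper.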

YFCC

We use the same subset of ~15M images from the YFCC100M dataset as CLIP. OpenAI provides a list of (line number, photo identifier, photo hash) for each image contained in this subset here.

For more information, see YFCC100M Subset on OpenAI's GitHub.
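The subset list can be read into structured records before filtering the full dataset. This is a rough sketch, assuming one whitespace-separated triple (line number, photo identifier, photo hash) per line. `SubsetEntry` and `parse_subset` are hypothetical names, and the values in the example are made up.

```python
from typing import Iterable, Iterator, NamedTuple


class SubsetEntry(NamedTuple):
    line_number: int  # line index of the photo in the full YFCC100M metadata
    photo_id: str     # photo identifier
    photo_hash: str   # photo hash


def parse_subset(lines: Iterable[str]) -> Iterator[SubsetEntry]:
    """Yield one SubsetEntry per well-formed line; skip blank or malformed lines."""
    for raw in lines:
        fields = raw.split()
        if len(fields) != 3:
            continue
        yield SubsetEntry(int(fields[0]), fields[1], fields[2])


# Made-up example lines standing in for the downloaded list.
example = ["17\t1234567890\tabcdef0123456789\n", "\n"]
for entry in parse_subset(example):
    print(entry.line_number, entry.photo_id, entry.photo_hash)
```

Collecting the identifiers into a set, for example `{e.photo_id for e in parse_subset(f)}`, gives constant-time membership checks while scanning the full metadata.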

Downstream Tasks

In the paper we report results on several downstream tasks. Except for ImageNet we provide links to already pre-processed versions (where necessary) of the respective test set.

Dataset	Description	Official	Processed
Birdsnap	This dataset contains images of North American bird species. Our version is smaller than the one reported in CLIP because some samples are no longer available.	Link	Link
Country211	This dataset was published in CLIP and is a small subset of the YFCC100m dataset. It consists of photos that can be assigned to 211 countries via GPS coordinates. For each country 200 photos are sampled for the training set and 100 for testing.	Link	Link
Flowers102	Images of 102 flower categories commonly occurring in the United Kingdom were collected. Several classes are very similar and there is a large variation in scale, pose and lighting.	Link	Link
GTSRB	This dataset was released for a challenge held at IJCNN 2011. It contains images of German traffic signs from more than 40 classes.	Link	Link
Stanford Cars	This dataset contains images of 196 car models at the level of make, model and year (e.g. Tesla Model S Sedan 2012).	Link	Link
UCF101	The dataset has been created by extracting the middle frame from each video.	Link	Link
ImageNet	This dataset spans 1000 object classes and contains 1,281,167 training images, 50,000 validation images and 100,000 test images.	Link	-
ImageNet v2	The ImageNetV2 dataset contains new test data for the ImageNet benchmark.	Link	-

Usage

The following example command pre-trains on Conceptual Captions (CC) with an effective batch size of 512 when run on 4 GPUs.

python -u src/training/main.py \
--train-data="/conceptual_captions/Train-GCC-training_output.csv" \
--val-data="/conceptual_captions/Validation_GCC-1.1.0-Validation_output.csv" \
--path-data="/conceptual_captions" \
--imagenet-val="/imagenet/val" \
--warmup 20000 \
--batch-size=128 \
--lr=1e-3 \
--wd=0.1 \
--lr-scheduler="cosine-restarts" \
--restart-cycles=10 \
--epochs=70 \
--method="cloob" \
--init-inv-tau=30 \
--init-scale-hopfield=8 \
--workers=8 \
--model="RN50" \
--dist-url="tcp://127.0.0.1:6100" \
--batch-size-eval=512
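The effective batch size of 512 follows from the command's `--batch-size=128` and the 4 GPUs, assuming the flag sets the per-GPU batch size (which is what the stated figure implies). The helper below is a hypothetical illustration of that arithmetic, not code from the repository.

```python
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    """Total samples per optimization step across all GPUs.

    Assumption: every GPU processes `per_gpu_batch_size` samples per step.
    """
    return per_gpu_batch_size * num_gpus


print(effective_batch_size(128, 4))  # 512
```

Under the same assumption, keeping the effective batch size at 512 on a different number of GPUs means scaling `--batch-size` accordingly, for example 256 on 2 GPUs.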

Zeroshot evaluation of downstream tasks

We provide a Jupyter notebook to perform zeroshot evaluation with a trained model.
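For orientation, zero-shot classification with a CLIP-style model usually compares an image embedding against one text embedding per class and picks the most similar class. The sketch below illustrates that idea in plain Python on toy vectors. It is not the notebook's code: all function names are hypothetical, the embeddings are made up, and every vector is assumed to be non-zero.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_emb: Sequence[float],
                      class_text_embs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class whose text embedding has the highest
    cosine similarity with the image embedding."""
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in class_text_embs
    ]
    return max(range(len(sims)), key=sims.__getitem__)


def top1_accuracy(image_embs, labels, class_text_embs) -> float:
    """Fraction of images whose predicted class matches the label."""
    correct = sum(
        zero_shot_predict(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return correct / len(labels)


# Toy 2-dimensional embeddings (made up) for two classes and three images.
class_text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
print(top1_accuracy(image_embs, labels, class_text_embs))
```

In practice the embeddings would come from the trained image and text encoders, with the class text embeddings typically built from prompts containing the class names.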

LICENSE

MIT LICENSE
