ZeroVL - The official implementation of ZeroVL

Last update: Nov 04, 2022


Overview

This repository contains the source code needed to reproduce the results presented in the paper ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources.

Pioneering dual-encoder pre-training works such as CLIP and ALIGN require tremendous amounts of data and computation (e.g., billion-scale web data and hundreds of GPUs), which prevents researchers with limited resources from reproducing and building on them. To address this, we provide comprehensive training guidance for aligning dual-encoder multi-modal representations with limited resources. We also provide ZeroVL, a reproducible strong baseline with competitive results, built with publicly accessible academic datasets and a popular experimental environment.
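Like CLIP and ALIGN, ZeroVL aligns images and texts with two separate encoders trained contrastively. As a rough illustration only, the snippet below computes a generic CLIP-style symmetric contrastive (InfoNCE) loss for a batch of paired embeddings in plain Python. It is not the repository's implementation: the fixed temperature of 0.07 and all function names are assumptions made for this example, and ZeroVL's own training recipe is not reproduced.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(logits, index):
    """Numerically stable log-softmax of `logits`, evaluated at `index`."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic CLIP-style symmetric InfoNCE loss (illustrative, not ZeroVL's code).

    image_embs[i] and text_embs[i] are assumed to be a matching pair; every
    other item in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: each row should put its probability mass on the diagonal.
    i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: the same requirement applied to each column.
    t2i = -sum(log_softmax_at([logits[j][i] for j in range(n)], i)
               for i in range(n)) / n
    return (i2t + t2i) / 2
```

With matching pairs on the diagonal the loss approaches zero, swapping the captions drives it up, and when every embedding is identical it equals log(batch size), since the model can no longer tell pairs apart.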

Performance

Image-text retrieval RSUM scores on the MSCOCO and Flickr30K datasets (RSUM is the sum of R@1, R@5 and R@10 for both image-to-text and text-to-image retrieval):

Method	Compute	Training data (image-text pairs)	COCO (zs.)	COCO (ft.)	F30K (zs.)	F30K (ft.)
CLIP	256 V100	400M	400.2	-	540.6	-
ALIGN	1024 TPUv3	1800M	425.3	500.4	553.3	576.0
baseline	8 V100	14.2M	363.5	471.9	476.8	553.0
ZeroVL	8 V100	14.2M	425.0	485.0	536.2	561.6
ZeroVL	8 V100	100M	442.1	500.5	546.5	573.6

zs.: zero-shot setting; ft.: fine-tuned setting; F30K: Flickr30K; -: not available.
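To make the metric concrete, here is a minimal sketch of computing RSUM from an image-text similarity matrix. It assumes one caption per image, with the ground-truth pair for row i in column i; MSCOCO and Flickr30K actually provide five captions per image, so a real evaluation on those benchmarks must handle that many-to-one mapping. The function names are hypothetical and not taken from the repository.

```python
def recall_at_k(sim, k):
    """Percentage of queries (rows) whose ground-truth item (the diagonal) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scored strictly higher than the ground truth
        # (ties are resolved in favor of the ground truth).
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(sim, ks=(1, 5, 10)):
    """Sum of R@1, R@5 and R@10 over both retrieval directions (maximum 600).

    sim[i][j] is the similarity of image i and text j: rows give
    image-to-text retrieval, columns give text-to-image retrieval.
    """
    transposed = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, k) + recall_at_k(transposed, k) for k in ks)
```

For example, `rsum([[0.1, 0.9, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.7]])` gives 500 (up to floating-point rounding): two of the three ground-truth pairs miss rank 1 in one direction or the other, but all of them fall within the top 5 and top 10.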

Installation

Requirements:

Python 3.7
PyTorch 1.8.1
torchvision 0.9.1
CUDA 11.1

Install requirements:

pip3 install -r requirements.txt
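To confirm that the environment matches the versions listed under Requirements, a small check like the one below can help. It is a hypothetical helper, not part of the repository: `version_mismatches` compares a dictionary of version strings against the README's pins, and `installed_versions` reads them from `sys.version_info`, `torch.__version__`, `torchvision.__version__` and `torch.version.cuda`, so it needs PyTorch and torchvision to be importable.

```python
import sys

# Versions pinned in the README's requirements list.
EXPECTED = {"python": "3.7", "torch": "1.8.1", "torchvision": "0.9.1", "cuda": "11.1"}


def version_mismatches(installed, expected=EXPECTED):
    """Return {name: (installed, expected)} for every entry that does not match."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Drop local build tags such as "+cu111" before comparing.
        if have is None or str(have).split("+")[0] != want:
            mismatches[name] = (have, want)
    return mismatches


def installed_versions():
    """Collect versions from the running interpreter (requires torch and torchvision)."""
    import torch
    import torchvision
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": torch.__version__,
        "torchvision": torchvision.__version__,
        "cuda": torch.version.cuda,  # None on CPU-only builds
    }
```

Running `print(version_mismatches(installed_versions()))` prints `{}` when everything matches; otherwise it lists each mismatched component with its installed and expected version.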

Getting Started

See GETTING_STARTED.md for instructions on using the codebase.

Model Zoo

We will release pre-trained models soon.

Citing ZeroVL

If you use ZeroVL in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@article{cui2021zerovl,
  title={ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources},
  author={Cui, Quan and Zhou, Boyan and Guo, Yu and Yin, Weidong and Wu, Hao and Yoshie, Osamu},
  journal={arXiv preprint arXiv:2112.09331},
  year={2021}
}

License

ZeroVL is released under the MIT license. See LICENSE for details.
