Scaling Vision with Sparse Mixture of Experts

This repository contains the code for training and fine-tuning Sparse MoE models for vision (V-MoE) on ImageNet-21k, reproducing the results presented in the paper:

Scaling Vision with Sparse Mixture of Experts, by Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby.

We will soon provide a colab analysing one of the models that we have released, as well as "config" files to train from scratch and fine-tune checkpoints. Stay tuned.

Installation

Simply clone this repository.

The file requirements.txt contains the requirements that can be installed via PyPi. However, we recommend installing jax, flax and optax directly from GitHub, since we use some of the latest features that are not part of any release yet.

In addition, you also have to clone the Vision Transformer repository, since we use some parts of it.

If you want to use RandAugment to train models (which we recommend if you train on ImageNet-21k or ILSVRC2012 from scratch), you must also clone the Cloud TPU repository, and name it cloud_tpu.

Checkpoints

We release the checkpoints containing the weights of some models that we trained on ImageNet (either ILSVRC2012 or ImageNet-21k). All checkpoints contain an index file (with .index extension) and one or multiple data files ( with extension .data-nnnnn-of-NNNNN, called shards). In the following list, we indicate only the prefix of each checkpoint. We recommend using gsutil to obtain the full list of files, download them, etc.

V-MoE S/32, 8 experts on the last two odd blocks, trained from scratch on ILSVRC2012 with RandAugment: gs://vmoe_checkpoints/vmoe_s32_last2_ilsvrc2012_randaug_medium.
V-MoE B/16, 8 experts on every odd block, trained from scratch on ImageNet-21k with RandAugment: gs://vmoe_checkpoints/vmoe_b16_imagenet21k_randaug_strong.
- Fine-tuned on ILSVRC2012: gs://vmoe_checkpoints/vmoe_b16_imagenet21k_randaug_strong_ft_ilsvrc2012

Disclaimers

This is not an officially supported Google product.

Scaling Vision with Sparse Mixture of Experts

Related tags

Overview

Scaling Vision with Sparse Mixture of Experts

Installation

Checkpoints

Disclaimers

Owner

Google Research

kullanışlı ve işinizi kolaylaştıracak bir araç

Everything about being a TA for ITP/AP course!

Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion"

duralava is a neural network which can simulate a lava lamp in an infinite loop.

Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more"

Official Code Release for "TIP-Adapter: Training-free clIP-Adapter for Better Vision-Language Modeling"

FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis

Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Character Controllers using Motion VAEs

Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

Code for EMNLP 2021 paper Contrastive Out-of-Distribution Detection for Pretrained Transformers.

Overview of architecture and implementation of TEDS-Net, as described in MICCAI 2021: "TEDS-Net: Enforcing Diffeomorphisms in Spatial Transformers to Guarantee TopologyPreservation in Segmentations"

Multi-layer convolutional LSTM with Pytorch

Code Impementation for "Mold into a Graph: Efficient Bayesian Optimization over Mixed Spaces"

PyTorch implementation of HDN(Homography Decomposition Networks) for planar object tracking

Crosslingual Segmental Language Model

Geometric Vector Perceptrons --- a rotation-equivariant GNN for learning from biomolecular structure

[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf