GluonMM is a library of transformer models for computer vision and multi-modality research

Last update: Dec 02, 2022

Overview

GluonMM

GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models as well as research work from Amazon Research.

Install

First, clone the repository locally:

git clone https://github.com/amazon-research/gluonmm.git

Then install the dependencies:

conda create -n gluonmm python=3.7
conda activate gluonmm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install timm tensorboardX yacs tqdm requests pandas decord scikit-image opencv-python

# Install apex for half-precision training (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

We have tested the library extensively with PyTorch 1.8.1 and torchvision 0.9.1 on CUDA 10.2.
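
After installation, a quick check can confirm that the environment matches the commands above. The script below is a minimal sketch and not part of GluonMM: the helper name `find_missing` and the module lists are illustrative, and the import names (for example `skimage` for scikit-image and `cv2` for opencv-python) are assumed from the packages listed in the install commands.

```python
import importlib.util

# Import names assumed for the packages installed above
# (scikit-image imports as "skimage", opencv-python as "cv2").
REQUIRED = [
    "torch", "torchvision", "timm", "tensorboardX", "yacs", "tqdm",
    "requests", "pandas", "decord", "skimage", "cv2",
]
OPTIONAL = ["apex"]  # only needed for half-precision training


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("Missing required packages:", ", ".join(missing) or "none")
    print("Missing optional packages:", ", ".join(find_missing(OPTIONAL)) or "none")

    if "torch" not in missing:
        import torch

        # The tested configuration is PyTorch 1.8.1 with CUDA 10.2.
        print("PyTorch version:", torch.__version__)
        print("CUDA build:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
```

Run it inside the activated gluonmm conda environment. Any name reported as missing can be installed with the corresponding pip or conda command above.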

Model zoo

Image classification

Video action recognition

VidTr

Usage

For detailed usage, please refer to the README file in each model family. For example, the training, evaluation, and model zoo information for the video transformer VidTr can be found in its README.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

Parts of the code are heavily derived from pytorch-image-models, DeiT, Swin-transformer, vit-pytorch and vision_transformer.
