[ICLR 2021] Is Attention Better Than Matrix Decomposition?

Overview

Enjoy-Hamburger 🍔

Official implementation of Hamburger, Is Attention Better Than Matrix Decomposition? (ICLR 2021)

Under construction.

Introduction

This repo provides the official implementation of Hamburger for further research. We sincerely hope that this paper can bring you inspiration about the Attention Mechanism, especially how the low-rankness and the optimization-driven method can help model the so-called Global Information in deep learning.

We model the global context issue as a low-rank completion problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module self-attention when carefully coping with gradients back-propagated through MDs.

contents

We are working on some exciting topics. Please wait for our new papers!

Enjoy Hamburger, please!

Organization

This section introduces the organization of this repo.

We strongly recommend the readers to read the blog (incoming soon) as a supplement to the paper!

  • blog.
    • Some random thoughts about Hamburger and beyond.
    • Possible directions based on Hamburger.
    • FAQ.
  • seg.
    • We provide the PyTorch implementation of Hamburger (V1) in the paper and an enhanced version (V2) flavored with Cheese. Some experimental features are included in V2+.
    • We release the codebase for systematical research on the PASCAL VOC dataset, including the two-stage training on the trainaug and trainval datasets and the MSFlip test.
    • We offer three checkpoints of HamNet, in which one is 85.90+ with the test server link, while the other two are 85.80+ with the test server link 1 and link 2. You can reproduce the test results using the checkpoints combined with the MSFlip test code.
    • Statistics about HamNet that might ease further research.
  • gan.
    • Official implementation of Hamburger in TensorFlow.
    • Data preprocessing code for using ImageNet in tensorflow-datasets. (Possibly useful if you hope to run the JAX code of BYOL or other ImageNet training code with the Cloud TPUs.)
    • Training and evaluation protocol of HamGAN on the ImageNet.
    • Checkpoints of HamGAN-strong and HamGAN-baby.

TODO:

  • README doc for HamGAN.
  • PyTorch Hamburger with less encapsulation.
  • Suggestions for using and further developing Hamburger.
  • Blog in both English and Chinese.
  • We also consider adding a collection of popular context modules to this repo. It depends on the time. No Guarantee. Perhaps GuGu 🕊️ (which means standing someone up).

Citation

If you find our work interesting or helpful to your research, please consider citing Hamburger. :)

@inproceedings{
    ham,
    title={Is Attention Better Than Matrix Decomposition?},
    author={Zhengyang Geng and Meng-Hao Guo and Hongxu Chen and Xia Li and Ke Wei and Zhouchen Lin},
    booktitle={International Conference on Learning Representations},
    year={2021},
}

Contact

Feel free to contact me if you have additional questions or have interests in collaboration. Please drop me an email at [email protected]. Find me at Twitter. Thank you!

Response to recent emails may be slightly delayed to March 26th due to the deadlines of ICLR. I feel sorry, but people are always deadline-driven. QAQ

Acknowledgments

Our research is supported with Cloud TPUs from Google's Tensorflow Research Cloud (TFRC). Nice and joyful experience with the TFRC program. Thank you!

We would like to sincerely thank EMANet, PyTorch-Encoding, YLG, and TF-GAN for their awesome released code.

Owner
Gsunshine
Gsunshine
Trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI

Introduction This script trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI. In order to run this

Momin Haider 0 Jan 02, 2022
Official Pytorch implementation of MixMo framework

MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks Official PyTorch implementation of the MixMo framework | paper | docs Alexandr

79 Nov 07, 2022
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au

14 Nov 28, 2022
Author's PyTorch implementation of TD3+BC, a simple variant of TD3 for offline RL

A Minimalist Approach to Offline Reinforcement Learning TD3+BC is a simple approach to offline RL where only two changes are made to TD3: (1) a weight

Scott Fujimoto 193 Dec 23, 2022
A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion. NÜWA is a unified multimodal p

Microsoft 2.6k Jan 06, 2023
MetaDrive: Composing Diverse Scenarios for Generalizable Reinforcement Learning

MetaDrive: Composing Diverse Driving Scenarios for Generalizable RL [ Documentation | Demo Video ] MetaDrive is a driving simulator with the following

DeciForce: Crossroads of Machine Perception and Autonomy 276 Jan 04, 2023
PyTorch implementation for SDEdit: Image Synthesis and Editing with Stochastic Differential Equations

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations Project | Paper | Colab PyTorch implementation of SDEdit: Image Synthesis a

536 Jan 05, 2023
for taichi voxel-challange event

Taichi Voxel Challenge Figure: result of python3 example6.py. Please replace the image above (demo.jpg) with yours, so that other people can immediate

Liming Xu 20 Nov 26, 2022
MultiLexNorm 2021 competition system from ÚFAL

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 David Samuel & Milan Straka Charles University Faculty of

ÚFAL 13 Jun 28, 2022
Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Hiroshechka Y 33 Dec 26, 2022
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

Overview | Tutorials | Examples | Installation | FAQ | How to Cite Welcome to ktrain News and Announcements 2020-11-08: ktrain v0.25.x is released and

Arun S. Maiya 1.1k Jan 02, 2023
[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Ta

256 Dec 28, 2022
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"

Region_Learner The Pytorch implementation for "Video-Text Pre-training with Learned Regions" (arxiv) We are still cleaning up the code further and pre

Rui Yan 0 Mar 20, 2022
SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers

SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers This repo contains our codes for the paper "No Parameters Left Behind: Sensitivity Gu

Chen Liang 23 Nov 07, 2022
CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

TUCH This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] License Software Copyright License fo

Lea Müller 45 Jan 07, 2023
Pointer-generator - Code for the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks

Note: this code is no longer actively maintained. However, feel free to use the Issues section to discuss the code with other users. Some users have u

Abi See 2.1k Jan 04, 2023
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Jittor: a Just-in-time(JIT) deep learning framework Quickstart | Install | Tutorial | Chinese Jittor is a high-performance deep learning framework bas

2.7k Jan 03, 2023
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

SUO-SLAM This repository hosts the code for our CVPR 2022 paper "Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation". ArXiv li

Robot Perception & Navigation Group (RPNG) 97 Jan 03, 2023
Source code for "Roto-translated Local Coordinate Framesfor Interacting Dynamical Systems"

Roto-translated Local Coordinate Frames for Interacting Dynamical Systems Source code for Roto-translated Local Coordinate Frames for Interacting Dyna

Miltiadis Kofinas 19 Nov 27, 2022
With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function

With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function. At the momen

ChemEngAI 40 Dec 27, 2022