TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication)

Last update: Oct 26, 2022

Overview

Parameterization of Hypercomplex Multiplications (PHM)

This repository contains the TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication) layers and PHM-Transformers from the paper "Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters", published at ICLR 2021.

Installation

Install the following libraries before running the code:

tensorflow-gpu (1.14.0)
tensor2tensor (1.14.0)

Usage

Usage follows the original tensor2tensor workflow: t2t-datagen, t2t-trainer, t2t-avg-all, then t2t-decoder. It helps to be familiar with tensor2tensor before running this code. In particular, setting --t2t_usr_dir=./Parameterization-of-Hypercomplex-Multiplications lets tensor2tensor register the PHM-Transformers.

Training

For example, to evaluate the PHM-Transformer (n=4) on the En-Vi machine translation task (with data generated by t2t-datagen --problem=translate_envi_iwslt32k), set the following flags when training:

t2t-trainer \
--problem=translate_envi_iwslt32k \
--model=light_transformer \
--hparams_set=light_transformer_base_single_gpu \
--hparams="light_mode='random',hidden_size=512,factor=4" \
--train_steps=50000

where light_transformer with light_mode='random' is the alias of the PHM-Transformer in our implementation.
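To give a rough sense of what n=4 means for model size, the following sketch counts the weights of a single hidden_size x hidden_size linear map as a dense layer and as a PHM layer. It assumes that the factor hyperparameter corresponds to n, and it uses the parameter count described in the paper: n matrices of size n x n plus n matrices of size (k/n) x (d/n). The helper names are illustrative and are not part of this repository; biases are ignored.

```python
def dense_params(d_in, d_out):
    """Weight count of an ordinary fully-connected layer (no bias)."""
    return d_in * d_out


def phm_params(d_in, d_out, n):
    """Weight count of a PHM layer (no bias).

    Assumes n matrices A_i of shape (n, n) and n matrices S_i of shape
    (d_out / n, d_in / n), as in the paper's formulation.
    """
    if d_in % n or d_out % n:
        raise ValueError("d_in and d_out must both be divisible by n")
    return n * n * n + n * (d_out // n) * (d_in // n)


hidden_size, n = 512, 4  # values taken from the training flags above
dense = dense_params(hidden_size, hidden_size)
phm = phm_params(hidden_size, hidden_size, n)
print(dense, phm, round(phm / dense, 4))  # 262144 65600 0.2502
```

The ratio is slightly above 1/n because of the n^3 term, which becomes negligible as the layer dimensions grow.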

Aggregating Checkpoints

After training, the latest 8 checkpoints are averaged:

t2t-avg-all --model_dir $TRAIN_DIR --output_dir $AVG_DIR --n 8

where $TRAIN_DIR and $AVG_DIR need to be specified by users.
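Conceptually, checkpoint averaging takes the element-wise mean of every variable across the selected checkpoints. The sketch below illustrates the idea on toy data. It is not the t2t-avg-all implementation, which operates on TensorFlow checkpoint files; here each checkpoint is assumed to be a plain dict mapping a variable name to a flat list of floats, and average_checkpoints is a hypothetical helper.

```python
def average_checkpoints(checkpoints):
    """Return the element-wise mean of each variable across checkpoints.

    Each checkpoint is a dict: variable name -> flat list of floats.
    All checkpoints are assumed to share the same names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    count = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th value of this variable from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / count for col in columns]
    return averaged


# Toy stand-ins for the latest checkpoints (two instead of eight).
last_checkpoints = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
avg = average_checkpoints(last_checkpoints)
print(avg)  # {'w': [2.0, 3.0], 'b': [0.5]}
```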

Testing

To decode the target sequence, one has to additionally set the decode_hparams as follows:

t2t-decoder \
--decode_hparams="beam_size=5,alpha=0.6"

Then t2t-bleu is invoked to calculate the BLEU score.
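For readers unfamiliar with the decode_hparams above: beam_size is the number of hypotheses kept per step, and alpha is commonly the exponent of a length penalty that keeps beam search from favoring short outputs. The sketch below assumes the GNMT-style penalty ((5 + length) / 6)^alpha; whether decoding here uses exactly this form should be checked against the tensor2tensor source. The function names are illustrative.

```python
def length_penalty(length, alpha):
    """GNMT-style length penalty (assumed form): ((5 + length) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(log_prob, length, alpha=0.6):
    """Hypothesis score after length normalization (higher is better)."""
    return log_prob / length_penalty(length, alpha)


# Two hypothetical beam candidates: (total log-probability, length in tokens).
short_raw, short_len = -4.0, 4
long_raw, long_len = -5.5, 12

short_score = normalized_score(short_raw, short_len)
long_score = normalized_score(long_raw, long_len)

# Without normalization the short candidate wins (-4.0 > -5.5);
# with alpha=0.6 the longer candidate ranks higher.
print(short_score, long_score)
```

Setting alpha=0 disables the normalization, so candidates are ranked by raw log-probability.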

PHM Implementations

PHM is implemented with operations in make_random_mul and random_ffn, which are mathematically equivalent to a sum of Kronecker products.
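As an illustration of the sum-of-Kronecker-products view, the sketch below builds a PHM weight matrix H = sum_i kron(A_i, S_i) and applies it to an input vector. It is a dependency-free sketch using plain Python lists, not the code in make_random_mul or random_ffn. The shapes follow the paper's formulation (A_i is n x n, S_i is (k/n) x (d/n), H is k x d), and all helper names are hypothetical.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def phm_weight(a_mats, s_mats):
    """Build H = sum_i kron(A_i, S_i) from paired lists of matrices."""
    terms = [kron(a, s) for a, s in zip(a_mats, s_mats)]
    # Add the terms element by element.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*terms)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


n, k, d = 2, 4, 6  # n must divide both the output size k and input size d
rng = random.Random(0)
a_mats = [random_matrix(n, n, rng) for _ in range(n)]
s_mats = [random_matrix(k // n, d // n, rng) for _ in range(n)]

H = phm_weight(a_mats, s_mats)  # shape (k, d)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
y = [sum(h * v for h, v in zip(row, x)) for row in H]  # y = H x
print(len(H), len(H[0]), len(y))  # 4 6 4
```

With n=1 the single A matrix is a scalar and H reduces to an ordinary dense weight matrix. In practice, the list-based kron helper would be replaced by a vectorized equivalent such as np.kron.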

Among works that use PHM, some have offered alternative PHM implementations:

Citation

If you find this repository helpful, please cite our paper:

@inproceedings{zhang2021beyond,
  title={Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters},
  author={Zhang, Aston and Tay, Yi and Zhang, Shuai and Chan, Alvin and Luu, Anh Tuan and Hui, Siu Cheung and Fu, Jie},
  booktitle={International Conference on Learning Representations},
  year={2021}
}


Owner

Aston Zhang
