Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Code in both PyTorch and TensorFlow

Last update: Jan 06, 2023

Overview

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
Besides the source code, we also provide pretrained TensorFlow models that reach the state-of-the-art (SoTA) results reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder and supports single-node multi-GPU training via nn.DataParallel.
Please refer to pytorch/README.md for details.
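
As a rough illustration of what single-node multi-GPU training via nn.DataParallel looks like, here is a minimal sketch. TinyLM is a hypothetical stand-in model invented for this example and is not the model in pytorch/; the vocabulary size, batch shape, and variable names are likewise assumptions. See pytorch/README.md for the actual training scripts.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical toy language model, used only to illustrate the wrapper."""

    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        return self.proj(self.embed(tokens))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyLM().to(device)

# nn.DataParallel splits each input batch along dim 0 across the visible GPUs
# and gathers the outputs on the first device. With no GPU available it
# simply calls the wrapped module.
parallel_model = nn.DataParallel(model)

tokens = torch.randint(0, 100, (8, 12), device=device)
logits = parallel_model(tokens)
print(logits.shape)
```

Because the batch is split along its first dimension, the batch size should be at least as large as the number of GPUs for every device to receive work.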

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 bits-per-character barrier on character-level language modeling (0.99 on enwiki8). The results are summarized below.

Method	enwiki8 (bpc)	text8 (bpc)	One Billion Word (perplexity)	WT-103 (perplexity)	PTB (perplexity, w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5
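
To make the numbers in the table easier to interpret, the sketch below shows how the two metrics relate to an average negative log-likelihood and computes the relative reduction for each benchmark. It assumes the enwiki8 and text8 columns are bits per character and the other columns are word-level perplexity, as in the paper; the function names are illustrative and do not come from the repository.

```python
import math


def bits_per_char(nll_nats_per_char):
    """Convert an average negative log-likelihood in nats to bits per character."""
    return nll_nats_per_char / math.log(2)


def perplexity(nll_nats_per_token):
    """Convert an average negative log-likelihood in nats to perplexity."""
    return math.exp(nll_nats_per_token)


def relative_reduction(previous, new):
    """Fraction by which `new` improves on `previous` (lower is better)."""
    return (previous - new) / previous


# (previous best, Transformer-XL) pairs copied from the results table.
results = {
    "enwiki8": (1.06, 0.99),
    "text8": (1.13, 1.08),
    "One Billion Word": (23.7, 21.8),
    "WT-103": (20.5, 18.3),
    "PTB": (55.5, 54.5),
}

for name, (previous, new) in results.items():
    print(f"{name}: {100 * relative_reduction(previous, new):.1f}% lower")
```

Both metrics are lower-is-better, so a positive reduction means an improvement over the previous best.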

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

Owner

Zhilin Yang
