"Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021

Last update: Nov 16, 2022

transformers-arithmetic

This repository contains the code to reproduce the experiments from the paper:

Nogueira, Jiang, Lin, "Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021

First, install the required packages:

pip install -r requirements.txt

The command below trains and evaluates a T5-base model on the task of adding numbers of up to 15 digits:

python main.py \
    --output_dir=. \
    --model_name_or_path=t5-base \
    --operation=addition \
    --orthography=10ebased \
    --balance_train \
    --balance_val \
    --train_size=100000 \
    --val_size=10000 \
    --test_size=10000 \
    --min_digits_train=2 \
    --max_digits_train=15 \
    --min_digits_test=2 \
    --max_digits_test=15 \
    --base_number=10 \
    --seed=1 \
    --train_batch_size=4 \
    --accumulate_grad_batches=32 \
    --val_batch_size=32 \
    --max_seq_length=512 \
    --num_workers=4 \
    --gpus=1 \
    --optimizer=AdamW \
    --lr=3e-4 \
    --weight_decay=5e-5 \
    --scheduler=StepLR \
    --t_0=2 \
    --t_mult=2 \
    --gamma=1.0 \
    --step_size=1000 \
    --max_epochs=20 \
    --check_val_every_n_epoch=2 \
    --amp_level=O0 \
    --precision=32 \
    --gradient_clip_val=1.0
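
To make the `--orthography=10ebased` and `--balance_train` flags more concrete, here is a minimal sketch of what they are understood to do. It follows the paper's description and is not the repository's code. The function names `to_10ebased` and `sample_balanced` are hypothetical. The sketch assumes that the 10e-based orthography writes each digit followed by a power-of-ten marker, and that balanced sampling first picks a digit count uniformly and then a number with exactly that many digits. The exact prompt wording the model sees is not reproduced here.

```python
import random


def to_10ebased(number: int) -> str:
    """Write each digit followed by a '10e<position>' marker (assumed format)."""
    digits = str(number)
    tokens = []
    for i, digit in enumerate(digits):
        # Position counted from the right: the last digit gets exponent 0.
        exponent = len(digits) - 1 - i
        tokens.append(digit)
        tokens.append(f"10e{exponent}")
    return " ".join(tokens)


def sample_balanced(min_digits: int, max_digits: int, rng: random.Random) -> int:
    """Pick the digit count uniformly, then a number with exactly that many digits."""
    num_digits = rng.randint(min_digits, max_digits)
    low = 10 ** (num_digits - 1)
    high = 10 ** num_digits - 1
    return rng.randint(low, high)


if __name__ == "__main__":
    rng = random.Random(1)  # mirrors --seed=1, for illustration only
    a = sample_balanced(2, 15, rng)
    b = sample_balanced(2, 15, rng)
    print("operands:", to_10ebased(a), "|", to_10ebased(b))
    print("sum:     ", to_10ebased(a + b))
```

Sampling the digit count first means short and long numbers appear equally often, whereas drawing uniformly from the whole range would yield almost exclusively numbers with the maximum number of digits.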

Training should take roughly 10 hours on a single V100 GPU.
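
As a rough guide to the batch and scheduler flags in the command above, the arithmetic below works out the effective batch size and the learning-rate schedule. It assumes the flags follow the usual gradient-accumulation and StepLR semantics (the learning rate is multiplied by `gamma` every `step_size` scheduler steps), which this README does not state explicitly. The helper `step_lr` is a hypothetical stand-in for the scheduler, not the repository's code.

```python
import math

# Values copied from the command above.
train_size = 100_000
train_batch_size = 4
accumulate_grad_batches = 32
lr, gamma, step_size = 3e-4, 1.0, 1000

# Gradients from 32 batches of 4 examples are accumulated before each update.
effective_batch_size = train_batch_size * accumulate_grad_batches
batches_per_epoch = math.ceil(train_size / train_batch_size)
updates_per_epoch = math.ceil(batches_per_epoch / accumulate_grad_batches)


def step_lr(initial_lr: float, gamma: float, step_size: int, step: int) -> float:
    """Learning rate after `step` scheduler steps under StepLR-style decay."""
    return initial_lr * gamma ** (step // step_size)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("optimizer updates per epoch:", updates_per_epoch)
    print("lr after 5000 scheduler steps:", step_lr(lr, gamma, step_size, 5000))
```

Under these assumptions, `--gamma=1.0` leaves the learning rate constant at 3e-4 throughout training, regardless of whether the scheduler steps per update or per epoch.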

The exact-match score on the test set should be 1.0:

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_exact_match': 1.0000}
--------------------------------------------------------------------------------
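
For readers unfamiliar with the metric, exact match can be sketched as the fraction of generated strings that are identical to their targets. This is an illustrative definition under that assumption. The function name `exact_match` and the whitespace stripping are choices made here and may differ from how the repository computes `test_exact_match`.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions that equal their target string exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # Leading/trailing whitespace is ignored (an assumption of this sketch).
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    preds = ["5 10e1 7 10e0", "1 10e2 0 10e1 0 10e0"]
    golds = ["5 10e1 7 10e0", "1 10e2 0 10e1 1 10e0"]
    print(exact_match(preds, golds))
```

Because a single wrong digit makes the whole answer count as incorrect, a score of 1.0 means every test sum was generated perfectly.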

Owner

Castorini