Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Last update: Jan 05, 2023

Regression Transformer

Development setup

conda env create -f conda.yml
conda activate terminator
pip install -e .

Generate some data

Example data for QED (quantitative estimate of drug-likeness) can be generated using scripts/generate_example_data.py.

python scripts/generate_example_data.py examples/example.smi examples/qed_property_example.txt

If you need to create a new vocabulary for a dataset, you can use scripts/create_vocabulary.py. It also automatically adds some special tokens at the top of your vocabulary file.

python scripts/create_vocabulary.py examples/qed_property_example.txt examples/vocab.txt
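The idea of placing special tokens at the top of a vocabulary file can be sketched as follows. This is only an illustration under stated assumptions, not the logic of scripts/create_vocabulary.py: the names `build_vocabulary` and `write_vocabulary` and the `SPECIAL_TOKENS` list are hypothetical, the input is assumed to be already tokenized, and the one-token-per-line file layout is assumed from the usual BERT-style convention.

```python
from typing import Iterable, List, Sequence

# Hypothetical placeholder list; the real script defines its own special tokens.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocabulary(
    token_sequences: Iterable[Sequence[str]],
    special_tokens: Sequence[str] = SPECIAL_TOKENS,
) -> List[str]:
    """Return special tokens first, then every other token in first-seen order."""
    vocabulary = list(special_tokens)
    seen = set(vocabulary)
    for tokens in token_sequences:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary


def write_vocabulary(vocabulary: Sequence[str], path: str) -> None:
    """Write one token per line (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(vocabulary) + "\n")
```

Because the special tokens come first, they keep the lowest indexes no matter which dataset the remaining tokens are collected from.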

At this point, the folder containing the vocabulary file can be used to load an ExpressionBertTokenizer:

>>> from terminator.tokenization import ExpressionBertTokenizer
>>> tokenizer = ExpressionBertTokenizer.from_pretrained('examples')
>>> text = '<qed>0.3936|CBr'
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['<qed>', '_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_', '|', 'C', 'Br']
>>> token_indexes = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
>>> print(token_indexes)
[16, 17, 18, 28, 45, 34, 35, 19, 15, 63]
>>> tokenizer.build_inputs_with_special_tokens(token_indexes)
[12, 16, 17, 18, 28, 45, 34, 35, 19, 15, 63, 13]
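The numeric tokens in the output above appear to pair each digit with its decimal position: '_3_-1_' is the digit 3 at position -1 (tenths), '_9_-2_' is the digit 9 at position -2 (hundredths), and so on. The function below is a minimal sketch that reproduces this pattern for the number shown. `tokenize_number` is a hypothetical name and not part of the terminator package, and the handling of integer parts with more than one digit is an assumption extrapolated from the single example.

```python
from typing import List


def tokenize_number(number: str) -> List[str]:
    """Split a decimal string into digit tokens tagged with their decimal position."""
    integer_part, _, fractional_part = number.partition(".")
    tokens = []
    # Integer digits: the leftmost digit carries the highest power of ten
    # (assumed behaviour for multi-digit integer parts).
    for offset, digit in enumerate(integer_part):
        tokens.append(f"_{digit}_{len(integer_part) - 1 - offset}_")
    if fractional_part:
        tokens.append("_._")
        # Fractional digits: positions -1, -2, ... counted from the decimal point.
        for offset, digit in enumerate(fractional_part, start=1):
            tokens.append(f"_{digit}_-{offset}_")
    return tokens


print(tokenize_number("0.3936"))
# ['_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_']
```

Encoding the position in the token lets the model distinguish a 3 in the tenths place from a 3 in the thousandths place, which a plain per-character split would not.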

Prepare some train/eval data line by line:

head -n 900 examples/qed_property_example.txt > examples/train.txt
tail -n +901 examples/qed_property_example.txt > examples/eval.txt
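On systems without head and tail, the same split can be done in Python. This is a sketch only: `split_lines` is a hypothetical helper, not a script shipped with the repository, and it mirrors the two commands above by writing the first 900 lines to the training file and everything from line 901 onward to the evaluation file.

```python
from pathlib import Path
from typing import Tuple


def split_lines(source: str, train_path: str, eval_path: str, n_train: int = 900) -> Tuple[int, int]:
    """Write the first n_train lines to train_path and the remainder to eval_path."""
    lines = Path(source).read_text(encoding="utf-8").splitlines(keepends=True)
    train_lines, eval_lines = lines[:n_train], lines[n_train:]
    Path(train_path).write_text("".join(train_lines), encoding="utf-8")
    Path(eval_path).write_text("".join(eval_lines), encoding="utf-8")
    # Return the line counts so the caller can check the split sizes.
    return len(train_lines), len(eval_lines)
```

Calling `split_lines("examples/qed_property_example.txt", "examples/train.txt", "examples/eval.txt")` would correspond to the shell commands above. The split is positional, not random, so shuffle the source file first if its order is meaningful.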

Launch the training:

python scripts/run_language_modeling.py --output_dir examples/models/xlnet_selfies \
    --config_name configs/xlnet_selfies.json --tokenizer_name ./examples/vocab.txt \
    --do_train --do_eval --learning_rate 1e-4 --num_train_epochs 5 --save_total_limit 2 \
    --save_steps 500 --per_gpu_train_batch_size 16 --evaluate_during_training --eval_data_file ./examples/eval.txt \
    --train_data_file ./examples/train.txt --line_by_line --block_size 510 --seed 42 --logging_steps 250

Example model configurations (number of heads, layers, etc.) can be found in the configs folder.

Owner

International Business Machines
