Changing the Mind of Transformers for Topically-Controllable Language Generation

Overview

Changing the Mind of Transformers for Topically-Controllable Language Generation

We will first introduce the how to run the IPython notebook demo by downloading our pretrained models. Then, we will introduce how to run our training and evaluation code.

Image of our model

Requirements and Setup

  • An Unix like OS with at least one GPU
  • To set up the python environment, run pip install -r requirements.txt. I use python 3.7 and pytorch 1.3.1, but I think other python 3 or pytorch > 1.0 versions might also be fine or just require very simple revision of the code. Our codes also use IPython notebook (for running the interactive demo), Spacy (for tokenization), nltk (for running evaluation and pplm), and gensim (for running the LDA baseline).
  • If your python path is not ~/anaconda3/bin/python, change your PY_PATH in the all the scripts in ./bin

Running IPython Notebook Demo

  • Download the pretrained models and dictionary file from here or following the instructions for training code below
  • Use IPython notebook to open ./src/evaluation/test_conditional_LM.ipynb
  • Run the 1st block after putting the models into the corresponding directory or revising the paths of TOPIC_MODEL_DIR, GENERATION_MODEL_DIR, DICT_FILE in the first block.
  • Modify the input context prompt in the 2nd block and run the block to see the generated topics
  • Choose some topics or specify some words and run the 3rd block to see the generated continuations that start with conditional x:. We will also generate the continuation without the condition that start with original x: as a baseline. The topical words that appear in the continuation will be highlighted.
  • You can append a genearted continuation to the 2nd block and repeat the process

Preprocessing Wikipedia for Training and Evaluation

  • First, download only the text from Wikipedia into json format using WikiExtractor
  • Check the path in ./bin/preprocessing_single_proc.sh and run the script. In the preprocessing, we will run Spacy tokenizer and GPT2 tokenizer, heuristically align their resulting tokens, split the corpus into training/validation/testing sets, and store the word indices into tensors.
  • Note that ./bin/preprocessing_single_proc.sh might be slow because it does not parallelize the tokenization processes. If you use job scheduler like slurm in your server, you might want to see the parallized scripts for tokenization in ./bin/old/tokenize_all_wiki_gpt2.sh and ./bin/old/tokenize_all_wiki.sh

Running Training

  • Prepare a word embedding file (e.g., we download the GloVe embedding from here)
  • Train our option generator using ./bin/train_option_generator.sh
  • Train our conditional text generator using ./bin/train_conditional_generator.sh (could train option generator and text generator at the same time)
  • You can start from original GPT2 model or start from our pretrained models. In our paper, we use learning rate = 1e-4. You can also try other values between 1e-4 and 1e-5.

Running Evaluation using Automatic Metrics

  • To evaluate/visualize conditional text generator, update the GENERATION_MODEL_DIR and TOPIC_MODEL_DIR using the model path from the previous step to run ./bin/train_conditional_generator.sh.
  • To evaluate/visualize option generator, update the GENERATION_MODEL_DIR and TOPIC_MODEL_DIR and run ./bin/eval_option_generator.sh. Set VISUALIZATION='Y' to visualize the topics given some randomly selected prompt. Set AUTO_EVAL_TOPICS='Y' to compare the quality of topics from different methods as we did in Table 1 in our EACL paper. Set AUTO_EVAL_GENRATION='Y' to evaluate the topics by the quality of text that is generated given these topics as we did in Table 6 in our paper appendix.
  • Our scores are stored at the end of each OUT_FILE file when AUTO_EVAL*='Y'. Our text generator is called "model condition", and our option generator is called NSD_topic in our code, where NSD stands for neural set decoder.
  • In our code, we also evaluate some globally clustering baselines such as LDA and kmeans. In order to test them, you can train a LDA model by following the steps here. You can also see an example code at ./src/preprocessing/tools/train_LDA_model.py. For kmeans clustering, we use ./src/preprocessing/tools/word_emb_global_clustering.py. If you do not want to test them, just remove LDA_org and global_centers from METHOD_LIST

Running Evaluation using Amazon Mechanical Turk

  • Download STSb dataset from here
  • Preprocessing STS using ./src/evaluation/filter_STS_for_GPT2.py and remove the duplication by sort sts-train_longer.csv | uniq > sts-train_longer_uniq.csv
  • Set OUTPUT_CSV_FOR_MTURK='Y' in ./bin/train_conditional_generator.sh and ./bin/eval_option_generator.sh to generate CSV files for MTurk tasks.
  • Our crowdsourcing templates and responses from workers could be found in ./MTurk_eval

Citation

If you use the code in a publication, please cite our paper.

Haw-Shiuan Chang, Jiaming Yuan, Mohit Iyyer, and Andrew McCallum,
“Changing the Mind of Transformers for Topically-Controllable Language Generation.” 
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021
Owner
IESL
IESL
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Tweesent-back - Tweesent backend uses fastAPI as the web framework

TweeSent Backend Tweesent backend. This repo uses fastAPI as the web framework.

0 Mar 26, 2022
Invariant Causal Prediction for Block MDPs

MISA Abstract Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challeng

Meta Research 41 Sep 17, 2022
JORLDY an open-source Reinforcement Learning (RL) framework provided by KakaoEnterprise

Repository for Open Source Reinforcement Learning Framework JORLDY

Kakao Enterprise Corp. 330 Dec 30, 2022
Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression.

Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression. Not an official Google product. Me

Google Research 27 Dec 12, 2022
[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

Counterfactual Attention Learning Created by Yongming Rao*, Guangyi Chen*, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for ICCV

Yongming Rao 90 Dec 31, 2022
The official repository for "Score Transformer: Generating Musical Scores from Note-level Representation" (MMAsia '21)

Score Transformer This is the official repository for "Score Transformer": Score Transformer: Generating Musical Scores from Note-level Representation

22 Dec 22, 2022
[Nature Machine Intelligence' 21] "Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence"

[UCADI] COVID-19 Diagnosis With Federated Learning Intro We developed a Federated Learning (FL) Framework for global researchers to collaboratively tr

HUST EIC AI-LAB 30 Dec 12, 2022
Create images and texts with the First Order Generative Adversarial Networks

First Order Divergence for training GANs This repository contains code accompanying the paper First Order Generative Advesarial Netoworks The majority

Zalando Research 35 Dec 11, 2021
This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

75 Dec 02, 2022
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation

AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation A pytorch-version implementation codes of paper:

11 Dec 13, 2022
Kindle is an easy model build package for PyTorch.

Kindle is an easy model build package for PyTorch. Building a deep learning model became so simple that almost all model can be made by copy and paste from other existing model codes. So why code? wh

Jongkuk Lim 77 Nov 11, 2022
This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization This codebase is the official implementation of Test-Time Classifier A

47 Dec 28, 2022
[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

F8Net Fixed-Point 8-bit Only Multiplication for Network Quantization (ICLR 2022 Oral) OpenReview | arXiv | PDF | Model Zoo | BibTex PyTorch implementa

Snap Research 76 Dec 13, 2022
SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks (Scientific Reports)

SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks Molecular interaction networks are powerful resources for the discovery. While dee

Kexin Huang 49 Oct 15, 2022
Direct LiDAR Odometry: Fast Localization with Dense Point Clouds

Direct LiDAR Odometry: Fast Localization with Dense Point Clouds DLO is a lightweight and computationally-efficient frontend LiDAR odometry solution w

VECTR at UCLA 369 Dec 30, 2022
Official PyTorch implementation of "RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on" (IJCAI-ECAI 2022)

RMGN-VITON RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on In IJCAI-ECAI 2022(short oral). [Paper] [Supplementary Material] Abstra

27 Dec 01, 2022
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective Zhengzhuo Xu, Zenghao Chai, Chun Yuan This is the PyTorch implement

Sincere 16 Dec 15, 2022
Learning Time-Critical Responses for Interactive Character Control

Learning Time-Critical Responses for Interactive Character Control Abstract This code implements the paper Learning Time-Critical Responses for Intera

Movement Research Lab 227 Dec 31, 2022
Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF shows significant improvements over baseline fine-tuning without data filtration.

Information Gain Filtration Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF sho

4 Jul 28, 2022