A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Last update: Dec 19, 2022

Overview

Styleformer

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more.For instance, understand What makes text formal or casual/informal.

Usecases for Styleformer
Installation
Quick Start
- Casual to Formal (Available now !)
Models
Dataset
Benchmark
References
Citation

Usecases for Styleformer

Area 1: Data Augmentation

Augment training datasets with various fine-grained language styles.

Area 2: Post-processing

Apply style transfers to machine generated text.
e.g.
- Refine a Summarised text to active voice + formal tone.
- Refine a Translated text to more casual tone to reach younger audience.

Area 3: Controlled paraphrasing

Formal <=> Casual and Active <=> style transfers adds a notion of control over how we paraphrase when compared to free-form paraphrase where there is control or guarantee over the paraphrases.

Area 4: Assisted writing

Integrate this to any human writing interfaces like email clients, messaging tools or social media post authoring tools. Your creativity is your limit to te uses.
e.g.
- Polish an email with business tone for professional uses.

Installation

pip install git+https://github.com/PrithivirajDamodaran/Styleformer.git

Quick Start

Casual to Formal (Available now !)

from styleformer import Styleformer
import torch
import warnings
warnings.filterwarnings("ignore")

'''
#uncomment for re-producability
def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1234)
'''

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 0) 

source_sentences = [
"I am quitting my job",
"Jimmy is on crack and can't trust him",
"What do guys do to show that they like a gal?",
"i loooooooooooooooooooooooove going to the movies.",
"That movie was fucking awesome",
"My mom is doing fine",
"That was funny LOL" , 
"It's piece of cake, we can do it",
"btw - ur avatar looks familiar",
"who gives a crap?",
"Howdy Lucy! been ages since we last met.",
"Dude, this car's dope!",
"She's my bestie from college",
"I kinda have a feeling that he has a crush on you.",
"OMG! It's finger-lickin' good.",
]   

for source_sentence in source_sentences:
    target_sentence = sf.transfer(source_sentence)
    print("-" *100)
    print("[Informal] ", source_sentence)
    print("-" *100)
    if target_sentence is not None:
        print("[Formal] ",target_sentence)
        print()
    else:
        print("No good quality transfers available !")

[Informal]  I am quitting my job
[Formal]  I will be stepping down from my job.
----------------------------------------------------------------------------------------------------
[Informal]  Jimmy is on crack and can't trust him
[Formal]  Jimmy is a crack addict I cannot trust him
----------------------------------------------------------------------------------------------------
[Informal]  What do guys do to show that they like a gal?
[Formal]  What do guys do to demonstrate their affinity for women?
----------------------------------------------------------------------------------------------------
[Informal]  i loooooooooooooooooooooooove going to the movies.
[Formal]  I really like to go to the movies.
----------------------------------------------------------------------------------------------------
[Informal]  That movie was fucking awesome
[Formal]  That movie was wonderful.
----------------------------------------------------------------------------------------------------
[Informal]  My mom is doing fine
[Formal]  My mother is doing well.
----------------------------------------------------------------------------------------------------
[Informal]  That was funny LOL
[Formal]  That was hilarious
----------------------------------------------------------------------------------------------------
[Informal]  It's piece of cake, we can do it
[Formal]  The whole process is simple and is possible.
----------------------------------------------------------------------------------------------------
[Informal]  btw - ur avatar looks familiar
[Formal]  Also, your avatar looks familiar.
----------------------------------------------------------------------------------------------------
[Informal]  who gives a crap?
[Formal]  Who cares?
----------------------------------------------------------------------------------------------------
[Informal]  Howdy Lucy! been ages since we last met.
[Formal]  Hello, Lucy It has been a long time since we last met.
----------------------------------------------------------------------------------------------------
[Informal]  Dude, this car's dope!
[Formal]  I find this car very appealing.
----------------------------------------------------------------------------------------------------
[Informal]  She's my bestie from college
[Formal]  She is my best friend from college.
----------------------------------------------------------------------------------------------------
[Informal]  I kinda have a feeling that he has a crush on you.
[Formal]  I have a feeling that he is attracted to you.
----------------------------------------------------------------------------------------------------
[Informal]  OMG! It's finger-lickin' good.
[Formal]  It is so good, it is delicious.
----------------------------------------------------------------------------------------------------

Knobs

# inference_on = [0=Regular model On CPU, 1= Regular model On GPU, 2=Quantized model On CPU]
target_sentence = sf.transfer(source_sentence, inference_on=0, quality_filter=0.95, max_candidates=5)

Models

Model	Type	Status
prithivida/informal_to_formal_styletransfer	Seq2Seq	Beta
prithivida/formal_to_informal_styletransfer	Seq2Seq	WIP
prithivida/active_to_passive_styletransfer	Seq2Seq	WIP
prithivida/passive_to_active_styletransfer	Seq2Seq	WIP
prithivida/positive_to_negative_styletransfer	Seq2Seq	WIP
prithivida/negative_to_positive_styletransfer	Seq2Seq	WIP

Dataset

TBD
Fined tuned on T5 on a Tesla T4 GPU and it took ~2 hours to train each of the above models with batch_size = 16 and epochs = 5.(Will share training args shortly)

Benchmark

References

Citation

Comments

added streamlit app
Following points are covered in this PR:

Added Streamlit app. (CTF,FTC,ATP,PTA)

Fixed bug in PTA style transfer

@PrithivirajDamodaran Attaching screenshot of streamlit app for reference. Let me know your suggestions
opened by shashankdeshpande 6

Trimming long sentences

Following the code snippet for a better understanding of the problem, I am facing.

from styleformer import Styleformer
import torch
import warnings
warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 0) 

source_sentences = [
                   "Corruption in african countries hinders economic, political and social development. It is a major obstacle to economic growth, good governance and fundamental freedoms, such as freedom of speech or the right of citizens to hold governments accountable. In addition, corruption affects the lives of individuals, families and communities. The 10th global corruption barometer (gcb) program - in africa, shows that while many people in africa feel that corruption is on the rise in their country, many also feel confident that, as citizens, they can make a difference in the fight against corruption."
]

for paragraph in source_sentences:
    # sentences = sent_tokenize(paragraph)
    sentences = paragraph.split('. ')
    for source_sentence in sentences:
        target_sentence = sf.transfer(source_sentence)
        print("-" *100)
        print("[Casual] ", source_sentence)
        print("-" *100)
        if target_sentence is not None:
            print("[Formal] ",target_sentence)
            print()
        else:
            print("No good quality transfers available !")

Program Output

[Casual] Corruption in african countries hinders economic, political and social development

[Formal] In African countries, corruption affects economic, political, and social development.

[Casual] It is a major obstacle to economic growth, good governance and fundamental freedoms, such as freedom of speech or the right of citizens to hold governments accountable

[Formal] It's a major obstacle to economic growth, good governance, and fundamental freedoms, such as the freedom of speech or the right of citizens to

[Casual] In addition, corruption affects the lives of individuals, families and communities

[Formal] Additionally, corruption has a negative impact on individuals, families and communities.

[Casual] The 10th global corruption barometer (gcb) program - in africa, shows that while many people in africa feel that corruption is on the rise in their country, many also feel confident that, as citizens, they can make a difference in the fight against corruption.

[Formal] The tenth Global Corruptibility Barometer (GCB) program - in Africa - shows that while many people in Africa feel that corruption

Please help to fix this for longer sentences. Thanks in advance!

wontfix

opened by Nomiluks 4

OSError: Can't load config for 'prithivida/parrot_adequacy_on_BART'. Make sure that....
Hi Prithviraj,

Fantastic work you are doing.

While testing your models, I intended to deploy the model wrapped in a flask app on EC2.

Although the results work on Google Colab, I receive the following error on EC2 -

OSError: Can't load config for 'prithivida/parrot_adequacy_on_BART'. Make sure that: - 'prithivida/parrot_adequacy_on_BART' is a correct model identifier listed on 'https://huggingface.co/models' - or 'prithivida/parrot_adequacy_on_BART' is the correct path to a directory containing a config.json file

Can you guide me on how this can be resolved?

Regards, Paritosh
invalid
opened by katreparitosh 2

Issue with loading saved models

Hi, I'm trying to save and load the tokenizer and model. I use the following to save them:

tokenizer = AutoTokenizer.from_pretrained("prithivida/informal_to_formal_styletransfer")
tokenizer.save_pretrained('./data/style_tokenizer')
model = AutoModelForSeq2SeqLM.from_pretrained("prithivida/informal_to_formal_styletransfer")
model.save_pretrained('./data/style_model')

But when I try to load them, from the local path, I get the following error:

OSError: Can't load config for '../data/style_tokenizer'. Make sure that:

- '../data/style_tokenizer' is a correct model identifier listed on 'https://huggingface.co/models'

- or '../data/style_tokenizer' is the correct path to a directory containing a config.json file

This somehow makes sense since saving the vectorizer, no config.json is being created.

Any idea how can I save/load the tokenizer and model?

opened by Naviden 1

Code to train the model

Hey, Can you please share the code, where you train models? We have tasks similar to issues you solve but in other domains. It might be very helpful for us. Do you fine-tune only T5 or you make additional changes to T5 fine-tuning? Thanks
question

opened by ivan-bulka 1
cant create a Styleformer(style=n)

it keeps throing the same error(diffrent request id) OSError: There was a specific connection error when trying to load prithivida/informal_to_formal_styletransfer: <class 'requests.exceptions.HTTPError'> (Request ID: K9-6-Ks5uMEai7cOcQ3gC)

opened by TalSchiff 0
Unable to create the styleformer instance

OSError: prithivida/parrot_adequacy_on_BART is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

I'm using the latest version and seeing the following issue. I was wondering if anything has changed on the huggingface models front?

opened by ks2002119 0

How to do inferencing using multiple GPU's for styleformer

I am using this model to do inferencing on 1 million data point using A100 GPU's with 4 GPU. I am launching a inference.py code using Googles vertex-ai Container.

How can I make inference code to utilise all 4 GPU's ? So that inferencing is super-fast.

Here is the same code I use in inference.py:

from styleformer import Styleformer
import warnings
warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 1) 
import torch
def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
"I would love to meet attractive men in town",
"Please leave the room now",
"It is a delicious icecream",
"I am not paying this kind of money for that nonsense",
"He is on cocaine and he cannot be trusted with this",
"He is a very nice man and has a charming personality",
"Let us go out for dinner",
"We went to Barcelona for the weekend. We have a lot of things to tell you.",
]   

for source_sentence in source_sentences:
    # inference_on = [0=Regular model On CPU, 1= Regular model On GPU, 2=Quantized model On CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=1, quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ",target_sentence)
    else:
        print("No good quality transfers available !")
    print("-" *100)

opened by pratikchhapolika 6

Sentiment Transfer

Love the library!

Was hoping to do sentiment transfer but I see that has not yet been integrated. Any pointers towards off the shelf models that can do that?

opened by JosephGatto 1

Releases(v1.0)

v1.0(Jun 21, 2021)

Model upgraded CTF Added support for FTC ATP PTA
Source code(tar.gz)
Source code(zip)
v0.2(Jun 15, 2021)

Source code(tar.gz)
Source code(zip)

Owner

Prithivida

Applied NLP, XAI for NLP and Data Engineering

GitHub Repository

Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

49 Dec 30, 2022

Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Semantic Segmentation".

Dual Path Learning for Domain Adaptation of Semantic Segmentation Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Sema

27 Dec 22, 2022

Code Implementation of "Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction".

Span-ASTE: Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction ***** New March 31th, 2022: Scikit-Style API for Easy Usage *****

111 Dec 23, 2022

硕士期间自学的NLP子任务，供学习参考

NLP_Chinese_down_stream_task 自学的NLP子任务，供学习参考任务1 ：短文本分类 (1).数据集：THUCNews中文文本数据集(10分类) (2).模型：BERT+FC/LSTM，Pytorch实现 (3).使用方法：预训练模型使用的是中文BERT-WWM, 下载地

12 May 31, 2022

Code for the paper: Sequence-to-Sequence Learning with Latent Neural Grammars

43 Dec 23, 2022

Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Opsi 1 Download repository ini, extract di lokasi yang diinginkan Opsi 2 Jika sudah famil

5 Dec 28, 2021

RecipeReduce: Simplified Recipe Processing for Lazy Programmers

RecipeReduce This repo will help you figure out the amount of ingredients to buy for a certain number of meals with selected recipes. RecipeReduce Get

9 Apr 22, 2022

Transformers implementation for Fall 2021 Clinic

Installation Download miniconda3 if not already installed You can check by running typing conda in command prompt. Use conda to create an environment

1 Oct 28, 2021

Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.

Sentiment Analyzer The goal of this project is to perform sentiment analysis on textual data that people generally post on websites like social networ

53 Mar 01, 2022

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

3.2k Dec 31, 2022

一个基于Nonebot2和go-cqhttp的娱乐性qq机器人

Takker - 一个普通的QQ机器人此项目为基于 Nonebot2 和 go-cqhttp 开发，以 Sqlite 作为数据库的QQ群娱乐机器人关于纯兴趣开发，部分功能借鉴了大佬们的代码，作为Q群的娱乐+功能性Bot 声明此项目仅用于学习交流，请勿用于非法用途这是开发者的第一个Pytho

79 Dec 29, 2022

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

6.2k Dec 31, 2022

Understanding the Difficulty of Training Transformers

Admin Understanding the Difficulty of Training Transformers Guided by our analyses, we propose Adaptive Model Initialization (Admin), which successful

300 Dec 29, 2022

Pattern Matching in Python

Pattern Matching finalmente chega no Python 3.10. E daí? "Pattern matching", ou "correspondência de padrões" como é conhecido no Brasil. Algumas pesso

6 Feb 16, 2022

Speach Recognitions

easy_meeting Добро пожаловать в интерфейс сервиса автопротоколирования совещаний Easy Meeting. Website - http://cf5c-62-192-251-83.ngrok.io/ Принципиа

3 Feb 18, 2022

Simple bots or Simbots is a library designed to create simple bots using the power of python. This library utilises Intent, Entity, Relation and Context model to create bots .

Simple bots or Simbots is a library designed to create simple chat bots using the power of python. This library utilises Intent, Entity, Relation and

14 Dec 15, 2021

An implementation of the Pay Attention when Required transformer

Pay Attention when Required (PAR) Transformer-XL An implementation of the Pay Attention when Required transformer from the paper: https://arxiv.org/pd

7 Aug 11, 2022

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

data2vec-pytorch PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI (F

105 Jan 04, 2023

📜 GPT-2 Rhyming Limerick and Haiku models using data augmentation

Well-formed Limericks and Haikus with GPT2 📜 GPT-2 Rhyming Limerick and Haiku models using data augmentation In collaboration with Matthew Korahais &

2 May 26, 2022

test

Lidar-data-decode In this project, you can decode your lidar data frame(pcap file) and make your own datasets(test dataset) in Windows without any hug

46 Dec 05, 2022