A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Overview


Styleformer

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. For instance, understand what makes text formal or casual/informal.

Use cases for Styleformer

Area 1: Data Augmentation

  • Augment training datasets with various fine-grained language styles.
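A minimal sketch of how such augmentation could be wired up, assuming a `Styleformer.transfer`-like callable that returns `None` when no good transfer is found; `casual_to_formal` below is a hypothetical stand-in used only for illustration, not part of the library.

```python
def augment(sentences, transfer):
    """Pair each sentence with a style-transferred variant.

    Sentences for which the transfer fails (returns None) fall back to
    the original text, so the augmented set keeps its size.
    """
    pairs = []
    for sentence in sentences:
        variant = transfer(sentence)
        pairs.append((sentence, variant if variant is not None else sentence))
    return pairs

# Hypothetical stand-in for sf.transfer (illustration only).
def casual_to_formal(sentence):
    return sentence.replace("kinda", "somewhat") if "kinda" in sentence else None

augmented = augment(["I kinda like it", "Hello there"], casual_to_formal)
# -> [("I kinda like it", "I somewhat like it"), ("Hello there", "Hello there")]
```

In a real pipeline, `casual_to_formal` would simply be `sf.transfer` from the Quick Start below.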

Area 2: Post-processing

  • Apply style transfers to machine-generated text.
  • e.g.
    • Refine summarised text to active voice + a formal tone.
    • Refine translated text to a more casual tone to reach a younger audience.
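Chained post-processing of this kind can be sketched as below, assuming each step is a `Styleformer.transfer`-like callable that returns `None` on failure; the `refine` helper and the stand-in steps are illustrative, not part of the library.

```python
def refine(text, steps):
    """Apply each style-transfer step in order, skipping steps that fail.

    A step that returns None (no good-quality transfer) is skipped, so
    the text is never lost mid-pipeline.
    """
    for step in steps:
        result = step(text)
        if result is not None:
            text = result
    return text

# Hypothetical stand-in steps (illustration only).
to_formal = lambda t: t.replace("btw", "by the way")
drop_step = lambda t: None  # stand-in for a step with no good transfer

refined = refine("btw this works", [to_formal, drop_step])
# -> "by the way this works"
```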

Area 3: Controlled paraphrasing

  • Formal <=> Casual and Active <=> Passive style transfers add a notion of control over how we paraphrase, compared to free-form paraphrasing where there is no control or guarantee over the paraphrases.

Area 4: Assisted writing

  • Integrate this into any human writing interface, like email clients, messaging tools, or social media post authoring tools. Your creativity is the only limit to the uses.
  • e.g.
    • Polish an email with a business tone for professional use.

Installation

pip install git+https://github.com/PrithivirajDamodaran/Styleformer.git

Quick Start

Casual to Formal (Available now!)

from styleformer import Styleformer
import torch
import warnings
warnings.filterwarnings("ignore")

'''
# uncomment for reproducibility
def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1234)
'''

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 0) 

source_sentences = [
"I am quitting my job",
"Jimmy is on crack and can't trust him",
"What do guys do to show that they like a gal?",
"i loooooooooooooooooooooooove going to the movies.",
"That movie was fucking awesome",
"My mom is doing fine",
"That was funny LOL" , 
"It's piece of cake, we can do it",
"btw - ur avatar looks familiar",
"who gives a crap?",
"Howdy Lucy! been ages since we last met.",
"Dude, this car's dope!",
"She's my bestie from college",
"I kinda have a feeling that he has a crush on you.",
"OMG! It's finger-lickin' good.",
]   

for source_sentence in source_sentences:
    target_sentence = sf.transfer(source_sentence)
    print("-" *100)
    print("[Informal] ", source_sentence)
    print("-" *100)
    if target_sentence is not None:
        print("[Formal] ",target_sentence)
        print()
    else:
        print("No good quality transfers available !")
[Informal]  I am quitting my job
[Formal]  I will be stepping down from my job.
----------------------------------------------------------------------------------------------------
[Informal]  Jimmy is on crack and can't trust him
[Formal]  Jimmy is a crack addict I cannot trust him
----------------------------------------------------------------------------------------------------
[Informal]  What do guys do to show that they like a gal?
[Formal]  What do guys do to demonstrate their affinity for women?
----------------------------------------------------------------------------------------------------
[Informal]  i loooooooooooooooooooooooove going to the movies.
[Formal]  I really like to go to the movies.
----------------------------------------------------------------------------------------------------
[Informal]  That movie was fucking awesome
[Formal]  That movie was wonderful.
----------------------------------------------------------------------------------------------------
[Informal]  My mom is doing fine
[Formal]  My mother is doing well.
----------------------------------------------------------------------------------------------------
[Informal]  That was funny LOL
[Formal]  That was hilarious
----------------------------------------------------------------------------------------------------
[Informal]  It's piece of cake, we can do it
[Formal]  The whole process is simple and is possible.
----------------------------------------------------------------------------------------------------
[Informal]  btw - ur avatar looks familiar
[Formal]  Also, your avatar looks familiar.
----------------------------------------------------------------------------------------------------
[Informal]  who gives a crap?
[Formal]  Who cares?
----------------------------------------------------------------------------------------------------
[Informal]  Howdy Lucy! been ages since we last met.
[Formal]  Hello, Lucy It has been a long time since we last met.
----------------------------------------------------------------------------------------------------
[Informal]  Dude, this car's dope!
[Formal]  I find this car very appealing.
----------------------------------------------------------------------------------------------------
[Informal]  She's my bestie from college
[Formal]  She is my best friend from college.
----------------------------------------------------------------------------------------------------
[Informal]  I kinda have a feeling that he has a crush on you.
[Formal]  I have a feeling that he is attracted to you.
----------------------------------------------------------------------------------------------------
[Informal]  OMG! It's finger-lickin' good.
[Formal]  It is so good, it is delicious.
----------------------------------------------------------------------------------------------------

Knobs

# inference_on = [0=Regular model On CPU, 1= Regular model On GPU, 2=Quantized model On CPU]
target_sentence = sf.transfer(source_sentence, inference_on=0, quality_filter=0.95, max_candidates=5)
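One way to combine these knobs, sketched under the assumption that `sf.transfer` keeps the signature shown above: progressively relax `quality_filter` when the strict threshold yields no transfer. The retry helper is this sketch's suggestion, not part of the library.

```python
def transfer_with_fallback(transfer, sentence, filters=(0.95, 0.90, 0.80)):
    """Retry with progressively looser quality_filter values.

    `transfer` is assumed to have Styleformer's transfer(...) signature
    and to return None when no candidate passes the filter.
    """
    for qf in filters:
        result = transfer(sentence, inference_on=0, quality_filter=qf, max_candidates=5)
        if result is not None:
            return result
    return None  # no acceptable transfer even at the loosest filter

# e.g. transfer_with_fallback(sf.transfer, "btw - ur avatar looks familiar")
```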

Models

Model Type Status
prithivida/informal_to_formal_styletransfer Seq2Seq Beta
prithivida/formal_to_informal_styletransfer Seq2Seq WIP
prithivida/active_to_passive_styletransfer Seq2Seq WIP
prithivida/passive_to_active_styletransfer Seq2Seq WIP
prithivida/positive_to_negative_styletransfer Seq2Seq WIP
prithivida/negative_to_positive_styletransfer Seq2Seq WIP

Dataset

  • TBD
  • Fine-tuned T5 on a Tesla T4 GPU; it took ~2 hours to train each of the above models with batch_size = 16 and epochs = 5. (Will share the training args shortly.)

Benchmark

  • TBD

References

Citation

  • TBD
Comments
  • added streamlit app

    Following points are covered in this PR:

    • Added Streamlit app. (CTF,FTC,ATP,PTA)
    • Fixed bug in PTA style transfer

    @PrithivirajDamodaran Attaching screenshot of streamlit app for reference. Let me know your suggestions

    app_screenshot

    opened by shashankdeshpande 6
  • Trimming long sentences

    Following is a code snippet for a better understanding of the problem I am facing.

    from styleformer import Styleformer
    import torch
    import warnings
    warnings.filterwarnings("ignore")
    
    # style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
    sf = Styleformer(style = 0) 
    
    source_sentences = [
                       "Corruption in african countries hinders economic, political and social development. It is a major obstacle to economic growth, good governance and fundamental freedoms, such as freedom of speech or the right of citizens to hold governments accountable. In addition, corruption affects the lives of individuals, families and communities. The 10th global corruption barometer (gcb) program - in africa, shows that while many people in africa feel that corruption is on the rise in their country, many also feel confident that, as citizens, they can make a difference in the fight against corruption."
    ]
    
    for paragraph in source_sentences:
        # sentences = sent_tokenize(paragraph)
        sentences = paragraph.split('. ')
        for source_sentence in sentences:
            target_sentence = sf.transfer(source_sentence)
            print("-" *100)
            print("[Casual] ", source_sentence)
            print("-" *100)
            if target_sentence is not None:
                print("[Formal] ",target_sentence)
                print()
            else:
                print("No good quality transfers available !")
    

    Program Output


    [Casual] Corruption in african countries hinders economic, political and social development

    [Formal] In African countries, corruption affects economic, political, and social development.


    [Casual] It is a major obstacle to economic growth, good governance and fundamental freedoms, such as freedom of speech or the right of citizens to hold governments accountable

    [Formal] It's a major obstacle to economic growth, good governance, and fundamental freedoms, such as the freedom of speech or the right of citizens to


    [Casual] In addition, corruption affects the lives of individuals, families and communities

    [Formal] Additionally, corruption has a negative impact on individuals, families and communities.


    [Casual] The 10th global corruption barometer (gcb) program - in africa, shows that while many people in africa feel that corruption is on the rise in their country, many also feel confident that, as citizens, they can make a difference in the fight against corruption.

    [Formal] The tenth Global Corruptibility Barometer (GCB) program - in Africa - shows that while many people in Africa feel that corruption

    Please help to fix this for longer sentences. Thanks in advance!

    wontfix 
    opened by Nomiluks 4
  • OSError: Can't load config for 'prithivida/parrot_adequacy_on_BART'. Make sure that....

    Hi Prithviraj,

    Fantastic work you are doing.

    While testing your models, I intended to deploy the model wrapped in a flask app on EC2.

    Although the results work on Google Colab, I receive the following error on EC2 -

    OSError: Can't load config for 'prithivida/parrot_adequacy_on_BART'. Make sure that:
    
    - 'prithivida/parrot_adequacy_on_BART' is a correct model identifier listed on 'https://huggingface.co/models'
    
    - or 'prithivida/parrot_adequacy_on_BART' is the correct path to a directory containing a config.json file
    

    Can you guide me on how this can be resolved?

    Regards, Paritosh

    invalid 
    opened by katreparitosh 2
  • Issue with loading saved models

    Hi, I'm trying to save and load the tokenizer and model. I use the following to save them:

    tokenizer = AutoTokenizer.from_pretrained("prithivida/informal_to_formal_styletransfer")
    tokenizer.save_pretrained('./data/style_tokenizer')
    model = AutoModelForSeq2SeqLM.from_pretrained("prithivida/informal_to_formal_styletransfer")
    model.save_pretrained('./data/style_model')
    

    But when I try to load them, from the local path, I get the following error:

    OSError: Can't load config for '../data/style_tokenizer'. Make sure that:
    
    - '../data/style_tokenizer' is a correct model identifier listed on 'https://huggingface.co/models'
    
    - or '../data/style_tokenizer' is the correct path to a directory containing a config.json file
    

    This somewhat makes sense, since when saving the tokenizer no config.json is created.

    Any idea how I can save/load the tokenizer and model?

    opened by Naviden 1
  • Code to train the model

    Hey, can you please share the code where you train the models? We have tasks similar to the issues you solve, but in other domains, so it might be very helpful for us. Do you fine-tune only T5, or do you make additional changes to the T5 fine-tuning? Thanks

    question 
    opened by ivan-bulka 1
  • cant create a Styleformer(style=n)

    It keeps throwing the same error (with a different request ID each time): OSError: There was a specific connection error when trying to load prithivida/informal_to_formal_styletransfer: <class 'requests.exceptions.HTTPError'> (Request ID: K9-6-Ks5uMEai7cOcQ3gC)

    opened by TalSchiff 0
  • Unable to create the styleformer instance

    OSError: prithivida/parrot_adequacy_on_BART is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

    I'm using the latest version and seeing the following issue. I was wondering if anything has changed on the huggingface models front?

    opened by ks2002119 0
  • How to do inferencing using multiple GPU's for styleformer

    I am using this model to run inference on 1 million data points using 4 A100 GPUs. I am launching an inference.py script using Google's Vertex AI containers.

    How can I make the inference code utilise all 4 GPUs, so that inference is as fast as possible?

    Here is the code I use in inference.py:

    from styleformer import Styleformer
    import warnings
    warnings.filterwarnings("ignore")
    
    # style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
    sf = Styleformer(style = 1) 
    import torch
    def set_seed(seed):
      torch.manual_seed(seed)
      if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    
    set_seed(1212)
    
    source_sentences = [
    "I would love to meet attractive men in town",
    "Please leave the room now",
    "It is a delicious icecream",
    "I am not paying this kind of money for that nonsense",
    "He is on cocaine and he cannot be trusted with this",
    "He is a very nice man and has a charming personality",
    "Let us go out for dinner",
    "We went to Barcelona for the weekend. We have a lot of things to tell you.",
    ]   
    
    for source_sentence in source_sentences:
        # inference_on = [0=Regular model On CPU, 1= Regular model On GPU, 2=Quantized model On CPU]
        target_sentence = sf.transfer(source_sentence, inference_on=1, quality_filter=0.95, max_candidates=5)
        print("[Formal] ", source_sentence)
        if target_sentence is not None:
            print("[Casual] ",target_sentence)
        else:
            print("No good quality transfers available !")
        print("-" *100)     
    
    opened by pratikchhapolika 6
  • Sentiment Transfer

    Love the library!

    Was hoping to do sentiment transfer, but I see that it has not yet been integrated. Any pointers towards off-the-shelf models that can do that?

    opened by JosephGatto 1