The idea is to build a model that takes keywords as input and generates sentences as output.

Last update: Jan 03, 2023

Overview

keytotext

Potential use cases include:

Marketing
Search Engine Optimization
Topic generation etc.
Fine tuning of topic modeling models

Model:

Keytotext is based on the Amazing T5 Model:

k2t: Model
k2t-base: Model
mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training Notebooks can be found in the Training Notebooks Folder

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage:

Example Notebooks can be found in the Notebooks Folder

pip install keytotext
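
The usage snippet itself did not survive in this copy. As a minimal sketch, assuming the package exposes a `pipeline` factory (the import `from keytotext import pipeline` appears in the issue reports further down) that accepts a model name such as `k2t` and returns a callable taking a list of keywords, generation might look like the following. The helper name `generate_sentence` is hypothetical, and the import is deferred so the helper can be defined before the package is installed.

```python
def generate_sentence(keywords, model_name="k2t"):
    """Return text generated from a list of keyword strings.

    Assumption: keytotext.pipeline(model_name) returns a callable that
    accepts a list of keywords; the model is downloaded on first use.
    """
    if not keywords or not all(isinstance(k, str) for k in keywords):
        raise ValueError("keywords must be a non-empty list of strings")
    # Deferred import: only needed when a sentence is actually generated.
    from keytotext import pipeline
    nlp = pipeline(model_name)
    return nlp(keywords)
```

A call would then be `generate_sentence(["India", "Capital", "New Delhi"])`, which needs the package installed and network access for the model download.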

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here:

from keytotext import trainer

UI:

pip install streamlit-tags

This uses a custom streamlit component built by me: GitHub

API:

The API is packaged as a Docker container and can be started quickly. Follow the instructions below to get started.

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get the results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]
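
The URL above contains quotes, brackets and a space, which a browser encodes automatically but a script usually has to encode itself. A minimal sketch of building that URL from a Python list, assuming the endpoint accepts the percent-encoded form of the same JSON array (servers normally decode query strings before handling them). The helper name `build_api_url` is hypothetical, and no request is sent here.

```python
import json
from urllib.parse import urlencode

def build_api_url(keywords, base="http://localhost:8000/api"):
    """Build the query URL for a list of keywords, percent-encoding the JSON array."""
    # Compact separators reproduce the ["India","Capital","New Delhi"] form shown above.
    payload = json.dumps(keywords, separators=(",", ":"))
    return f"{base}?{urlencode({'data': payload})}"

url = build_api_url(["India", "Capital", "New Delhi"])
print(url)
```

The resulting string can be passed to any HTTP client once the container is running.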

Note: The Hosted API is only available on demand

BibTex:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

https://github.com/Shivanandroy/simpleT5 (Shivanand Roy)
https://github.com/patil-suraj/question_generation (Suraj Patil)
https://github.com/MathewAlexander/T5_nlg (Mathew Alexander)

Articles about keytotext:

https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45 (Mathew Alexander)
https://www.youtube.com/watch?v=I0iBzP-SxFY (video about keytotext by 1LittleCoder)
https://medium.com/mlearning-ai/generating-sentences-from-keywords-using-transformers-in-nlp-e89f4de5cf6b (Prakhar Mishra)

Comments

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
Hi,

I tried to install keytotext with pip install keytotext --upgrade on my local machine,

but came across the following:

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
ERROR: No matching distribution found for keytotext

My pip version is the latest. However, the above works just fine in colab. Please guide me through the fix?
opened by abhijithneilabraham 6
Add finetuning model to keytotext

Is your feature request related to a problem? Please describe. It's difficult to use the model without fine-tuning it on a new corpus, so we need to build a script for fine-tuning.
enhancement good first issue

opened by gagan3012 2
"Oh no." ?

"Error running app. If this keeps happening, please file an issue."

Ok,...sure? I know nothing about this app.

Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

Chrome browser, Linux.

opened by drscotthawley 2
Add Citations

Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

opened by gagan3012 1
Adding new models to keytotext

Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

enhancement good first issue

opened by gagan3012 1
Inference API for Keytotext

Is your feature request related to a problem? Please describe. It is difficult to host the UI on Streamlit without an API.

Describe the solution you'd like Inference API
enhancement good first issue

opened by gagan3012 1
Create Better UI

Is your feature request related to a problem? Please describe. The current UI is not functional. It needs to be fixed.

Describe the solution you'd like Better UI with a nicer design
enhancement

opened by gagan3012 1
Add `st.cache` to load model

Hi @gagan3012,

Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

Thought I'd send you a small PR which should fix this. You were already on the right track with st.cache, but it gets even better if you also use it to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again!). Plus, I've seen that it also works a bit faster now ;)

Hope this works for you and let me know if you have any other questions! 🎈

Cheers, Johannes

opened by jrieke 1
ValueError: transformers.models.auto.__spec__ is None

from keytotext import pipeline

Running the line above raises this error: "ValueError: transformers.models.auto.__spec__ is None"

opened by varunakk 0
Update README.md
opened by gagan3012 0
Update trainer.py
opened by gagan3012 0
Pipeline error on fresh install

Hi I'm getting this on a first run and fresh install

Global seed set to 42
Traceback (most recent call last):
  File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
    from keytotext import pipeline
  File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
    from .dataset import make_dataset
  File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
    from cv2 import randShuffle
ModuleNotFoundError: No module named 'cv2'

opened by skintflickz 0
New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'
I have imported the model and the necessary libraries, and I am getting the error below in Google Colab. I used this model a few months ago and it worked fine; the error is new and appears with the same code.

TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

Imported libraries:

!pip install keytotext --upgrade
!sudo apt-get install git-lfs

from keytotext import trainer

Training Model:

model = trainer()
model.from_pretrained(model_name="t5-small")
model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
model.save_model()

Have attached error screenshot

OS: Windows

Browser Chrome
opened by aishwaryapisal9 2
Update trainer.py
Delete progress_bar_refresh_rate in trainer.py

Description

Delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the Trainer class in the latest version (1.7.0) of PyTorch Lightning.

Motivation and Context

Keeping this argument makes training fail.

How Has This Been Tested?

Ran keytotext on a custom dataset before and after August 2nd, 2022. The new version of PyTorch Lightning, which removed the argument from Trainer, took effect on that date, and custom training has failed since then.
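
The PR itself simply deletes the argument. As an alternative defensive pattern (a sketch, not what the PR does), a caller can drop keyword arguments that the target constructor no longer accepts. `NewTrainer` below is a hypothetical stand-in class, not the PyTorch Lightning Trainer, and `supported_kwargs` is an illustrative helper name.

```python
import inspect

def supported_kwargs(func, kwargs):
    """Keep only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    # If func takes **kwargs, everything is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}

class NewTrainer:
    """Hypothetical stand-in for a trainer whose constructor dropped an argument."""
    def __init__(self, max_epochs=1, gpus=0):
        self.max_epochs = max_epochs
        self.gpus = gpus

requested = {"max_epochs": 5, "gpus": 1, "progress_bar_refresh_rate": 5}
trainer = NewTrainer(**supported_kwargs(NewTrainer, requested))
```

Silently dropping arguments hides the version mismatch, so pinning the dependency version or removing the argument, as this PR does, is the more explicit fix.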

Screenshots (if appropriate):

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have read the CONTRIBUTING document.
opened by anath2110benten 0
Why is cv2 required?

https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps raising a "cv2 not installed" exception. It looks like the pip package doesn't list cv2 as a dependency. When I deleted this line from the source code, the model worked well. Is this line necessary for the project? Would you consider adding OpenCV to the pip package's dependencies? Thanks for your attention.
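
Until the import is removed or the dependency is declared, one common way to keep a module importable is to treat the import as optional. This is a minimal sketch of that pattern, not the project's code; `cv2_available` is a hypothetical helper name.

```python
try:
    from cv2 import randShuffle  # only succeeds if OpenCV is installed
except ImportError:  # also covers ModuleNotFoundError
    randShuffle = None

def cv2_available():
    """Report whether the optional OpenCV import succeeded."""
    return randShuffle is not None

print(cv2_available())
```

Code that really needs the function can check `cv2_available()` first and raise a clear error message instead of failing at import time.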

opened by ChunxuYang 0
Hi, I notice that given the same input keywords, the generated text is the same across different runs, even when setting different seeds with 'pl.seed_everything(..)'.
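
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the pipeline decodes with greedy or beam search, no random numbers are drawn, so the seed cannot change the output. Seeds only matter once sampling is enabled (in Hugging Face `generate`, that is `do_sample=True`). The toy below uses a made-up four-token distribution to show the difference; it does not call the model.

```python
import random

# Toy next-token distribution; the tokens and weights are made up for illustration.
TOKENS = ["Delhi", "India", "capital", "city"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]

def greedy_pick(seed):
    """Deterministic decoding: the seed is set but never consulted."""
    random.seed(seed)
    return max(zip(WEIGHTS, TOKENS))[1]

def sampled_picks(seed, n=20):
    """Stochastic decoding: the draws depend on the seed."""
    rng = random.Random(seed)
    return rng.choices(TOKENS, weights=WEIGHTS, k=n)

print(greedy_pick(1), greedy_pick(2))
```

Under this assumption, varied outputs would require passing sampling parameters to the generation call in addition to changing the seed.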

opened by RuiFeiHe 6

Releases(v1.5.0)

v1.5.0 (Jul 9, 2021)
Trainer tool finalized and completed!

v1.4.1 (Jul 2, 2021)
Validation accuracy added

v1.3.9 (Jul 2, 2021)
Bug fixes

v1.3.8 (Jul 2, 2021)
New module for uploading to the HF Hub

v1.3.1 (Jun 16, 2021)
Documentation updated along with semantic versioning

v0.3.1 (Jun 15, 2021)
This version features a tested trainer which can be used in 4 lines of code:

from keytotext import KeytotextTrainer

model = KeytotextTrainer()
model.from_pretrained(model_name="t5-small")
model.train(data_df=df, batch_size=4, max_epochs=3, use_gpu=True)
model.save_model()

v0.2.9 (Jun 15, 2021)
This release features the new Trainer module. More details coming soon.

v0.2.5 (May 12, 2021)
Changes:
Bug fixes
Maintaining new models

v0.2.4 (May 11, 2021)
Changes:
Refactoring of code
Ability to add new models

v0.2.3 (May 10, 2021)
Changes:
Bug fixes
New models added

v0.2.2 (May 10, 2021)
Changes:
keytotext now supports new models trained by other people
A new fine-tuning script

v0.2.1 (May 5, 2021)
Bug fixes

v0.2.0 (May 4, 2021)
Changes:
Completed API
Completed testing
Completed all evals
UI improvements

v0.1.6 (May 2, 2021)
Changes:
Updates to Eval pipeline

v0.1.5 (May 2, 2021)
Changes:
Added Trainer API
Added Eval pipeline

v0.1.4 (Apr 30, 2021)
Latest release

v0.1.3 (Apr 27, 2021)
Updates

0.1.1 (Apr 26, 2021)

0.1.0 (Apr 26, 2021)
Production release 0.1.0

Owner

Gagan Bhatia

Software Developer | Machine Learning Enthusiast

GitHub Repository https://share.streamlit.io/gagan3012/keytotext/UI/app.py

