Trained T5 and T5-large model for creating keywords from text

Last update: Nov 24, 2022

Overview

text to keywords

Trained T5-base and T5-large model for creating keywords from text. Supported languages: ru

Pretraining Large version | Pretraining Base version

habr article

Usage

Example usage (the code returns a list with keywords. duplicates are possible):

pip install transformers sentencepiece

from itertools import groupby
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = "0x7194633/keyt5-large" # or 0x7194633/keyt5-base
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def generate(text, **kwargs):
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
    s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
    s = [el for el, _ in groupby(s)]
    return s

article = """Reuters сообщил об отмене 3,6 тыс. авиарейсов из-за «омикрона» и погоды
Наибольшее число отмен авиарейсов 2 января пришлось на американские авиакомпании 
SkyWest и Southwest, у каждой — более 400 отмененных рейсов. При этом среди 
отмененных 2 января авиарейсов — более 2,1 тыс. рейсов в США. Также свыше 6400 
рейсов были задержаны."""

print(generate(article, top_p=1.0, max_length=64))  
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']

Training

To teach the keyT5-base and keyT5-large models, you will need a table in csv format, like this:

KeyT5 models were trained on ~7000 compressed habr.com articles. data.csv collect.py Exclusively supports the Russian language!

X	Y
Some text that is fed to the input	The text that should come out
Some text that is fed to the input	The text that should come out

Go to the training notebook and learn more about it:

Trained T5 and T5-large model for creating keywords from text

Related tags

Overview

text to keywords

Usage

Training

Owner

Danil

Finetune gpt-2 in google colab

Nystromformer: A Nystrom-based Algorithm for Approximating Self-Attention

An extension for asreview implements a version of the tf-idf feature extractor that saves the matrix and the vocabulary.

SimCTG - A Contrastive Framework for Neural Text Generation

Turkish Stop Words Türkçe Dolgu Sözcükleri

PG-19 Language Modelling Benchmark

Seq2seq attn - Use the Seq2Seq method to implement machine translation and introduce Attention mechanism to improve the results

💫 Industrial-strength Natural Language Processing (NLP) in Python

Tokenizer - Module python d'analyse syntaxique et de grammaire, tokenization

The RWKV Language Model

MicBot - MicBot uses Google Translate to speak everyone's chat messages

Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

A library for finding knowledge neurons in pretrained transformer models.

NLP Core Library and Model Zoo based on PaddlePaddle 2.0

Transformer related optimization, including BERT, GPT

Jarvis is a simple Chatbot with a GUI capable of chatting and retrieving information and daily news from the internet for it's user.

The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.

PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Bot to connect a real Telegram user, simulating responses with OpenAI's davinci GPT-3 model.