A high-level yet extensible library for fast language model tuning via automatic prompt search

Overview

ruPrompts

ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with HuggingFace Hub, configuration system powered by Hydra, and command line interface.

Prompt is a text instruction for language model, like

Translate English to French:
cat =>

For some tasks the prompt is obvious, but for some it isn't. With ruPrompts you can define only the prompt format, like {text} , and train it automatically for any task, if you have a training dataset.

You can currently use ruPrompts for text-to-text tasks, such as summarization, detoxification, style transfer, etc., and for styled text generation, as a special case of text-to-text.

Features

  • Modular structure for convenient extensibility
  • Integration with HF Transformers, support for all models with LM head
  • Integration with HF Hub for sharing and loading pretrained prompts
  • CLI and configuration system powered by Hydra
  • Pretrained prompts for ruGPT-3

Installation

ruPrompts can be installed with pip:

pip install ruprompts[hydra]

See Installation for other installation options.

Usage

Loading a pretrained prompt for styled text generation:

>> ppln_joke("Говорит кружка ложке") [{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]">
>>> import ruprompts
>>> from transformers import pipeline

>>> ppln_joke = pipeline("text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_joke")
>>> ppln_joke("Говорит кружка ложке")
[{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]

For text2text tasks:

>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли") [{"generated_text": 'Опять эти люди все испортили'}]">
>>> ppln_detox = pipeline("text2text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_detox_russe")
>>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли")
[{"generated_text": 'Опять эти люди все испортили'}]

Proceed to Quick Start for a more detailed introduction or start using ruPrompts right now with our Colab Tutorials.

License

ruPrompts is Apache 2.0 licensed. See the LICENSE file for details.

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Dec 31, 2022
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

297 Dec 29, 2022
Weird Sort-and-Compress Thing

Weird Sort-and-Compress Thing A weird integer sorting + compression algorithm inspired by a conversation with Luthingx (it probably already exists by

Douglas 1 Jan 03, 2022
Korea Spell Checker

한국어 문서 koSpellPy Korean Spell checker How to use Install pip install kospellpy Use from kospellpy import spell_init spell_checker = spell_init() # d

kangsukmin 2 Oct 20, 2021
Creating a Feed of MISP Events from ThreatFox (by abuse.ch)

ThreatFox2Misp Creating a Feed of MISP Events from ThreatFox (by abuse.ch) What will it do? This will fetch IOCs from ThreatFox by Abuse.ch, convert t

17 Nov 22, 2022
Multiple implementations for abstractive text summurization , using google colab

Text Summarization models if you are able to endorse me on Arxiv, i would be more than glad https://arxiv.org/auth/endorse?x=FRBB89 thanks This repo i

463 Dec 26, 2022
Final Project for the Intel AI Readiness Boot Camp NLP (Jan)

NLP Boot Camp (Jan) Synopsis Full Name: Prameya Mohanty Name of your School: Delhi Public School, Rourkela Class: VIII Title of the Project: iTransect

TheCodingHub 1 Feb 01, 2022
Application to help find best train itinerary, uses speech to text, has a spam filter to segregate invalid inputs, NLP and Pathfinding algos.

T-IAI-901-MSC2022 - GROUP 18 Gestion de projet Notre travail a été organisé et réparti dans un Trello. https://trello.com/b/X3s2fpPJ/ia-projet Install

1 Feb 05, 2022
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It

Artifici Online Services inc. 74 Oct 07, 2022
Code for "Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments".

Code for "Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments".

Yu Zhang 50 Nov 08, 2022
Python powered crossword generator with database with 20k+ polish words

crossword_generator Generate simple crossword puzzle from words and definitions fetched from krzyżowki.edu.pl endpoints -/ string:word - returns js

0 Jan 04, 2022
Text Classification in Turkish Texts with Bert

You can watch the details of the project on my youtube channel Project Interface Project Second Interface Goal= Correctly guessing the classification

42 Dec 31, 2022
Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.

Sonnet finder Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. Usage This is a Python scrip

Marcel Bollmann 11 Sep 25, 2022
A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022
2021搜狐校园文本匹配算法大赛baseline

sohu2021-baseline 2021搜狐校园文本匹配算法大赛baseline 简介 分享了一个搜狐文本匹配的baseline,主要是通过条件LayerNorm来增加模型的多样性,以实现同一模型处理不同类型的数据、形成不同输出的目的。 线下验证集F1约0.74,线上测试集F1约0.73。

苏剑林(Jianlin Su) 45 Sep 06, 2022
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation

Salesforce 564 Jan 08, 2023
RIDE automatically creates the package and boilerplate OOP Python node scripts as per your needs

RIDE: ROS IDE RIDE automatically creates the package and boilerplate OOP Python code for nodes as per your needs (RIDE is not an IDE, but even ROS isn

Jash Mota 20 Jul 14, 2022
[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers

Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev

UVA Computer Vision 86 Dec 28, 2022
Modeling cumulative cases of Covid-19 in the US during the Covid 19 Delta wave using Bayesian methods.

Introduction The goal of this analysis is to find a model that fits the observed cumulative cases of COVID-19 in the US, starting in Mid-July 2021 and

Alexander Keeney 1 Jan 05, 2022