NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

Last update: Dec 15, 2022

Overview

NLPretext

Working on an NLP project and tired of always looking for the same silly preprocessing functions on the web? 😫

Need to efficiently extract email addresses from a document? Hashtags from tweets? Remove accents from a French post? 😥

NLPretext got you covered! 🚀

NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

🔍 Quickly explore our preprocessing pipelines and the reference of individual functions below.

Default preprocessing pipeline
Custom preprocessing pipeline
Replacing phone numbers
Removing hashtags
Extracting emojis
Data augmentation

Cannot find what you were looking for? Feel free to open an issue.

Installation

This package has been tested on Python 3.6, 3.7 and 3.8.

We strongly advise you to do the remaining steps in a virtual environment.

To install this library you just have to run the following command:

pip install nlpretext

This library uses spaCy as its tokenizer. The currently supported models are en_core_web_sm and fr_core_news_sm. If they are not installed, run the following commands:

pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz

pip install https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.3.0/fr_core_news_sm-2.3.0.tar.gz
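
Before running a pipeline, it can help to confirm that the models are actually installed. The sketch below relies on the fact that pip-installed spaCy models are ordinary Python packages named after the model. The helper name `missing_models` is an illustration and is not part of NLPretext.

```python
import importlib.util
from typing import Iterable, List


def missing_models(
    names: Iterable[str] = ("en_core_web_sm", "fr_core_news_sm"),
) -> List[str]:
    """Return the model packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_models():
        print(f"{name} is not installed; see the pip commands above")
```

Because the check only looks up the package, it does not load the model and stays cheap to run at start-up.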

Preprocessing pipeline

Default pipeline

Need to preprocess your text data but have no clue which functions to use and in which order? The default preprocessing pipeline has you covered:

from nlpretext import Preprocessor
text = "I just got the best dinner in my life @latourdargent !!! I  recommend 😀 #food #paris \n"
preprocessor = Preprocessor()
text = preprocessor.run(text)
print(text)
# "I just got the best dinner in my life !!! I recommend"

Create your custom pipeline

If you know exactly which functions to apply to your data, you can create your own custom pipeline instead. Here's an example:

from nlpretext import Preprocessor
from nlpretext.basic.preprocess import (normalize_whitespace, remove_punct, remove_eol_characters,
remove_stopwords, lower_text)
from nlpretext.social.preprocess import remove_mentions, remove_hashtag, remove_emoji
text = "I just got the best dinner in my life @latourdargent !!! I  recommend 😀 #food #paris \n"
preprocessor = Preprocessor()
preprocessor.pipe(lower_text)
preprocessor.pipe(remove_mentions)
preprocessor.pipe(remove_hashtag)
preprocessor.pipe(remove_emoji)
preprocessor.pipe(remove_eol_characters)
preprocessor.pipe(remove_stopwords, args={'lang': 'en'})
preprocessor.pipe(remove_punct)
preprocessor.pipe(normalize_whitespace)
text = preprocessor.run(text)
print(text)
# "dinner life recommend"

All available functions can be found in the preprocess.py scripts of the different folders: basic, social, token.
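
To make the idea of a pipeline concrete, here is a minimal sketch of how chaining text functions with `pipe` and `run` could work. It is a stand-in written for illustration, not NLPretext's actual `Preprocessor` implementation; the class `MiniPipeline` and the helpers `drop_mentions` and `squeeze_whitespace` are hypothetical names.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple


class MiniPipeline:
    """Hypothetical stand-in that applies text functions in registration order."""

    def __init__(self) -> None:
        self._steps: List[Tuple[Callable[..., str], Dict[str, Any]]] = []

    def pipe(self, func: Callable[..., str], args: Optional[Dict[str, Any]] = None) -> None:
        # Store the function together with its optional keyword arguments.
        self._steps.append((func, args or {}))

    def run(self, text: str) -> str:
        # Feed the output of each step into the next one.
        for func, kwargs in self._steps:
            text = func(text, **kwargs)
        return text


def drop_mentions(text: str) -> str:
    """Remove @mentions (illustrative regex only)."""
    return re.sub(r"@\w+", "", text)


def squeeze_whitespace(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


pipeline = MiniPipeline()
pipeline.pipe(str.lower)
pipeline.pipe(drop_mentions)
pipeline.pipe(squeeze_whitespace)
print(pipeline.run("Best dinner @latourdargent  !!! \n"))
```

The order of the steps matters: whitespace is normalized last so that the gaps left by removed mentions are cleaned up as well.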

Individual Functions

Replacing emails

from nlpretext.basic.preprocess import replace_emails
example = "I have forwarded this email to someone@example.com"  # placeholder address
example = replace_emails(example, replace_with="*EMAIL*")
print(example)
# "I have forwarded this email to *EMAIL*"
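
For readers curious about what such a replacement involves, the following is a rough regex-based sketch. The pattern is a deliberately simple assumption and is not the one NLPretext uses; `replace_emails_sketch` is a hypothetical name.

```python
import re

# Simplified pattern: local part, "@", then at least one dotted domain label.
EMAIL_SKETCH = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def replace_emails_sketch(text: str, replace_with: str = "*EMAIL*") -> str:
    """Replace anything that looks like an email address with a token."""
    return EMAIL_SKETCH.sub(replace_with, text)


print(replace_emails_sketch("Write to jane.doe@example.com or bob@mail.example.org."))
```

A pattern this small will miss unusual but valid addresses, which is one reason to prefer a maintained library function over a hand-written regex.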

Replacing phone numbers

from nlpretext.basic.preprocess import replace_phone_numbers
example = "My phone number is 0606060606"
example = replace_phone_numbers(example, country_to_detect=["FR"], replace_with="*PHONE*")
print(example)
# "My phone number is *PHONE*"
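
As a rough illustration of country-specific detection, the sketch below matches French-style ten-digit numbers with a regular expression. This is an assumption made for the example only: it does not reflect how NLPretext detects phone numbers, and `replace_fr_phone_sketch` is a hypothetical name.

```python
import re

# French national format: a leading 0, a non-zero digit, then four pairs of
# digits that may be separated by a space, dot or hyphen.
FR_PHONE_SKETCH = re.compile(r"\b0[1-9](?:[ .-]?\d{2}){4}\b")


def replace_fr_phone_sketch(text: str, replace_with: str = "*PHONE*") -> str:
    """Replace French-looking phone numbers with a token."""
    return FR_PHONE_SKETCH.sub(replace_with, text)


print(replace_fr_phone_sketch("My phone number is 0606060606"))
```

International prefixes such as +33 are not handled here; covering every national format by hand quickly becomes impractical.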

Removing Hashtags

from nlpretext.social.preprocess import remove_hashtag
example = "This restaurant was amazing #food #foodie #foodstagram #dinner"
example = remove_hashtag(example)
print(example)
# "This restaurant was amazing"
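
If you want to keep the hashtags rather than discard them, the same idea works in both directions. The sketch below uses a simple `#\w+` pattern, which is an assumption for illustration and not NLPretext's own pattern; both function names are hypothetical.

```python
import re
from typing import List

HASHTAG_SKETCH = re.compile(r"#\w+")


def extract_hashtags_sketch(text: str) -> List[str]:
    """Return every hashtag found in the text, in order of appearance."""
    return HASHTAG_SKETCH.findall(text)


def remove_hashtags_sketch(text: str) -> str:
    """Drop hashtags, then collapse the whitespace they leave behind."""
    without_tags = HASHTAG_SKETCH.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


post = "This restaurant was amazing #food #foodie #foodstagram #dinner"
print(extract_hashtags_sketch(post))
print(remove_hashtags_sketch(post))
```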

Extracting emojis

from nlpretext.social.preprocess import extract_emojis
example = "I take care of my skin 😀"
example = extract_emojis(example)
print(example)
# [':grinning_face:']
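
The `:grinning_face:` style of output can be approximated with the standard library alone. The sketch below is a crude stand-in, not NLPretext's implementation: it assumes that emojis fall in two common Unicode ranges and builds the label from the official character name. `extract_emojis_sketch` and `EMOJI_RANGES` are hypothetical names.

```python
import unicodedata
from typing import List

# Assumed ranges: pictographs/emoticons and miscellaneous symbols/dingbats.
EMOJI_RANGES = ((0x1F300, 0x1FAFF), (0x2600, 0x27BF))


def extract_emojis_sketch(text: str) -> List[str]:
    """Return ':name:' labels for characters inside the assumed emoji ranges."""
    found = []
    for char in text:
        if any(low <= ord(char) <= high for low, high in EMOJI_RANGES):
            name = unicodedata.name(char, "")
            if name:
                found.append(":" + name.lower().replace(" ", "_") + ":")
    return found


print(extract_emojis_sketch("I take care of my skin 😀"))
```

Emojis built from several code points, such as flags or skin-tone variants, are not grouped by this sketch.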

Data augmentation

The augmentation module helps you generate new texts from your examples by modifying some words in the originals. For NER tasks, any associated entities are kept unchanged. If you want words other than entities to remain unchanged, list them in the stopwords argument. The modifications depend on the chosen method; the module currently supports substitution with synonyms using WordNet or BERT, both from the nlpaug library.

from nlpretext.augmentation.text_augmentation import augment_text
example = "I want to buy a small black handbag please."
entities = [{'entity': 'Color', 'word': 'black', 'startCharIndex': 22, 'endCharIndex': 27}]
example = augment_text(example, method="wordnet_synonym", entities=entities)
print(example)
# "I need to buy a small black pocketbook please."
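
The way entities and stopwords are protected during augmentation can be sketched with a toy synonym table. Everything here is an assumption for illustration: the real module relies on nlpaug with WordNet or BERT, while `TOY_SYNONYMS` and `augment_sketch` are hypothetical.

```python
import re
from typing import Dict, Iterable, Mapping, Sequence

# Hypothetical synonym table standing in for WordNet or BERT suggestions.
TOY_SYNONYMS: Dict[str, str] = {"want": "need", "handbag": "pocketbook", "black": "dark"}


def augment_sketch(
    text: str,
    entities: Sequence[Mapping] = (),
    stopwords: Iterable[str] = (),
    synonyms: Mapping[str, str] = TOY_SYNONYMS,
) -> str:
    """Swap words for synonyms, leaving entity spans and stopwords untouched."""
    protected = [(e["startCharIndex"], e["endCharIndex"]) for e in entities]
    frozen = {word.lower() for word in stopwords}

    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        in_entity = any(start <= match.start() and match.end() <= end
                        for start, end in protected)
        if in_entity or word.lower() in frozen:
            return word
        return synonyms.get(word.lower(), word)

    return re.sub(r"\w+", swap, text)


entities = [{"entity": "Color", "word": "black", "startCharIndex": 22, "endCharIndex": 27}]
print(augment_sketch("I want to buy a small black handbag please.", entities=entities))
```

Note that this sketch does not recompute entity offsets after substitution, so the indices only stay valid when the replacements before an entity keep the same length.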

Make HTML documentation

In order to make the HTML Sphinx documentation, you need to run the following command at the nlpretext root path:

sphinx-apidoc -f nlpretext -o docs/

This will generate the .rst files. You can then generate the doc with:

cd docs && make html

You can now open the file index.html located in the build folder.

Project Organization

├── LICENSE
├── VERSION
├── CONTRIBUTING.md     <- Contribution guidelines
├── README.md           <- The top-level README for developers using this project.
├── .github/workflows   <- Where the CI lives
├── datasets/external   <- Bash scripts to download external datasets
├── docs                <- Sphinx HTML documentation
├── nlpretext           <- Main Package. This is where the code lives
│   ├── preprocessor.py <- Main preprocessing script
│   ├── augmentation    <- Text augmentation script
│   ├── basic           <- Basic text preprocessing 
│   ├── social          <- Social text preprocessing
│   ├── token           <- Token text preprocessing
│   ├── _config         <- Where the configuration and constants live
│   └── _utils          <- Where preprocessing utils scripts live
├── tests               <- Where the tests live
├── setup.py            <- makes project pip installable (pip install -e .) so the package can be imported
├── requirements.txt    <- The requirements file for reproducing the analysis environment, e.g.
│                          generated with `pip freeze > requirements.txt`
└── pylintrc            <- The linting configuration file

Comments

Bump actions/cache from 2.1.6 to 3.2.1
Bumps actions/cache from 2.1.6 to 3.2.1.

Release notes

Sourced from actions/cache's releases.

v3.2.1

What's Changed

Release compression related changes for windows by @Phantsure in actions/cache#1039

Upgrade codeql to v2 by @Phantsure in actions/cache#1023

Full Changelog: https://github.com/actions/cache/compare/v3.2.0...v3.2.1

v3.2.0

What's Changed

fix wrong timeout env var key in README.md by @walterddr in actions/cache#959

Updated release doc with correct env variable by @kotewar in actions/cache#960

Create pull_request_template.md by @pdotl in actions/cache#963

Update README with clearer info about cache-hit and its value by @kotewar in actions/cache#961

Change datadog/squid to Ubuntu/squid in CI check by @bishal-pdMSFT in actions/cache#976

Add more details to version section in readme by @bishal-pdMSFT in actions/cache#971

Update hashFiles documentation reference by @asaf400 in actions/cache#979

Updated link for cache segment download info by @kotewar in actions/cache#986

Readme update for deleting caches by @t-dedah in actions/cache#981

Add oncall logic to assign issues and PRs by @vsvipul in actions/cache#997

Bump minimatch from 3.0.4 to 3.1.2 by @dependabot in actions/cache#998

Revert "Bump minimatch from 3.0.4 to 3.1.2" by @vsvipul in actions/cache#1005

Fix npm vulnerability by @Phantsure in actions/cache#1007

refactor: Use early return pattern to avoid nested conditions by @jongwooo in actions/cache#1013

Use cache in check-dist.yml by @jongwooo in actions/cache#1004

chore: Use built-in cache action to cache dependencies by @jongwooo in actions/cache#1014

Updated node example by @t-dedah in actions/cache#1008

Fix: Node npm doc example by @apascualm in actions/cache#1026

docs: fix an invalid link in workarounds.md by @teatimeguest in actions/cache#929

General Availability release for granular cache by @kotewar in actions/cache#1035 More details here on beta release.

New Contributors

@walterddr made their first contribution in actions/cache#959

@asaf400 made their first contribution in actions/cache#979

@jongwooo made their first contribution in actions/cache#1013

@apascualm made their first contribution in actions/cache#1026

@teatimeguest made their first contribution in actions/cache#929

Full Changelog: https://github.com/actions/cache/compare/v3...v3.2.0

v3.2.0-beta.1

What's Changed

Actions Cache Granular Control Implementation by @kotewar in actions/cache#1006

v3.1.0-beta.3

What's Changed

Bug fixes for bsdtar fallback, if gnutar not available, and gzip fallback, if cache saved using old cache action, on windows.

Full Changelog: https://github.com/actions/cache/compare/v3.1.0-beta.2...v3.1.0-beta.3

... (truncated)

Changelog

Sourced from actions/cache's changelog.

3.2.1

Update @actions/cache on windows to use gnu tar and zstd by default and fallback to bsdtar and zstd if gnu tar is not available. (issue)

Added support for fallback to gzip to restore old caches on windows.

Added logs for cache version in case of a cache miss.

Commits

c1a5de8 Upgrade codeql to v2 (#1023)

9b0be58 Release compression related changes for windows (#1039)

c17f4bf GA for granular cache (#1035)

ac25611 docs: fix an invalid link in workarounds.md (#929)

dc097e3 Update examples.md (#1026)

fb86cbf Updated node example (#1008)

a57932f Merge pull request #1014 from jongwooo/chore/use-built-in-cache-action

04b13ca chore: Use built-in cache action to cache dependencies

941bc71 Merge pull request #1004 from jongwooo/chore/use-cache-in-check-dist

08d8639 Merge branch 'main' into chore/use-cache-in-check-dist

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

draft dependencies github_actions
opened by dependabot[bot] 0
Bump python from 3.9.7-slim-buster to 3.11.1-slim-buster in /docker
Bumps python from 3.9.7-slim-buster to 3.11.1-slim-buster.

draft docker dependencies
opened by dependabot[bot] 0
The current release is not functional as emoji lib has changed
🐛 Bug Report

🔬 How To Reproduce

Steps to reproduce the behavior:

install nlpretext from pip (1.1.0)

run from nlpretext._config import constants

Code sample

Environment

OS: macOS Silicon

Python version: 3.7, 3.8, 3.9

📈 Expected behavior

EMOJI_PATTERN = _emoji.get_emoji_regexp()

AttributeError: module 'emoji' has no attribute 'get_emoji_regexp'

bug
opened by Guillaume6606 1
Bump release-drafter/release-drafter from 5.15.0 to 5.21.1
Bumps release-drafter/release-drafter from 5.15.0 to 5.21.1.

Release notes

Sourced from release-drafter/release-drafter's releases.

v5.21.1

What's Changed

Dependency Updates

Address set-output deprecation (#1247) @NotMyFault

Full Changelog: https://github.com/release-drafter/release-drafter/compare/v5.21.0...v5.21.1

v5.21.0

What's Changed

New

fetch 100 labels for pull requests instead of 10 (#1220) @matoubidou

Full Changelog: https://github.com/release-drafter/release-drafter/compare/v5.20.1...v5.21.0

v5.20.1

What's Changed

Bug Fixes

Add missing inputs to action config (#1202) @gilbertsoft

Documentation

Add more comments about pull requests permission (#1187) @Kirade

Fix Vercel link (#1188) @shinshin86

Add permissions to README (#1132) @danyeaw

Dependency Updates

Bump eslint-plugin-unicorn from 42.0.0 to 43.0.2 (#1192) @dependabot

Bump node from af50279 to 4c8f734 (#1191) @dependabot

Bump node from 17.9.0-alpine to 18.7.0-alpine (#1190) @dependabot

Bump jest from 28.1.0 to 28.1.3 (#1182) @dependabot

Bump eslint from 8.16.0 to 8.20.0 (#1185) @dependabot

Bump nock from 13.2.4 to 13.2.9 (#1186) @dependabot

Bump probot from 12.2.4 to 12.2.5 (#1178) @dependabot

Bump eslint-plugin-prettier from 4.0.0 to 4.2.1 (#1176) @dependabot

Bump lint-staged from 13.0.0 to 13.0.3 (#1172) @dependabot

Bump prettier from 2.6.2 to 2.7.1 (#1166) @dependabot

Bump @actions/core from 1.8.2 to 1.9.0 (#1164) @dependabot

Bump lint-staged from 12.4.3 to 13.0.0 (#1156) @dependabot

Bump probot from 12.2.3 to 12.2.4 (#1155) @dependabot

Bump @vercel/ncc from 0.33.4 to 0.34.0 (#1151) @dependabot

... (truncated)

Commits

6df64e4 v5.21.1

26be07d Address set-output deprecation (#1247)

df69d58 v5.21.0

ecbbed9 fetch 100 labels for pull requests instead of 10 (#1220)

06a49bf v5.20.1

6e6a13c Add missing inputs to action config (#1202)

0e58cd4 Bump eslint-plugin-unicorn from 42.0.0 to 43.0.2 (#1192)

c3d9042 quote schema defaults that contain *

bd579b5 Bump node from af50279 to 4c8f734 (#1191)

c464263 Bump node from 17.9.0-alpine to 18.7.0-alpine (#1190)

Additional commits viewable in compare view

draft dependencies github_actions
opened by dependabot[bot] 0
Bump cloudpickle from 2.0.0 to 2.2.0
Bumps cloudpickle from 2.0.0 to 2.2.0.

Changelog

Sourced from cloudpickle's changelog.

2.2.0

Fix support of PyPy 3.8 and later. ([issue #455](cloudpipe/cloudpickle#455))

2.1.0

Support for pickling abc.abstractproperty, abc.abstractclassmethod, and abc.abstractstaticmethod. ([PR #450](cloudpipe/cloudpickle#450))

Support for pickling subclasses of generic classes. ([PR #448](cloudpipe/cloudpickle#448))

Support and CI configuration for Python 3.11. ([PR #467](cloudpipe/cloudpickle#467))

Support for the experimental nogil variant of CPython ([PR #470](cloudpipe/cloudpickle#470))

Commits

f31859b Release 2.2.0

23cbe15 FIX: Support PyPy > 3.7 (#480)

f5472e1 Fix for dis module is not yet available in 3.11b3 (#475)

8bbea3e compat: Import Pickler from "pickle" instead of "_pickle" (#469)

0006829 Install development version of dask in downstream tests (#472)

f926a04 Back to dev mode

d50bd11 Release 2.1.0

6a0e12d Improve compatibility with "nogil" Python and 3.11 (#470)

2fc334d Fix downstream CI (#471)

f758eb3 Fix compatibility with Python 3.11 (#467)

Additional commits viewable in compare view

draft dependencies python
opened by dependabot[bot] 0

Releases(1.1.0)

1.1.0(Sep 16, 2021)
What’s Changed

[FIX] Removed direct dependency and changed docker registry (#163) @Cedric-Magnan

[DOC] Updated method for spacy tokenizer installation (#159) @Cedric-Magnan

Feature/ignore stopwords (#157) @Guillaume6606

fix: display explicit error message when model not downloaded (#156) @benoitgoujon

Feature/dataloader (#152) @sachalasry-artefact

Hotfix/pylint (#151) @amaleelhamri

Fix/credits (#150) @rafaelleaygalenq

👥 List of contributors

@Cedric-Magnan, @Guillaume6606, @amaleelhamri, @benoitgoujon, @hugovasselin, @rafaelleaygalenq and @sachalasry-artefact
Source code(tar.gz)
Source code(zip)
1.0.3(Feb 18, 2021)

Update license MIT to Apache in PyPI
Source code(tar.gz)
Source code(zip)
nlpretext-1.0.2-py3-none-any.whl(131.91 KB)
nlpretext-1.0.2.tar.gz(275.42 KB)
1.0.1(Feb 18, 2021)
Readme fix

Long description add

Augmentation sphinx documentation fix

Source code(tar.gz)
Source code(zip)
nlpretext-1.0.1-py3-none-any.whl(131.90 KB)
nlpretext-1.0.1.tar.gz(275.33 KB)
1.0.0(Feb 18, 2021)
First release

Easy pipelines to clean text efficiently

Catalogue of preprocessing functions for different needs

Source code(tar.gz)
Source code(zip)
nlpretext-1.0.0-py3-none-any.whl(126.46 KB)
nlpretext-1.0.0.tar.gz(271.90 KB)

Owner

Artefact

GitHub Repository https://nlpretext.readthedocs.io/en/latest/
