Yaspeller Dictionary (Auto)builder

Usage

# this sample command generates `./yaspeller_report.json`
# yaspeller --report json --ignore-digits --ignore-text "'.*" --ignore-latin --only-errors --file-extensions ".md" --lang ru

python -m venv env
source env/bin/activate
pip install 
python src/dictionary.py yaspeller_report.json

Why

Yaspeller is nice, but typical documentation contains too many anglicisms. Normally you just want to ignore them, but the only way to do that is to supply an array of regexps for the words to ignore.

This tool generates an array of dictionary entries covering every word form (all lexemes in all grammatical cases), such as

[
    "[бБ]аг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[дД]ифф(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[кК]оммит(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[пП]атчинг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[рР]убист(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[сС]амоорганизованн(ого|ом|ому|ую|ые|ый|ым|ыми|ых)",
    "[тТ]икет(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "коммитить"
]

from yaspeller errors, which look like this in text format:

Spelling check:
✗ www.ruby-lang.org/ru/community/ruby-core/index.md 130 ms
-----
Typos: 9
1. патчингом (36:27)
2. коммитить (68:32, suggest: комитет)
3. багах (75:15, suggest: богах, баках, бегах)
4. баги (89:24, suggest: багги)
5. баг (96:25)
6. тикет (107:14, suggest: этикет)
7. дифф (115:18)
8. коммиту (147:24, suggest: комету, комнату)
9. коммита (148:58, suggest: комета)
-----
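
The idea above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/dictionary.py`: the helper names `extract_typos` and `build_pattern` are hypothetical, the parser only assumes the text report layout shown above, and the list of word forms is assumed to come from some morphological analyzer that is not shown here.

```python
import os
import re

# Matches numbered typo lines such as "3. багах (75:15, suggest: богах)".
TYPO_LINE = re.compile(r"^\d+\.\s+(\S+)\s+\(")


def extract_typos(report_text):
    """Collect the misspelled words from a plain-text yaspeller report."""
    words = []
    for line in report_text.splitlines():
        match = TYPO_LINE.match(line.strip())
        if match:
            words.append(match.group(1))
    return words


def build_pattern(forms):
    """Fold word forms into one regexp: shared stem plus alternated endings.

    `forms` is assumed to hold every inflected form of a single word.
    """
    forms = sorted({form.lower() for form in forms})
    stem = os.path.commonprefix(forms)
    if len(forms) == 1 or not stem:
        # Nothing to fold: keep the word (or the plain alternatives) as is.
        return "|".join(forms)
    endings = sorted({form[len(stem):] for form in forms} - {""})
    # Accept both a lowercase and an uppercase first letter.
    first = f"[{stem[0]}{stem[0].upper()}]"
    # The ending group is optional only when the bare stem is itself a form.
    optional = "?" if stem in forms else ""
    return f"{first}{stem[1:]}({'|'.join(endings)}){optional}"


if __name__ == "__main__":
    sample = "Typos: 2\n1. патчингом (36:27)\n2. баги (89:24, suggest: багги)\n"
    print(extract_typos(sample))
    print(build_pattern(["баг", "бага", "багу", "баги"]))
```

Taking the longest common prefix as the stem is a simplification: it works for regular nouns like the ones in the example, but words whose stem changes between forms would need real morphological handling.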

Live example

Initially created for spellchecking the www.ruby-lang.org translations.

🤕 spelling exceptions builder for lazy people

Owner

Vlad Bokov
