Utility for Text Normalisation or Inverse Normalisation

Overview

Text Processor

Text Normalisation or Inverse Normalisation for Indonesian, e.g.

measurements

  • "123 kg" -> "seratus dua puluh tiga kilogram"

Currency/Money

  • "Harga notebook itu $250 atau sekitar Rp 3,000,000" -> "Harga notebook itu dua ratus lima puluh dollar atau sekitar tiga juta rupiah"

Dates

  • "Pecurian itu terjadi hari minggu kemarin (12/3/2021)" -> "Pecurian itu terjadi hari minggu kemarin dua belas Maret dua ribu dua puluh satu "

Time Zone

  • "Kebakaran terjadi di wilayah jakarta sejak pukul 22.30 WITA di 1.000 rumah" -> "Kebakaran terjadi di wilayah jakarta sejak pukul dua puluh dua lewat tiga puluh menit Waktu Indonesia Tengah di seribu rumah"

Usage

from text_processor import TextProcessor

tp = TextProcessor()
print(tp.normalize("123 kg"))
Owner
Cahya Wirawan
System engineer, currently working on NLP, CV and Speech Recognition for fun and curiosity
Cahya Wirawan
Etranslate is a free and unlimited python library for transiting your texts

Etranslate is a free and unlimited python library for transiting your texts

Abolfazl Khalili 16 Sep 13, 2022
TextStatistics - Get a text file wich contains English text

TextStatistics This program get a text file wich contains English text. The program analyses the text, and print some information. For this program I

2 Nov 15, 2021
AnnIE - Annotation Platform, tool for open information extraction annotations using text files.

AnnIE - Annotation Platform, tool for open information extraction annotations using text files.

Niklas 29 Dec 20, 2022
Microsoft's Cascadia Code font customized to my liking.

Microsoft's Cascadia Code font customized to my liking. Also includes some simple batch patch and bake scripts to batch patch glyphs and bake font features into fonts!

Frederik List 3 Jan 29, 2022
Repository containing the code for An-Gocair text normaliser

Scottish Gaelic Text Normaliser The following project contains the code and resources for the Scottish Gaelic text normalisation project. The repo can

3 Jun 28, 2022
Convert ebooks with few clicks on Telegram!

E-Book Converter Bot A bot that converts e-books to various formats, powered by calibre! It currently supports 34 input formats and 19 output formats.

Youssif Shaaban Alsager 45 Jan 05, 2023
Question answering on russian with XLMRobertaLarge as a service

QA Roberta Ru SaaS Question answering on russian with XLMRobertaLarge as a service. Thanks for the model to Alexander Kaigorodov. Stack Flask Gunicorn

Gladkikh Prohor 21 Jul 04, 2022
Word and phrase lists in CSV

Word Lists Word and phrase lists in CSV, collected from different sources. Oxford Word Lists: oxford-5k.csv - Oxford 3000 and 5000 oxford-opal.csv - O

Anton Zhiyanov 14 Oct 14, 2022
This project is a small tool for processing url-containing texts delivered by HUAWEI Share on Windows.

hwshare_helper This project is a small tool for handling url-containing texts delivered by HUAWEI Share on Windows. config Before use, please install

1 Jan 19, 2022
JSON and CSV data for Swahili dictionary with over 16600+ words

kamusi JSON and CSV data for swahili dictionary with over 16600+ words. This repo consists of data from swahili dictionary with about 16683 words toge

Jordan Kalebu 8 Jan 13, 2022
Getting git-style versioning working on RDFlib

Getting git-style versioning working on RDFlib

Gabe Fierro 1 Feb 01, 2022
Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

cdvpp Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files Reads a Digital Vaccination Pass PDF file as in

Esteban Borai 1 Nov 17, 2021
基于Pytex的数学建模工具,实现将md文件转换成pdf/tex文档的前后端

Pytex-for-MCM 基于Pytex的数学建模工具,实现将md文件转换成pdf/tex文档的前后端。

3 May 17, 2021
Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Kevin Lai 1 Nov 08, 2021
The bot creates hashtags for user's texts in Russian and English.

telegram_bot_hashtags The bot creates hashtags for user's texts in Russian and English. It is a simple bot for creating hashtags. NOTE file config.py

Yana Davydovich 2 Feb 12, 2022
Search for terms(word / table / field name or any) under Snowflake schema names

snowflake-search-terms-in-ddl-views Search for terms(word / table / field name or any) under Snowflake schema names Version : 1.0v How to use ? Run th

Igal Emona 1 Dec 15, 2021
Converts a Bangla numeric string to literal words.

Bangla Number in Words Converts a Bangla numeric string to literal words. Install $ pip install banglanum2words Usage

Syed Mostofa Monsur 3 Aug 29, 2022
Auto translate Localizable.strings for multiple languages in Xcode

auto_localize Auto translate Localizable.strings for multiple languages in Xcode Usage put your origin Localizable.strings file in folder pip3 install

Wesley Zhang 13 Nov 22, 2022
Returns unicode slugs

Python Slugify A Python slugify application that handles unicode. Overview Best attempt to create slugs from unicode strings while keeping it DRY. Not

Val Neekman 1.3k Jan 04, 2023
LazyText is inspired b the idea of lazypredict, a library which helps build a lot of basic models without much code.

LazyText is inspired b the idea of lazypredict, a library which helps build a lot of basic models without much code. LazyText is for text what lazypredict is for numeric data.

Jay Vala 13 Nov 04, 2022