A Python library for generating new text from existing samples.

Last update: May 17, 2022

Overview

ReMarkov is a Python library for generating text from existing samples using Markov chains. You can use it to produce all sorts of writing, from birthday messages and horoscopes to Wikipedia articles and the utterances of your game's NPCs. Everything works without an omnipotent "AI" - it is dead-simple code and therefore fast.
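
To illustrate the idea, here is a minimal sketch of word-level Markov-chain text generation in plain Python. It is not ReMarkov's implementation: the names `build_chain` and `generate` and the `order` parameter are hypothetical and chosen only for this illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain, stopping early at a state with no known successor."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # pick a random starting state
    output = list(state)
    while len(output) < length:
        successors = chain.get(state)
        if not successors:
            break
        next_word = rng.choice(successors)
        output.append(next_word)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (next_word,)
    return " ".join(output)


sample = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(sample)
print(generate(chain, length=12, seed=1))
```

Because successors are stored with repetition, a word that follows a state more often in the sample is proportionally more likely to be chosen. A larger `order` would make the output closer to the source text at the cost of variety.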

Check out the examples and feel free to contribute!

Installation

pip3 install remarkov

Example

Scrape the Wikipedia page for "Computer Programming" and generate new text from it:

./tools/scrape-wiki.py Computer_programming | remarkov build | remarkov generate

You can also use remarkov programmatically:

from remarkov import create_model

model = create_model()
model.add_text("This is a sample text and this is another.")

print(model.generate().text())
# e.g. "This is a sample text and this is a sample text and this is a sample text ..."
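
The repetition in that sample output follows from how few transitions a single sentence provides. The sketch below tabulates which word follows which in the sample text, using plain whitespace splitting. This tokenization and the helper name `successors` are assumptions for illustration and may differ from what ReMarkov does internally.

```python
from collections import defaultdict


def successors(text):
    """Return a dict mapping each word to the list of words that follow it."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


table = successors("This is a sample text and this is another.")
for word, following in table.items():
    print(f"{word!r} -> {following}")
```

Under these assumptions only `is` has more than one successor (`a` and `another.`), so a random walk either ends at `another.` or returns through `and this is` into the same phrase again. Adding more sample text gives the chain more branches and more varied output.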

Development

Make sure you run pytest as a module, which adds the current directory to the import path:

python3 -m pytest

This project uses black for source code formatting:

black .

Generate documentation for the project (this uses the original pdoc at pdoc.dev):

git checkout gh-pages
pdoc -t pdoc/template -o public/docs <path_to_remarkov_module>

Run type checks using mypy:

mypy -p remarkov

Publishing is done like this (don't forget to bump the version in setup.py):

pip3 install twine # optional

git tag -a <version>
git push --tags

python3 setup.py clean --all
python3 setup.py sdist bdist_wheel
twine check "dist/*"
twine upload "dist/*"



Releases

v0.2.3 (Jan 15, 2022)
ReMarkov Example Datasets - EN

Based on:

https://github.com/kavgan/OpinRank (Cars, Hotels)

https://github.com/dsnam/markovscope (Horoscopes)

https://github.com/hmi-utwente/video-game-text-corpora (NPC)

ReMarkov Wikipedia Scraper (Blockchain)

