Python Markov Chain chatbot running on Telegram

Overview

Hanasubot

Hanasubot (Japanese 話すボット, talking bot) is a Python chatbot running on Telegram. The bot is based on Markov Chains so it can learn your word instantly, unlike neural network chatbots which require training. It uses a modified version of markovify library for that purporse. However, the output may not make sense at all, though it can sometimes generate hilarious replies.

In theory, the bot can learn in any languages, but for some languages word segmentation is required. The bot currently supports Chinese and Japanese word segmentation, with pkuseg, CkipTagger and mecab. Language detection relies on pycld2.

Hanasubot has a permission system so you can easily stop the bot learning from naughty kids in your group, while still reply them. Users with admin right can erase lines from bot corpus as well.

The bot is designed for Chinese Telegram groups so there are a lot of messages written in Chinese. I18n will happen in future and any help is welcome.

Installation

Python 3.6+ is required.

VENV_PATH=/path/to/your/venv  # Change this
python3 -m venv $VENV_PATH
source $VENV_PATH/bin/activate

pip3 install -r requirements.txt

If you are using Python 3.6, dataclasses 0.8 is required as well:

pip3 install dataclasses==0.8

For Python 3.7 and up, dataclasses is included so no need to install it.

To use CkipTagger for Traditional Chinese tokenization, you have to download the model file (see CkipTagger readme for a detailed guide):

python3 -c "from ckiptagger import data_utils; data_utils.download_data_gdown('./')"

Then unzip to a folder named ckipdata, in the same directory as the Python scripts.

Optionally, you can initialize the user dict for pkuseg and CkipTagger, before start running the bot:

touch ./pkuseg_dict.txt
touch ./ckip_dict.json

Configuration

Copy config.example.py and fill it out. Please check the comments in config file.

cp config.example.py config.py

After that, simply start the bot:

python3 tgbot.py

Bot commands and usage

Simply reply to the bot and it will say some random words if you have collected enough corpus. The bot will also learn from your message instantly. Special commands are as follows.

Require root

  • /reload_config - Reload config file without restarting the bot. Some entries cannot be dynamically reloaded though, see config.example.py for details.

Require admin

  • /erase - Remove lines from corpus. (Non-admins can only erase lines sent by themselves.)
  • /userweight - Set user weight.
  • /ban - Set user right to -1.
  • /restrict - Set user right to 1.
  • /grantnormal - Set user right to 2.
  • /granttrusted -Set user right to 3.
  • /grantadmin - Set user right to 4. Admins are able to add/remove other admins with above commands. See also the user right levels section.

Require trusted

  • /addword_cn - Add a word into pkuseg user dictionary.
  • /addword_tw - Add a word into CkipTagger user dictionary.
  • /rmword_cn - Remove a word from pkuseg user dictionary.
  • /rmword_tw - Remove a word from CkipTagger user dictionary.

Other commands

  • /clddbg - Test language detection of some texts.
  • /cutdbg - Test tokenization of some texts.
  • /policy - See what data is collected by the bot and so on.
  • /reload - Claim your admin rights after you get Telegram group admin.
  • /source - See the source code.
  • /start - Start chatting, useful when you can't find the bot messages to reply.

Database

Initialize

CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);

User right levels

  • 5 - root.
  • 4 - admin, can change user rights (except root users), can erase a line from corpus, and can set user_weight and corpus_weight (WIP).
  • 3 - trusted user, can feed the bot via private messages, and can add words into dictionary (for tokenization purposes).
  • 2 - normal user.
  • 1 - restricted user, bot will not write their messages into database.
  • -1 - banned user, bot will not reply to their messages.

TODOs

  • Let admins set corpus_weight
  • Batch /erase

License

MIT

A web app via which users can buy and sell stocks using virtual money

finance Virtual Stock Trader. A web app via which users can buy and sell stocks using virtual money. All stock prices are real and provided by IEX. Fe

Kiron Deb 0 Jan 15, 2022
Bagas Mirror&Leech Bot is a multipurpose Telegram Bot written in Python for mirroring files on the Internet to our beloved Google Drive. Based on python-aria-mirror-bot

- [ MAYBE UPDATE & ADD MORE MODULE ] Bagas Mirror&Leech Bot Bagas Mirror&Leech Bot is a multipurpose Telegram Bot written in Python for mirroring file

4 Nov 23, 2021
Build a better understanding of your data in PostgreSQL.

Data Fluent for PostgreSQL Build a better understanding of your data in PostgreSQL. The following shows an example report generated by this tool. It g

Mark Litwintschik 28 Aug 30, 2022
Ghost-toolbox - Ghost's official Discord raid tool

Ghost Toolbox Ghost's official Discord raid tool. How to use Clone this repo int

G H Ø S T 10 Oct 31, 2022
Video Bot: an Advanced Telegram Bot that's allow you to play Video & Music on Telegram Group Video Chat

Video Bot is an Advanced Telegram Bot that's allow you to play Video & Music on

5 Jan 26, 2022
Unofficial Python wrapper for official Hacker News API

haxor Unofficial Python wrapper for official Hacker News API. Installation pip install haxor Usage Import and initialization: from hackernews import H

147 Sep 18, 2022
Python bindings for LibreTranslate

Python bindings for LibreTranslate

Argos Open Tech 42 Jan 03, 2023
The system to host your files on the Discord application

Distorage The system to host your files on the Discord application Documentation Documentation Distorage How to use the package You can install it wit

6 Jun 27, 2022
Role Based Access Control for Slack-Bolt Applications

Role Based Access Control for Slack-Bolt Apps Role Based Access Control (RBAC) is a term applied to limiting the authorization for a specific operatio

Jeremy Schulman 7 Jan 06, 2022
Bancos de Dados Relacionais (SQL) na AWS com Amazon RDS

Bancos de Dados Relacionais (SQL) na AWS com Amazon RDS Repositório para o Live Coding DIO do dia 24/11/2021 Serviços utilizados Amazon RDS AWS Lambda

Cassiano Ricardo de Oliveira Peres 4 Jul 30, 2022
This repo contains a small project i've done using PILLOW module in python

This repo contains a small project i've done using PILLOW module in python. I wrote an automated script which generates more than 5k+ unique nfts with 0 hassle in less time.

SasiVatsal 11 Nov 05, 2022
Repositório para meu Discord Bot pessoal

BassetinhoBot Escrevi o código usando o Python 3.8.3 e até agora não tive problemas rodando nas versões mais recentes. Repositório para o Discord Bot

Vinícius Bassete 1 Jan 04, 2022
AI-El-Yazisini-Tanima - Fotoğraflardaki El Yazını Yapay Zeka İle Otomatik Tanıma Yazılımı

AI-El Yazısını Tanıma Fotoğraflardaki El Yazını Yapay Zeka İle Otomatik Tanıma Yazılımı Amaç : Birden fazla makine öğrenmesi modelini bir arada kullan

Özgür Tokay 3 Mar 02, 2022
Pincer-ext-commands - A simple, lightweight package for pincer prefixed commands

pincer.ext.commands A reimagining of pincer's command system and bot system. Ins

Vincent 2 Jan 11, 2022
Google scholar share - Simple python script to pull Google Scholar data from an author's profile

google_scholar_share Simple python script to pull Google Scholar data from an au

Paul Goldsmith-Pinkham 9 Sep 15, 2022
gnosis safe tx builder

Ape Safe: Gnosis Safe tx builder Ape Safe allows you to iteratively build complex multi-step Gnosis Safe transactions and safely preview their side ef

228 Dec 22, 2022
Spore API wrapper written in Python

A wrapper for the Spore API that simplifies and complements its functionality

1 Nov 25, 2021
Wechat-file-cleaner - Clean files in PC WeChat FileStorage directory

Wechat-file-cleaner - Clean files in PC WeChat FileStorage directory

Xingjian Zhang 1 Feb 06, 2022
Telegram File Renamer Bot

RENAMER_BOT Telegram File Renamer Bot Configs TG_BOT_TOKEN - Get bot token from @BotFather API_ID - From my.telegram.org API_HASH - From my.telegram.o

Lntechnical 37 Dec 27, 2022