Python Markov Chain chatbot running on Telegram

Overview

Hanasubot

Hanasubot (Japanese 話すボット, "talking bot") is a Python chatbot running on Telegram. The bot is based on Markov chains, so it learns from your messages instantly, unlike neural network chatbots, which require training. It uses a modified version of the markovify library for that purpose. The output may not make sense at all, though it sometimes generates hilarious replies.
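
To illustrate why a Markov chain can learn instantly, here is a minimal word-level sketch in plain Python. It is not the bot's modified markovify code; the names learn, generate, BEGIN and END are made up for this example, and it assumes the text is already split by whitespace.

```python
import random
from collections import defaultdict

# Sentinel tokens marking sentence boundaries (hypothetical names).
BEGIN, END = "__BEGIN__", "__END__"


def learn(model, sentence):
    """Record every word-to-next-word transition of one sentence."""
    words = sentence.split()
    if not words:
        return
    chain = [BEGIN] + words + [END]
    for current, following in zip(chain, chain[1:]):
        model[current].append(following)


def generate(model, rng=random, max_words=30):
    """Walk the chain from BEGIN, picking a random successor each step."""
    word, out = BEGIN, []
    while len(out) < max_words:
        choices = model.get(word)
        if not choices:
            break
        word = rng.choice(choices)
        if word == END:
            break
        out.append(word)
    return " ".join(out)


model = defaultdict(list)
learn(model, "the bot can learn")
learn(model, "the bot can talk")
print(generate(model))
```

Learning is just appending to lists, so each new message affects replies immediately with no training step. With a small corpus the walk can only replay or splice together what it has seen, which is why replies get better (and stranger) as the corpus grows.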

In theory, the bot can learn any language, but some languages require word segmentation first. The bot currently supports Chinese and Japanese word segmentation, using pkuseg, CkipTagger and mecab. Language detection relies on pycld2.
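
The idea of choosing a tokenizer by detected language can be sketched as a small dispatch table. This is an assumption about the general shape, not the bot's actual code: the language codes are illustrative, and the per-character splitter is only a stand-in for the real segmenters (pkuseg, CkipTagger, mecab), which are not called here.

```python
def split_on_whitespace(text):
    """Enough for languages that put spaces between words."""
    return text.split()


def split_per_character(text):
    """Crude stand-in for a real segmenter such as pkuseg or mecab."""
    return [ch for ch in text if not ch.isspace()]


# Hypothetical mapping from a detected language code to a tokenizer.
TOKENIZERS = {
    "zh": split_per_character,
    "zh-Hant": split_per_character,
    "ja": split_per_character,
}


def tokenize(text, lang_code):
    """Pick the tokenizer for lang_code, falling back to whitespace."""
    tokenizer = TOKENIZERS.get(lang_code, split_on_whitespace)
    return tokenizer(text)


print(tokenize("hello world", "en"))
print(tokenize("你好吗", "zh"))
```

In the real bot the language code would come from the language detector, and each entry in the table would wrap the corresponding segmentation library.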

Hanasubot has a permission system, so you can easily stop the bot from learning from naughty kids in your group while still replying to them. Users with admin rights can also erase lines from the bot's corpus.

The bot is designed for Chinese Telegram groups, so many of its messages are written in Chinese. Internationalization (i18n) is planned, and any help is welcome.

Installation

Python 3.6+ is required.

VENV_PATH=/path/to/your/venv  # Change this
python3 -m venv $VENV_PATH
source $VENV_PATH/bin/activate

pip3 install -r requirements.txt

If you are using Python 3.6, dataclasses 0.8 is required as well:

pip3 install dataclasses==0.8

For Python 3.7 and up, dataclasses is part of the standard library, so there is nothing to install.

To use CkipTagger for Traditional Chinese tokenization, you have to download the model file (see CkipTagger readme for a detailed guide):

python3 -c "from ckiptagger import data_utils; data_utils.download_data_gdown('./')"

Then unzip the downloaded archive into a folder named ckipdata, in the same directory as the Python scripts.

Optionally, you can initialize the user dictionaries for pkuseg and CkipTagger before starting the bot:

touch ./pkuseg_dict.txt
touch ./ckip_dict.json

Configuration

Copy config.example.py to config.py and fill it out. See the comments in the config file for details.

cp config.example.py config.py

After that, simply start the bot:

python3 tgbot.py

Bot commands and usage

Simply reply to the bot and it will say some random words, provided it has collected a large enough corpus. The bot also learns from your message instantly. Special commands are as follows.

Require root

  • /reload_config - Reload the config file without restarting the bot. Some entries cannot be reloaded dynamically; see config.example.py for details.

Require admin

  • /erase - Remove lines from corpus. (Non-admins can only erase lines sent by themselves.)
  • /userweight - Set user weight.
  • /ban - Set user right to -1.
  • /restrict - Set user right to 1.
  • /grantnormal - Set user right to 2.
  • /granttrusted - Set user right to 3.
  • /grantadmin - Set user right to 4. Admins are able to add/remove other admins with the above commands. See also the user right levels section.

Require trusted

  • /addword_cn - Add a word into pkuseg user dictionary.
  • /addword_tw - Add a word into CkipTagger user dictionary.
  • /rmword_cn - Remove a word from pkuseg user dictionary.
  • /rmword_tw - Remove a word from CkipTagger user dictionary.

Other commands

  • /clddbg - Test language detection of some texts.
  • /cutdbg - Test tokenization of some texts.
  • /policy - See what data is collected by the bot and so on.
  • /reload - Claim your admin rights after you become a Telegram group admin.
  • /source - See the source code.
  • /start - Start chatting, useful when you can't find the bot messages to reply.

Database

Initialize

CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);
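
The statements above can be run from Python. The sketch below assumes the database is SQLite (the README does not say so explicitly, but the column types suggest it) and uses an in-memory database so nothing is written to disk; the sample user and corpus line are made up.

```python
import sqlite3

# Same schema as above, kept as one script for executescript().
SCHEMA = """
CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);
"""

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.executescript(SCHEMA)

# A new user gets the default right (2, normal) and weight (1.0).
conn.execute(
    "INSERT INTO user (user_tgid, user_name) VALUES (?, ?)", (12345, "alice")
)
row = conn.execute(
    "SELECT user_right, user_weight FROM user WHERE user_tgid = ?", (12345,)
).fetchone()

# corpus_line is UNIQUE, so a repeated line is ignored rather than duplicated.
for _ in range(2):
    conn.execute(
        "INSERT OR IGNORE INTO corpus (corpus_line, corpus_user) VALUES (?, ?)",
        ("hello world", 1),
    )
line_count = conn.execute("SELECT COUNT(*) FROM corpus").fetchone()[0]
print(row, line_count)
```

Because every statement uses IF NOT EXISTS, running the script again against an existing database is harmless.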

User right levels

  • 5 - root.
  • 4 - admin, can change user rights (except root users), can erase a line from corpus, and can set user_weight and corpus_weight (WIP).
  • 3 - trusted user, can feed the bot via private messages, and can add words to the dictionary (for tokenization purposes).
  • 2 - normal user.
  • 1 - restricted user, the bot will not write their messages into the database.
  • -1 - banned user, the bot will not reply to their messages.
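
The levels above map naturally onto an ordered enum, so permission checks become simple comparisons. This is a sketch of that idea, not the bot's actual code; the class and function names are hypothetical, and letting any non-banned user erase their own lines is an assumption drawn from the /erase description.

```python
from enum import IntEnum


class Right(IntEnum):
    """User right levels as listed above (names are hypothetical)."""
    BANNED = -1
    RESTRICTED = 1
    NORMAL = 2
    TRUSTED = 3
    ADMIN = 4
    ROOT = 5


def should_reply(right):
    # Banned users get no replies.
    return right > Right.BANNED


def should_learn(right):
    # Restricted users are answered, but their messages are not stored.
    return right >= Right.NORMAL


def can_erase(right, is_own_line):
    # Admins can erase any line; others only their own (assumption:
    # banned users cannot use the command at all).
    return right >= Right.ADMIN or (is_own_line and right > Right.BANNED)


def can_change_right(actor, target):
    # Admins can change user rights, except those of root users.
    return actor >= Right.ADMIN and target < Right.ROOT


print(should_reply(Right.RESTRICTED), should_learn(Right.RESTRICTED))
```

Storing the level as a plain integer, as the user_right column does, keeps these comparisons cheap and makes it easy to add a level later.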

TODOs

  • Let admins set corpus_weight
  • Batch /erase

License

MIT
