Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

Overview

ht-getter

Searches a document for hash tags. Supports multiple natural languages. Works in various contexts.

This package uses a non-regex approach and supports both halfwidth and fullwidth alphanumeric characters as well as various writing systems.

Installation

pip install ht-getter

Function

def get_hash_tags(source, mode = "strings")

Arguments:

source -> The source text to be searched. Must be passed as a str type.

mode -> Specifies the mode of the results. The default value is “strings”

mode = “strings” -> The results are returned as a list of strings

mode = “indices” -> The results are returned as a list of lists of the start and end indices of the hash tags.

Code Sample

from ht_getter.getter import get_hash_tags

source_text = '''This simple package helps find you find #hash_tags in various types of #documents#. It also works with other languages like #日本語 or #한국어.
It supports #fullwidth #alpha-numeric characters. You can get a #list of the #hash_tags or a list of their #indices in the #####source_text."
'''

hash_tags = get_hash_tags(source_text)
hash_tag_indices = get_hash_tags(source_text, mode = "indices")

print(hash_tags)
print(hash_tag_indices)

Things to Keep in Mind:

  1. This package can be used in various contexts. (Social media, news articles, etc.)
  2. This package looks for substrings that have the structure of a hash tag but does not check that the substring is a valid hash tag on any platform.
You might also like...
Soccerdata - Efficiently scrape soccer data from various sources
Soccerdata - Efficiently scrape soccer data from various sources

SoccerData is a collection of wrappers over soccer data from Club Elo, ESPN, FBr

Python Tool to Easily Generate Multiple Documents

Python Tool to Easily Generate Multiple Documents Running the script doesn't require internet Max Generation is set to 10k to avoid lagging/crashing R

Build documentation in multiple repos into one site.

mkdocs-multirepo-plugin Build documentation in multiple repos into one site. Setup Install plugin using pip: pip install git+https://github.com/jdoiro

Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python
Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python

Deep AutoViML Pipeline for orchest.io Quickstart Build Deep Learning models with a single line of code: deep_autoviml Deep AutoViML helps you build te

Type hints support for the Sphinx autodoc extension

sphinx-autodoc-typehints This extension allows you to use Python 3 annotations for documenting acceptable argument types and return value types of fun

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database.

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database. This database is completely free* 💸

Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.
Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.

What is Craxk ? Craxk is a UNIQUE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reprod

Find target hash collisions for Apple's NeuralHash perceptual hash function.💣
Find target hash collisions for Apple's NeuralHash perceptual hash function.💣

neural-hash-collider Find target hash collisions for Apple's NeuralHash perceptual hash function. For example, starting from a picture of this cat, we

That Hash will name that hash type! Identify MD5, SHA256 and 300+ other hashes Comes with
That Hash will name that hash type! Identify MD5, SHA256 and 300+ other hashes Comes with

Call for translators! We're looking for translators to help translate this spec for everyone! Read this documentation in the following languages 한국어 中

A Pytorch implementation of
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

A Pytorch implementation of
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema
document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

mtomo Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation.

Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling.

sshuttle: where transparent proxy meets VPN meets ssh As far as I know, sshuttle is the only program that solves the following common case: Your clien

A Regex based linter tool that works for any language and works exclusively with custom linting rules.

renag Documentation Available Here Short for Regex (re) Nag (like "one who complains"). Now also PEGs (Parsing Expression Grammars) compatible with py

Code for CVPR 2021 oral paper
Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts The rapid progress in 3D scene understanding has come with growing dem

Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

ToxiChat Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Install depen

Releases(v0.0.1)
Owner
Rairye
NLP (mostly tokenization, normalization, and topics concerning CJK languages) Most projects currently available in JS and Python.
Rairye
Gaphor is the simple modeling tool

Gaphor Gaphor is a UML and SysML modeling application written in Python. It is designed to be easy to use, while still being powerful. Gaphor implemen

Gaphor 1.3k Jan 03, 2023
A tutorial for people to run synthetic data replica's from source healthcare datasets

Synthetic-Data-Replica-for-Healthcare Description What is this? A tailored hands-on tutorial showing how to use Python to create synthetic data replic

11 Mar 22, 2022
Python Deep Dive Course - Accompanying Materials

Python Deep Dive Various Jupyter notebooks and Python sources associated with my Udemy Python 3 Deep Dive course series: Part 1: Mainly functional pro

Fred Baptiste 1.1k Dec 30, 2022
MkDocs plugin for setting revision date from git per markdown file

mkdocs-git-revision-date-plugin MkDocs plugin that displays the last revision date of the current page of the documentation based on Git. The revision

Terry Zhao 48 Jan 06, 2023
Bring RGB to life in Neovim

Bring RGB to life in Neovim Change your RGB devices' color depending on Neovim's mode. Fast and asynchronous plugin to live your vim-life to the fulle

Antoine 40 Oct 27, 2022
A complete kickstart devcontainer repository for python3

A complete kickstart devcontainer repository for python3

Viktor Freiman 3 Dec 23, 2022
Collections of Beautiful Latex Snippets

HandyLatex Collections of Beautiful Latex Snippets Table 👉 Succinct table with bold separation line and gray text %################## Dependencies ##

Xintao 15 Apr 11, 2022
Watch a Sphinx directory and rebuild the documentation when a change is detected. Also includes a livereload enabled web server.

sphinx-autobuild Rebuild Sphinx documentation on changes, with live-reload in the browser. Installation sphinx-autobuild is available on PyPI. It can

Executable Books 440 Jan 06, 2023
30 Days of google cloud leaderboard website

30 Days of Cloud Leaderboard This is a leaderboard for the students of Thapar, Patiala who are participating in the 2021 30 days of Google Cloud Platf

Developer Student Clubs TIET 13 Aug 25, 2022
An introduction to hikari, complete with different examples for different command handlers.

An intro to hikari This repo provides some simple examples to get you started with hikari. Contained in this repo are bots designed with both the hika

Ethan Henderson 18 Nov 29, 2022
This is the data scrapped of all the pitches made up potential startup's to established bussiness tycoons of India with all the details of Investments made, equity share, Name of investor etc.

SharkTankInvestor This is the data scrapped of all the pitches made up potential startup's to established bussiness tycoons of India with all the deta

Subradip Poddar 2 Aug 02, 2022
The OpenAPI Specification Repository

The OpenAPI Specification The OpenAPI Specification is a community-driven open specification within the OpenAPI Initiative, a Linux Foundation Collabo

OpenAPI Initiative 25.5k Dec 29, 2022
Hasköy is an open-source variable sans-serif typeface family

Hasköy Hasköy is an open-source variable sans-serif typeface family. Designed with powerful opentype features and each weight includes latin-extended

67 Jan 04, 2023
A document format conversion service based on Pandoc.

reformed Document format conversion service based on Pandoc. Usage The API specification for the Reformed server is as follows: GET /api/v1/formats: L

David Lougheed 3 Jul 18, 2022
Project created to help beginner programmers to study, despite the lack of internet!

Project created to help beginner programmers to study, despite the lack of internet!

Dev4Dev 2 Oct 25, 2021
Python document object mapper (load python object from JSON and vice-versa)

lupin is a Python JSON object mapper lupin is meant to help in serializing python objects to JSON and unserializing JSON data to python objects. Insta

Aurélien Amilin 24 Nov 09, 2022
A python package to avoid writing and maintaining duplicated python docstrings.

docstring-inheritance is a python package to avoid writing and maintaining duplicated python docstrings.

Antoine Dechaume 15 Dec 07, 2022
Data Inspector is an open-source python library that brings 15++ types of different functions to make EDA, data cleaning easier.

Data Inspector Data Inspector is an open-source python library that brings 15 types of different functions to make EDA, data cleaning easier. Author:

Kazi Amit Hasan 38 Nov 24, 2022
Uses diff command to compare expected output with student's submission output

AUTOGRADER for GRADESCOPE using diff with partial grading Description: Uses diff command to compare expected output with student's submission output U

2 Jan 11, 2022
A next-generation curated knowledge sharing platform for data scientists and other technical professions.

Knowledge Repo The Knowledge Repo project is focused on facilitating the sharing of knowledge between data scientists and other technical roles using

Airbnb 5.2k Dec 27, 2022