🎐 a python library for doing approximate and phonetic matching of strings.

Overview

jellyfish

https://coveralls.io/repos/jamesturk/jellyfish/badge.png?branch=master Documentation Status

Jellyfish is a python library for doing approximate and phonetic matching of strings.

Written by James Turk <[email protected]> and Michael Stephens.

See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.

See http://jellyfish.readthedocs.io for documentation.

Source is available at http://github.com/jamesturk/jellyfish.

Jellyfish >= 0.7 only supports Python 3, if you need Python 2 please use 0.6.x.

Included Algorithms

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1
>>> jellyfish.metaphone(u'Jellyfish')
'JLFX'
>>> jellyfish.soundex(u'Jellyfish')
'J412'
>>> jellyfish.nysiis(u'Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex(u'Jellyfish')
'JLLFSH'

Running Tests

If you are interested in contributing to Jellyfish, you may want to run tests locally. Jellyfish uses tox to run tests, which you can setup and run as follows:

pip install tox
# cd jellyfish/
tox
Owner
James Turk
Principal Architect of @openstates. Formerly of @OpenPrecincts, @pbs, @sunlightlabs.
James Turk
GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates

GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates Vibhor Agarwal, Sagar Joglekar, Anthony P. Young an

Vibhor Agarwal 2 Jun 30, 2022
This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular intervals.It sends out the most recent news at random!

Nepali-news-notifier This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular in

Sachit Yadav 1 Feb 11, 2022
Py65 65816 - Add support for the 65C816 to py65

Add support for the 65C816 to py65 Py65 (https://github.com/mnaberez/py65) is a

4 Jan 04, 2023
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

DeeBERT This is the code base for the paper DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. Code in this repository is also available

Castorini 132 Nov 14, 2022
Mapping a variable-length sentence to a fixed-length vector using BERT model

Are you looking for X-as-service? Try the Cloud-Native Neural Search Framework for Any Kind of Data bert-as-service Using BERT model as a sentence enc

Han Xiao 11.1k Jan 01, 2023
Korean Sentence Embedding Repository

Korean-Sentence-Embedding 🍭 Korean sentence embedding repository. You can download the pre-trained models and inference right away, also it provides

80 Jan 02, 2023
[AAAI 21] Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

β—₯ Curriculum Labeling β—£ Revisiting Pseudo-Labeling for Semi-Supervised Learning Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez. In the

UVA Computer Vision 113 Dec 15, 2022
Open solution to the Toxic Comment Classification Challenge

Starter code: Kaggle Toxic Comment Classification Challenge More competitions πŸŽ‡ Check collection of public projects 🎁 , where you can find multiple

minerva.ml 153 Jun 22, 2022
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

Welcome to Spokestack Python! This library is intended for developing voice interfaces in Python. This can include anything from Raspberry Pi applicat

Spokestack 133 Sep 20, 2022
Common Voice Dataset explorer

Common Voice Dataset Explorer Common Voice Dataset is by Mozilla Made during huggingface finetuning week Usage pip install -r requirements.txt streaml

Ceyda Cinarel 22 Nov 16, 2022
An attempt to map the areas with active conflict in Ukraine using open source twitter data.

Live Action Map (LAM) An attempt to use open source data on Twitter to map areas with active conflict. Right now it is used for the Ukraine-Russia con

Kinshuk Dua 171 Nov 21, 2022
Python SDK for working with Voicegain Speech-to-Text

Voicegain Speech-to-Text Python SDK Python SDK for the Voicegain Speech-to-Text API. This API allows for large vocabulary speech-to-text transcription

Voicegain 3 Dec 14, 2022
η«―εˆ°η«―ηš„ι•Ώζœ¬ζ–‡ζ‘˜θ¦ζ¨‘εž‹οΌˆζ³•η ”ζ―2020εΈζ³•ζ‘˜θ¦θ΅›ι“οΌ‰

η«―εˆ°η«―ηš„ι•Ώζ–‡ζœ¬ζ‘˜θ¦ζ¨‘εž‹οΌˆζ³•η ”ζ―2020εΈζ³•ζ‘˜θ¦θ΅›ι“οΌ‰

θ‹ε‰‘ζž—(Jianlin Su) 334 Jan 08, 2023
Official PyTorch implementation of SegFormer

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers Figure 1: Performance of SegFormer-B0 to SegFormer-B5. Project page

NVIDIA Research Projects 1.4k Dec 29, 2022
Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

BADER ALABDAN 2 Oct 22, 2022
Nmt - TensorFlow Neural Machine Translation Tutorial

Neural Machine Translation (seq2seq) Tutorial Authors: Thang Luong, Eugene Brevdo, Rui Zhao (Google Research Blogpost, Github) This version of the tut

6.1k Dec 29, 2022
LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT2, Transform

Bytedance Inc. 2.5k Jan 03, 2023
pyupbit 라이브러리λ₯Ό ν™œμš©ν•˜μ—¬ upbitμ—μ„œ λΉ„νŠΈμ½”μΈμ„ μžλ™λ§€λ§€ν•˜λŠ” μ½”λ“œμž…λ‹ˆλ‹€. μ‘°μ½”λ”© 유튜브 μ±„λ„μ—μ„œ μžμ„Έν•œ κ°•μ˜ μ˜μƒμ„ 보싀 수 μžˆμŠ΅λ‹ˆλ‹€.

파이썬 λΉ„νŠΈμ½”μΈ 투자 μžλ™ν™” κ°•μ˜ μ½”λ“œ by 유튜브 μ‘°μ½”λ”© 채널 pyupbit 라이브러리λ₯Ό ν™œμš©ν•˜μ—¬ upbit κ±°λž˜μ†Œμ—μ„œ λΉ„νŠΈμ½”μΈ μžλ™λ§€λ§€λ₯Ό ν•˜λŠ” μ½”λ“œμž…λ‹ˆλ‹€. 파일 ꡬ성 test.py : μž”κ³  쑰회 (1κ°•) backtest.py : λ°±ν…ŒμŠ€νŒ… μ½”λ“œ (2κ°•) bestK.p

μ‘°μ½”λ”© JoCoding 186 Dec 29, 2022
[WWW 2021 GLB] New Benchmarks for Learning on Non-Homophilous Graphs

New Benchmarks for Learning on Non-Homophilous Graphs Here are the codes and datasets accompanying the paper: New Benchmarks for Learning on Non-Homop

94 Dec 21, 2022
Local cross-platform machine translation GUI, based on CTranslate2

DesktopTranslator Local cross-platform machine translation GUI, based on CTranslate2 Download Windows Installer You can either download a ready-made W

Yasmin Moslem 29 Jan 05, 2023