The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Last update: Dec 16, 2022


Contents

Maintainer wanted
Introduction
Installation
Documentation
License
History
Source code
Authors

Maintainer wanted

I am looking for a new maintainer for the project. I have not needed this particular library for well over 7 years now, because it is a C-only library and its original license is somewhat restrictive.

Introduction

The Levenshtein Python C extension module contains functions for fast computation of

Levenshtein (edit) distance, and edit operations
string similarity
approximate median strings, and generally string averaging
string sequence and set similarity

It supports both normal and Unicode strings.
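To make the first two items concrete, here is a minimal pure-Python sketch of the classic dynamic-programming edit distance. It is only an illustration of the quantity being computed, not the module's C implementation, and the names `edit_distance` and `similarity` are made up for this example. The normalization used in `similarity` is one common choice and is an assumption; it is not necessarily the formula the module uses for its own similarity score.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,          # delete ch_a
                current[j - 1] + 1,       # insert ch_b
                previous[j - 1] + cost,   # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(similarity("kitten", "sitting"))
```

The sketch runs in time proportional to the product of the two string lengths, which is the cost a C extension is meant to make tolerable for large inputs.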

Python 2.2 or newer is required; Python 3 is supported.

StringMatcher.py is an example SequenceMatcher-like class built on top of Levenshtein. It lacks some of SequenceMatcher's functionality but adds some extras of its own.
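For readers unfamiliar with the interface being imitated, the sketch below uses the standard library's `difflib.SequenceMatcher`, not StringMatcher itself. It shows the two calls such a class is typically used for: a similarity ratio and a list of opcodes that turn one string into another. Which of these methods StringMatcher actually provides is not stated here, so check StringMatcher.py before relying on any of them; the helper `apply_opcodes` is a name invented for this example.

```python
from difflib import SequenceMatcher


def apply_opcodes(a: str, b: str) -> str:
    """Rebuild b from a by replaying the opcodes SequenceMatcher reports."""
    matcher = SequenceMatcher(None, a, b)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])   # unchanged run copied from a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])   # new text taken from b
        # "delete" drops a[i1:i2], so nothing is appended
    return "".join(pieces)


matcher = SequenceMatcher(None, "kitten", "sitting")
ratio = matcher.ratio()  # float in [0, 1]; 1.0 means identical sequences

if __name__ == "__main__":
    print(ratio)
    print(matcher.get_opcodes())
    print(apply_opcodes("kitten", "sitting"))
```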

Levenshtein.c can also be used as a pure C library. You only have to define the NO_PYTHON preprocessor symbol (-DNO_PYTHON) when compiling it. The functionality is similar to that of the Python extension. No separate documentation is provided yet, so read the source. Note that the two builds are not interchangeable:

C functions exported when compiling with -DNO_PYTHON (see Levenshtein.h) are not exported when compiling as a Python extension (and vice versa)
The Unicode character type used with -DNO_PYTHON is wchar_t, while the Python extension uses Py_UNICODE; they may be the same, but don't count on it

Installation

pip install python-Levenshtein
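After installing, a quick smoke check along the following lines can confirm that the extension imports and returns a sensible result. This is a sketch that assumes the package exposes a module named `Levenshtein` with `distance` and `ratio` functions; `smoke_check` is a name invented for this example, and the script reports a missing module instead of failing.

```python
# Post-install smoke check; assumes the installed package provides `Levenshtein`.
try:
    import Levenshtein
except ImportError:  # the extension is not installed in this environment
    Levenshtein = None


def smoke_check():
    """Return (distance, ratio) for a known pair, or None if the module is missing."""
    if Levenshtein is None:
        return None
    return (
        Levenshtein.distance("kitten", "sitting"),
        Levenshtein.ratio("kitten", "sitting"),
    )


if __name__ == "__main__":
    result = smoke_check()
    if result is None:
        print("Levenshtein is not importable; check the installation")
    else:
        print("distance=%d ratio=%.3f" % result)
```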

Documentation

Documentation for the current version

gendoc.sh generates HTML API documentation. You probably want a self-contained version rather than an includable one, so run ./gendoc.sh --selfcontained. It requires Levenshtein to be installed already, as well as genextdoc.py.

Source code

http://github.com/ztane/python-Levenshtein/

Authors

Maintainer: Antti Haapala
Python 3 compatibility: Esa Määttä
Documentation generation fixes: Jonatas CD
Previous maintainer: Mikko Ohtamaa
Original code: David Necas (Yeti) <yeti at physics.muni.cz>

