🔤 Measure edit distance based on keyboard layout

Related tags

Miscellaneousclavier
Overview

clavier

Measure edit distance based on keyboard layout.



Table of contents

Introduction

Default edit distances, such as the Levenshtein distance, don't differentiate between characters. The distance between two characters is either 0 or 1. This package allows you to measure edit distances by taking into account keyboard layouts.

The scope is purposefully limited to alphabetical, numeric, and punctuation keys. That's because this package is meant to assist in analyzing user inputs -- e.g. for spelling correction in a search engine.

The goal of this package is to be flexible. You can define any logical layout, such as QWERTY or AZERTY. You can also control the physical layout by defining where the keys are on the board.

Installation

pip install git+https://github.com/MaxHalford/clavier

User guide

Keyboard layouts

☝️ Things are a bit more complicated than QWERTY vs. AZERTY vs. XXXXXX. Each layout has many variants. I haven't yet figured out a comprehensive way to map all these out.

This package provides a list of keyboard layouts. For instance, we'll load the QWERTY keyboard layout.

>>> import clavier
>>> keyboard = clavier.load_qwerty()
>>> keyboard
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /

>>> keyboard.shape
(4, 13)

>>> len(keyboard)
46

Here is the list of currently available layouts:

>>> for layout in (member for member in dir(clavier) if member.startswith('load_')):
...     print(layout.replace('load_', ''))
...     exec(f'print(clavier.{layout}())')
...     print('---')
dvorak
` 1 2 3 4 5 6 7 8 9 0 [ ]
' , . p y f g c r l / = \
a o e u i d h t n s -
; q j k x b m w v z
---
qwerty
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /
---

Distance between characters

Measure the Euclidean distance between two characters on the keyboard.

>>> keyboard.char_distance('1', '2')
1.0

>>> keyboard.char_distance('q', '2')
1.4142135623730951

>>> keyboard.char_distance('1', 'm')
6.708203932499369

Distance between words

Measure a modified version of the Levenshtein distance, where the substitution cost is the output of the char_distance method.

>>> keyboard.word_distance('apple', 'wople')
2.414213562373095

>>> keyboard.word_distance('apple', 'woplee')
3.414213562373095

You can also override the deletion cost by specifying the deletion_cost parameter, and the insertion cost via the insertion_cost parameter. Both default to 1.

Typing distance

Measure the sum of distances between each pair of consecutive characters. This can be useful for studying keystroke dynamics.

>>> keyboard.typing_distance('hello')
10.245040190466598

For sentences, you can split them up into words and sum the typing distances.

>>> sentence = 'the quick brown fox jumps over the lazy dog'
>>> sum(keyboard.typing_distance(word) for word in sentence.split(' '))
105.60457487263012

Interestingly, this can be used to compare keyboard layouts in terms of efficiency. For instance, the Dvorak keyboard layout is supposedly more efficient than the QWERTY layout. Let's compare both on the first stanza of If— by Rudyard Kipling:

>> words = list(map(str.lower, stanza.split())) >>> qwerty = clavier.load_qwerty() >>> sum(qwerty.typing_distance(word) for word in words) 740.3255229138255 >>> dvorak = clavier.load_dvorak() >>> sum(dvorak.typing_distance(word) for word in words) 923.6597116104518 ">
>>> stanza = """
... If you can keep your head when all about you
...    Are losing theirs and blaming it on you;
... If you can trust yourself when all men doubt you,
...    But make allowance for their doubting too;
... If you can wait and not be tired by waiting,
...    Or, being lied about, don't deal in lies,
... Or, being hated, don't give way to hating,
...    And yet don't look too good, nor talk too wise;
... """

>>> words = list(map(str.lower, stanza.split()))

>>> qwerty = clavier.load_qwerty()
>>> sum(qwerty.typing_distance(word) for word in words)
740.3255229138255

>>> dvorak = clavier.load_dvorak()
>>> sum(dvorak.typing_distance(word) for word in words)
923.6597116104518

It seems the Dvorak layout is in fact slower than the QWERTY layout. But of course this might not be the case in general.

Nearest neighbors

You can iterate over the k nearest neighbors of any character.

>>> qwerty = clavier.load_qwerty()
>>> for char, dist in qwerty.nearest_neighbors('s', k=8, cache=True):
...     print(char, f'{dist:.4f}')
w 1.0000
a 1.0000
d 1.0000
x 1.0000
q 1.4142
e 1.4142
z 1.4142
c 1.4142

The cache parameter determines whether or not the result should be cached for the next call.

Physical layout specification

By default, the keyboard layouts are ortholinear, meaning that the characters are physically arranged over a grid. You can customize the physical layout to make it more realistic and thus obtain distance measures which are closer to reality. This can be done by specifying parameters to the keyboards when they're loaded.

Staggering

Staggering is the amount of offset between two consecutive keyboard rows.

You can specify a constant staggering as so:

>>> keyboard = clavier.load_qwerty(staggering=0.5)

By default the keys are spaced by 1 unit. So a staggering value of 0.5 implies a 50% horizontal shift between each pair of consecutive rows. You may also specify a different amount of staggering for each pair of rows:

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])

There's 3 elements in the list because the keyboard has 4 rows.

Key pitch

Key pitch is the amount of distance between the centers of two adjacent keys. Most computer keyboards have identical horizontal and vertical pitches, because the keys are all of the same size width and height. But this isn't the case for mobile phone keyboards. For instance, iPhone keyboards have a higher vertical pitch.

Drawing a keyboard layout

>>> keyboard = clavier.load_qwerty()
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty.png', bbox_inches='tight')

qwerty

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty_staggered.png', bbox_inches='tight')

qwerty_staggered

Custom layouts

You can of course specify your own keyboard layout. There are different ways to do this. We'll use the iPhone keypad as an example.

The from_coordinates method

>>> keypad = clavier.Keyboard.from_coordinates({
...     '1': (0, 0), '2': (0, 1), '3': (0, 2),
...     '4': (1, 0), '5': (1, 1), '6': (1, 2),
...     '7': (2, 0), '8': (2, 1), '9': (2, 2),
...     '*': (3, 0), '0': (3, 1), '#': (3, 2),
...                  '☎': (4, 1)
... })
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

The from_grid method

>> keypad 1 2 3 4 5 6 7 8 9 * 0 # ☎ ">
>>> keypad = clavier.Keyboard.from_grid("""
...     1 2 3
...     4 5 6
...     7 8 9
...     * 0 #
...       ☎
... """)
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

Development

git clone https://github.com/MaxHalford/clavier
cd clavier
pip install poetry
poetry install
poetry shell
pytest

License

The MIT License (MIT). Please see the license file for more information.

Owner
Max Halford
Going where the wind blows 🍃 🦔
Max Halford
JSEngine is a simple wrapper of Javascript engines.

JSEngine This is a simple wrapper of Javascript engines, it wraps the Javascript interpreter for Python use. There are two ways to call interpreters,

11 Dec 18, 2022
A (hopefully) considerably copious collection of classical cipher crackers

ClassicalCipherCracker A (hopefully) considerably copious collection of classical cipher crackers Written in Python3 (and run with PyPy) TODOs Write a

Stanley Zhong 2 Feb 22, 2022
Sigma coding youtube - This is a collection of all the code that can be found on my YouTube channel Sigma Coding.

Sigma Coding Tutorials & Resources YouTube • Facebook Support Sigma Coding Patreon • GitHub Sponsor • Shop Amazon Table of Contents Overview Topics Re

Alex Reed 927 Jan 08, 2023
Provides guideline on how to configure pre-commit hooks in your own python project

Pre-commit Configuration Guide The main aim of this repository is to act as a guide on how to configure the pre-commit hooks in your existing python p

Faraz Ahmed Khan 2 Mar 31, 2022
A simple countdown timer in eazy code to show timer with python

Countdown_Timer The simple CLI countdown timer in eazy code to show timer How Work First you fill the input by int-- (Enter the time in Seconds:) for

Yasin Rezvani 3 Nov 15, 2022
A basic layout of atm working of my local database

Software for working Banking service 😄 This project was developed for Banking service. mysql server is required To have mysql server on your system u

satya 1 Oct 21, 2021
Artificial intelligence based on 5-dimensional quantum selection

Deep Thought An artificial intelligence based on 5-dimensional quantum selection. Algorithm The payload Make an random bit array (e.g. 1101...) Conver

Larry Holst 3 Dec 14, 2022
Pipenv-local-deps-repro - Reproduction of a local transitive dependency on pipenv

Reproduction of the pipenv bug with transitive local dependencies. Clone this re

Lucas Duailibe 2 Jan 11, 2022
A minimalist starknet amm adapted from StarkWare's amm.

viscus • A minimalist starknet amm adapted from StarkWare's amm. Directory Structure contracts

Alucard 4 Dec 27, 2021
A example project's description is a high-level overview of why you’re doing a project.

A example project's description is a high-level overview of why you’re doing a project.

Nikita Matyukhin 12 Mar 23, 2022
Extended functionality for Namebase past their web UI

Namebase Extended Extended functionality for Namebase past their web UI.

RunDavidMC 12 Sep 02, 2022
FantasyBballHelper - Espn Fantasy Basketball Helper

ESPN FANTASY BASKETBALL HELPER The simple goal of this project is to allow fanta

1 Jan 18, 2022
Apache Superset out of box version(Windows 64-bit)

superset_app Apache Superset out of box version (Windows 64bit) prepare job download 3 files python-3.8.10-embed-amd64.zip get-pip.py python_geohash‑0

Steven Lee 9 Oct 02, 2022
Programa que organiza pastas automaticamente

📂 Folder Organizer 📂 Programa que organiza pastas automaticamente Requisitos • Como usar • Melhorias futuras • Capturas de Tela Requisitos Antes de

João Victor Vilela dos Santos 1 Nov 02, 2021
Repositório para estudo do airflow

airflow-101 Repositório para estudo do airflow Docker criado baseado no tutorial Exemplo de API da pokeapi Para executar clone o repo execute as confi

Gabriel (Gabu) Bellon 1 Nov 23, 2021
Ronin - Create Fud Meterpreter Payload To Hack Windows 11

Ronin - Create Fud Meterpreter Payload To Hack Windows 11

Dj4w3d H4mm4di 6 May 09, 2022
Hopefully it'll become a very annoying desktop pet

AnnoyingPet Basic Tutorial: https://seebass22.github.io/python-desktop-pet-tutorial/ Handling Mouse Input: https://pythonhosted.org/pynput/mouse.html

1 Jun 08, 2022
RCCで開催する『バックエンド勉強会』の資料

RCC バックエンド勉強会 開発環境 Python 3.9 Pipenv 使い方 1. インストール pipenv install 2. アプリケーションを起動 pipenv run start 本コマンドを実行するとlocalhost:8000へアクセスできるようになります。 3. テストを実行

Averak 7 Nov 14, 2021
Yet another Python Implementation of the Elo rating system.

Python Implementation - Elo Rating System Yet another Python Implementation of the Elo rating system (how innovative am I right?). Only supports 1vs1

Kraktoos 5 Dec 22, 2022
Application launcher and environment management

Application launcher and environment management for 21st century games and digital post-production, built with bleeding-rez and Qt.py News Date Releas

10 Nov 03, 2022