🔤 Measure edit distance based on keyboard layout


clavier

Measure edit distance based on keyboard layout.




Introduction

Default edit distances, such as the Levenshtein distance, don't differentiate between characters. The distance between two characters is either 0 or 1. This package allows you to measure edit distances by taking into account keyboard layouts.
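To make the contrast concrete, here is a minimal sketch of the classic Levenshtein distance with unit costs. Every substitution costs exactly 1, no matter how close the two keys are. The function name levenshtein is our own and is not part of clavier.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i chars of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (ca != cb)  # 0 if same char, else 1
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(levenshtein("apple", "wople"))  # 2
```

Both substitutions in apple -> wople count as 1 here, even though a/w and p/o are neighboring keys on a QWERTY keyboard.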

The scope is purposefully limited to alphabetical, numeric, and punctuation keys. That's because this package is meant to assist in analyzing user inputs, for example for spelling correction in a search engine.

The goal of this package is to be flexible. You can define any logical layout, such as QWERTY or AZERTY. You can also control the physical layout by defining where the keys are located on the keyboard.

Installation

pip install git+https://github.com/MaxHalford/clavier

User guide

Keyboard layouts

☝️ Things are a bit more complicated than QWERTY vs. AZERTY vs. XXXXXX. Each layout has many variants. I haven't yet figured out a comprehensive way to map all these out.

This package provides several predefined keyboard layouts. For instance, here's how to load the QWERTY layout:

>>> import clavier
>>> keyboard = clavier.load_qwerty()
>>> keyboard
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /

>>> keyboard.shape
(4, 13)

>>> len(keyboard)
46
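The shape and length above follow directly from the four rows: the longest row (q through \) has 13 keys, and the rows hold 12 + 13 + 11 + 10 = 46 keys in total. Here is a minimal sketch of that bookkeeping, assuming the layout is stored as a mapping from each character to its (row, column) grid position. The names QWERTY_ROWS and grid_coordinates are hypothetical and are not clavier's internals.

```python
# Hypothetical representation: one string per keyboard row, one character per key.
QWERTY_ROWS = [
    "1234567890-=",
    "qwertyuiop[]\\",
    "asdfghjkl;'",
    "zxcvbnm,./",
]


def grid_coordinates(rows: list[str]) -> dict[str, tuple[int, int]]:
    """Map each character to its (row, column) position on an ortholinear grid."""
    return {char: (i, j) for i, row in enumerate(rows) for j, char in enumerate(row)}


coords = grid_coordinates(QWERTY_ROWS)
n_rows = max(i for i, _ in coords.values()) + 1
n_cols = max(j for _, j in coords.values()) + 1
print((n_rows, n_cols), len(coords))  # (4, 13) 46
```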

Here is the list of currently available layouts:

>>> for layout in (member for member in dir(clavier) if member.startswith('load_')):
...     print(layout.replace('load_', ''))
...     print(getattr(clavier, layout)())
...     print('---')
dvorak
` 1 2 3 4 5 6 7 8 9 0 [ ]
' , . p y f g c r l / = \
a o e u i d h t n s -
; q j k x b m w v z
---
qwerty
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /
---

Distance between characters

Measure the Euclidean distance between two characters on the keyboard.

>>> keyboard.char_distance('1', '2')
1.0

>>> keyboard.char_distance('q', '2')
1.4142135623730951

>>> keyboard.char_distance('1', 'm')
6.708203932499369
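On the default ortholinear grid, these numbers are plain Euclidean distances between (row, column) coordinates: q and 2 are one row and one column apart, giving √2, while 1 at (0, 0) and m at (3, 6) are √(3² + 6²) = √45 apart. A minimal sketch, assuming the same hypothetical row-string representation rather than clavier's actual internals:

```python
import math

# Hypothetical ortholinear QWERTY grid: one string per row.
QWERTY_ROWS = ["1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./"]
COORDS = {char: (i, j) for i, row in enumerate(QWERTY_ROWS) for j, char in enumerate(row)}


def char_distance(a: str, b: str) -> float:
    """Euclidean distance between the centers of two keys."""
    return math.dist(COORDS[a], COORDS[b])


print(char_distance("1", "2"))  # 1.0
print(char_distance("q", "2"))  # ≈ 1.4142
print(char_distance("1", "m"))  # ≈ 6.7082
```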

Distance between words

Measure a modified version of the Levenshtein distance, where the substitution cost is the output of the char_distance method.

>>> keyboard.word_distance('apple', 'wople')
2.414213562373095

>>> keyboard.word_distance('apple', 'woplee')
3.414213562373095

You can also override the deletion cost via the deletion_cost parameter, and the insertion cost via the insertion_cost parameter. Both default to 1.
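Here is a minimal sketch of such a keyboard-aware Levenshtein distance, assuming an ortholinear QWERTY grid and using the Euclidean key distance as the substitution cost. It reproduces the two numbers above: a -> w costs √2, p -> o costs 1, and the extra trailing e in woplee costs one insertion. The function is our own illustration, not clavier's implementation; only the deletion_cost and insertion_cost parameter names mirror the text above.

```python
import math

# Hypothetical ortholinear QWERTY grid: one string per row.
QWERTY_ROWS = ["1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./"]
COORDS = {char: (i, j) for i, row in enumerate(QWERTY_ROWS) for j, char in enumerate(row)}


def word_distance(a: str, b: str, deletion_cost: float = 1.0, insertion_cost: float = 1.0) -> float:
    """Levenshtein distance where substituting x with y costs the distance between their keys."""
    # prev[j] = cost of turning the first i-1 chars of a into the first j chars of b
    prev = [j * insertion_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * deletion_cost]  # delete all of the first i chars of a
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + math.dist(COORDS[ca], COORDS[cb])  # 0.0 when ca == cb
            deletion = prev[j] + deletion_cost
            insertion = curr[j - 1] + insertion_cost
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(word_distance("apple", "wople"))   # ≈ 2.4142
print(word_distance("apple", "woplee"))  # ≈ 3.4142
```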

Typing distance

Measure the sum of distances between each pair of consecutive characters. This can be useful for studying keystroke dynamics.

>>> keyboard.typing_distance('hello')
10.245040190466598
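Under the same ortholinear assumption, the typing distance of hello is the sum of four hops: h -> e (√10), e -> l (√37), l -> l (0) and l -> o (1), which adds up to roughly 10.245. A minimal sketch, which is our own illustration rather than clavier's code:

```python
import math

# Hypothetical ortholinear QWERTY grid: one string per row.
QWERTY_ROWS = ["1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./"]
COORDS = {char: (i, j) for i, row in enumerate(QWERTY_ROWS) for j, char in enumerate(row)}


def typing_distance(word: str) -> float:
    """Total distance travelled between consecutive keystrokes."""
    return sum(math.dist(COORDS[x], COORDS[y]) for x, y in zip(word, word[1:]))


print(typing_distance("hello"))  # ≈ 10.245
```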

For sentences, you can split them up into words and sum the typing distances.

>>> sentence = 'the quick brown fox jumps over the lazy dog'
>>> sum(keyboard.typing_distance(word) for word in sentence.split(' '))
105.60457487263012

Interestingly, this can be used to compare keyboard layouts in terms of efficiency. For instance, the Dvorak keyboard layout is supposedly more efficient than the QWERTY layout. Let's compare both on the first stanza of If— by Rudyard Kipling:

>>> stanza = """
... If you can keep your head when all about you
...    Are losing theirs and blaming it on you;
... If you can trust yourself when all men doubt you,
...    But make allowance for their doubting too;
... If you can wait and not be tired by waiting,
...    Or, being lied about, don't deal in lies,
... Or, being hated, don't give way to hating,
...    And yet don't look too good, nor talk too wise;
... """

>>> words = list(map(str.lower, stanza.split()))

>>> qwerty = clavier.load_qwerty()
>>> sum(qwerty.typing_distance(word) for word in words)
740.3255229138255

>>> dvorak = clavier.load_dvorak()
>>> sum(dvorak.typing_distance(word) for word in words)
923.6597116104518

It seems that, on this stanza, the Dvorak layout actually yields a larger total typing distance than QWERTY, at least with the default ortholinear key positions. Of course, this might not hold in general.

Nearest neighbors

You can iterate over the k nearest neighbors of any character.

>>> qwerty = clavier.load_qwerty()
>>> for char, dist in qwerty.nearest_neighbors('s', k=8, cache=True):
...     print(char, f'{dist:.4f}')
w 1.0000
a 1.0000
d 1.0000
x 1.0000
q 1.4142
e 1.4142
z 1.4142
c 1.4142

The cache parameter determines whether or not the result should be cached for the next call.

Physical layout specification

By default, the keyboard layouts are ortholinear, meaning that the keys are physically arranged on a grid. You can customize the physical layout to make it more realistic, and thus obtain distance measures that are closer to reality. This is done by passing parameters to the loader functions, such as load_qwerty.

Staggering

Staggering is the amount of horizontal offset between two consecutive keyboard rows.

You can specify a constant staggering as follows:

>>> keyboard = clavier.load_qwerty(staggering=0.5)

By default, the keys are spaced 1 unit apart, so a staggering value of 0.5 shifts each row horizontally by half a key relative to the previous row. You may also specify a different amount of staggering for each pair of consecutive rows:

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])

There are 3 elements in the list because the keyboard has 4 rows, and thus 3 pairs of consecutive rows.
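One plausible way to model staggering is to shift each row horizontally by the cumulative sum of the preceding offsets, so that a constant 0.5 moves the second row by 0.5, the third by 1.0 and the fourth by 1.5. This is an assumption about how the offsets combine, not a description of clavier's internals, and the staggered_coordinates helper below is hypothetical.

```python
from __future__ import annotations

import math
from itertools import accumulate

# Hypothetical ortholinear QWERTY grid: one string per row.
QWERTY_ROWS = ["1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./"]


def staggered_coordinates(rows: list[str], staggering: float | list[float] = 0.0) -> dict[str, tuple[float, float]]:
    """Map each key to (row, column), shifting rows horizontally by the cumulative staggering."""
    if isinstance(staggering, (int, float)):
        staggering = [staggering] * (len(rows) - 1)  # same offset between every pair of rows
    if len(staggering) != len(rows) - 1:
        raise ValueError("need one staggering value per pair of consecutive rows")
    shifts = [0.0, *accumulate(staggering)]  # horizontal shift of row i relative to the first row
    return {
        char: (i, shifts[i] + j)
        for i, row in enumerate(rows)
        for j, char in enumerate(row)
    }


flat = staggered_coordinates(QWERTY_ROWS)
staggered = staggered_coordinates(QWERTY_ROWS, staggering=[0.5, 0.25, 0.5])
print(math.dist(flat["q"], flat["a"]))            # 1.0
print(math.dist(staggered["q"], staggered["a"]))  # ≈ 1.0308
```

With these settings, q and a are no longer stacked vertically, so the distance between them grows from 1 to about 1.03.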

Key pitch

Key pitch is the distance between the centers of two adjacent keys. Most computer keyboards have identical horizontal and vertical pitches, because all the keys have the same width and height. This isn't the case for mobile phone keyboards: for instance, iPhone keyboards have a higher vertical pitch.
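Unequal pitches amount to scaling the grid coordinates: the row index is multiplied by the vertical pitch and the column index by the horizontal pitch. The sketch below illustrates this idea. The pitched_coordinates helper, its horizontal_pitch and vertical_pitch parameters, and the 1.5 value are illustrative assumptions, not a documented clavier API or a measurement of an actual phone keyboard.

```python
import math

# Hypothetical ortholinear QWERTY grid: one string per row.
QWERTY_ROWS = ["1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./"]


def pitched_coordinates(rows, horizontal_pitch=1.0, vertical_pitch=1.0):
    """Place key centers on a grid with the given spacing between adjacent keys."""
    return {
        char: (i * vertical_pitch, j * horizontal_pitch)
        for i, row in enumerate(rows)
        for j, char in enumerate(row)
    }


tall = pitched_coordinates(QWERTY_ROWS, vertical_pitch=1.5)  # rows further apart than columns
print(math.dist(tall["s"], tall["d"]))  # 1.0 (same row: horizontal pitch only)
print(math.dist(tall["s"], tall["w"]))  # 1.5 (adjacent rows: vertical pitch only)
```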

Drawing a keyboard layout

>>> keyboard = clavier.load_qwerty()
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty.png', bbox_inches='tight')

Figure: img/qwerty.png (ortholinear QWERTY layout)

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty_staggered.png', bbox_inches='tight')

Figure: img/qwerty_staggered.png (QWERTY layout with staggering=[0.5, 0.25, 0.5])

Custom layouts

You can of course specify your own keyboard layout. There are different ways to do this. We'll use the iPhone keypad as an example.

The from_coordinates method

>>> keypad = clavier.Keyboard.from_coordinates({
...     '1': (0, 0), '2': (0, 1), '3': (0, 2),
...     '4': (1, 0), '5': (1, 1), '6': (1, 2),
...     '7': (2, 0), '8': (2, 1), '9': (2, 2),
...     '*': (3, 0), '0': (3, 1), '#': (3, 2),
...                  '☎': (4, 1)
... })
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #
  ☎

The from_grid method

>>> keypad = clavier.Keyboard.from_grid("""
...     1 2 3
...     4 5 6
...     7 8 9
...     * 0 #
...       ☎
... """)
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #
  ☎
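The grid string carries the same information as the coordinates dictionary: each non-blank line is a row and, assuming keys are separated by single spaces, a character at string index k sits in column k // 2. Here is a minimal sketch of that parsing step; parse_grid is a hypothetical helper, not clavier's actual from_grid implementation.

```python
import textwrap

GRID = """
    1 2 3
    4 5 6
    7 8 9
    * 0 #
      ☎
"""


def parse_grid(grid: str) -> dict[str, tuple[int, int]]:
    """Turn a space-separated grid drawing into {char: (row, column)} coordinates."""
    # Strip the common indentation and drop blank lines so that row 0 is the first key row.
    lines = [line for line in textwrap.dedent(grid).splitlines() if line.strip()]
    return {
        char: (row, index // 2)  # keys sit at even string indices: 0, 2, 4, ...
        for row, line in enumerate(lines)
        for index, char in enumerate(line)
        if not char.isspace()
    }


print(parse_grid(GRID)["☎"])  # (4, 1)
```

This yields exactly the dictionary passed to from_coordinates in the previous example.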

Development

git clone https://github.com/MaxHalford/clavier
cd clavier
pip install poetry
poetry install
poetry shell
pytest

License

The MIT License (MIT). Please see the license file for more information.

Owner
Max Halford