NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels

Overview

NumPy String-Indexed

PyPI Version Python Versions

NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels, rather than conventional zero-indexing. When a friendly matrix object is initialized, labels are assigned to each array index and each dimension, and they stick to the array after NumPy-style operations such as transposing, concatenating, and aggregating. This prevents Python programmers from having to keep track mentally of what each axis and each index represents, instead making each reference to the array in code naturally self-documenting.

NumPy String-Indexed is especially useful for applications like machine learning, scientific computing, and data science, where there is heavy use of multidimensional arrays.

The friendly matrix object is implemented as a lightweight wrapper around a NumPy ndarray. It's easy to add to a new or existing project to make it easier to maintain code, and has negligible memory and performance overhead compared to the size of array (O(x + y + z) vs. O(xyz)).

Basic functionality

It's recommended to import NumPy String-Indexed idiomatically as fm:

import friendly_matrix as fm

Labels are provided during object construction and can optionally be used in place of numerical indices for slicing and indexing.

The example below shows how to construct a friendly matrix containing an image with three color channels:

image = fm.ndarray(
	numpy_ndarray_image,  # np.ndarray with shape (3, 100, 100)
	dim_names=['color_channel', 'top_to_bottom', 'left_to_right'],
	color_channel=['R', 'G', 'B'])

The matrix can then be sliced like this:

# friendly matrix with shape (100, 100)
r_channel = image(color_channel='R')

# an integer
g_top_left_pixel_value = image('G', 0, 0)

# friendly matrix with shape (2, 100, 50)
br_channel_left_half = image(
	color_channel=('B', 'R'),
	left_to_right=range(image.dim_length('left_to_right') // 2))

Documentation

Full documentation can be found here. Below is a brief overview of Friendly Matrix functionality.

Matrix operations

Friendly matrix objects can be operated on just like NumPy ndarrays with minimal overhead. The package contains separate implementations of most of the relevant NumPy ndarray operations, taking advantage of labels. For example:

side_by_side = fm.concatenate((image1, image2), axis='left_to_right')

An optimized alternative is to perform label-less operations, by adding "_A" (for "array") to the operation name:

side_by_side_arr = fm.concatenate_A((image1, image2), axis='left_to_right')

If it becomes important to optimize within a particular scope, it's recommended to shed labels before operating:

for image in huge_list:
	image_processor(image.A)

Computing matrices

A friendly matrix is an ideal structure for storing and retrieving the results of computations over multiple variables. The compute_ndarray() function executes computations over all values of the input arrays and stores them in a new Friendly Matrix ndarray instance in a single step:

'''Collect samples from a variety of normal distributions'''

import numpy as np

n_samples_list = [1, 10, 100, 1000]
mean_list = list(range(-21, 21))
var_list = [1E1, 1E0, 1E-1, 1E-2, 1E-3]

results = fm.compute_ndarray(
	['# Samples', 'Mean', 'Variance']
	n_samples_list,
	mean_list,
	var_list,
	normal_sampling_function,
	dtype=np.float32)

# friendly matrices can be sliced using dicts
print(results({
	'# Samples': 100,
	'Mean': 0,
	'Variance': 1,
}))

Formatting matrices

The formatted() function displays a friendly matrix as a nested list. This is useful for displaying the labels and values of smaller matrices or slice results:

mean_0_results = results({
	'# Samples': (1, 1000),
	'Mean': 0,
	'Variance': (10, 1, 0.1),
})
formatted = fm.formatted(
	mean_0_results,
	formatter=lambda n: round(n, 1))

print(formatted)

'''
Example output:

# Samples = 1:
	Variance = 10:
		2.2
	Variance = 1:
		-0.9
	Variance = 0.1:
		0.1
# Samples = 1000:
	Variance = 10:
		-0.2
	Variance = 1:
		-0.0
	Variance = 0.1:
		0.0
'''

Installation

pip install numpy-string-indexed

NumPy String-Indexed is listed in PyPI and can be installed with pip.

Prerequisites: NumPy String-Indexed 0.0.1 requires Python 3 and a compatible installation of the NumPy Python package.

Discussion and support

NumPy String-Indexed is available under the MIT License.

Owner
Aitan Grossman
Aitan Grossman
TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

Yixuan Su 26 Oct 17, 2022
Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated

Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated. This engine can later be used for downstream tasks in NLP such as Q&A, summarization, generation

Diego 1 Mar 20, 2022
Switch spaces for knowledge graph embeddings

SwisE Switch spaces for knowledge graph embeddings. Requirements: python3 pytorch numpy tqdm Reproduce the results To reproduce the reported results,

Shuai Zhang 4 Dec 01, 2021
Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is trained (finetuned) on a curated list of approximately 45K Python (~470MB) files gathered from the

Galois Autocompleter 91 Sep 23, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
What are the best Systems? New Perspectives on NLP Benchmarking

What are the best Systems? New Perspectives on NLP Benchmarking In Machine Learning, a benchmark refers to an ensemble of datasets associated with one

Pierre Colombo 12 Nov 03, 2022
Grading tools for Advanced NLP (11-711)Grading tools for Advanced NLP (11-711)

Grading tools for Advanced NLP (11-711) Installation You'll need docker and unzip to use this repo. For docker, visit the official guide to get starte

Hao Zhu 2 Sep 27, 2022
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognit

SpeechBrain 5.1k Jan 09, 2023
Dope Wars game engine on StarkNet L2 roll-up

RYO Dope Wars game engine on StarkNet L2 roll-up. What TI-83 drug wars built as smart contract system. Background mechanism design notion here. Initia

104 Dec 04, 2022
A Survey of Natural Language Generation in Task-Oriented Dialogue System (TOD): Recent Advances and New Frontiers

A Survey of Natural Language Generation in Task-Oriented Dialogue System (TOD): Recent Advances and New Frontiers

Libo Qin 132 Nov 25, 2022
STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs

STonKGs STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs. This multimodal Transformer combin

STonKGs 27 Aug 11, 2022
Linear programming solver for paper-reviewer matching and mind-matching

Paper-Reviewer Matcher A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is impleme

Titipat Achakulvisut 66 Jul 05, 2022
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

Anchored CorEx: Hierarchical Topic Modeling with Minimal Domain Knowledge Correlation Explanation (CorEx) is a topic model that yields rich topics tha

Greg Ver Steeg 592 Dec 18, 2022
Natural language computational chemistry command line interface.

nlcc Install pip install nlcc Must have Open-AI Codex key: export OPENAI_API_KEY=your key here then nlcc key bindings ctrl-w copy to clipboard (Note

Andrew White 37 Dec 14, 2022
An attempt to map the areas with active conflict in Ukraine using open source twitter data.

Live Action Map (LAM) An attempt to use open source data on Twitter to map areas with active conflict. Right now it is used for the Ukraine-Russia con

Kinshuk Dua 171 Nov 21, 2022
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

Rishikesh (ऋषिकेश) 217 Dec 05, 2022
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Rebiber: A tool for normalizing bibtex with official info. We often cite papers using their arXiv versions without noting that they are already PUBLIS

(Bill) Yuchen Lin 2k Jan 01, 2023
Fastseq 基于ONNXRUNTIME的文本生成加速框架

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Jun Gao 9 Nov 09, 2021
A pytorch implementation of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".

RE2 This is a pytorch implementation of the ACL 2019 paper "Simple and Effective Text Matching with Richer Alignment Features". The original Tensorflo

286 Jan 02, 2023
Text-Based zombie apocalyptic decision-making game in Python

Inspiration We shared university first year game coursework.[to gauge previous experience and start brainstorming] Adapted a particular nuclear fallou

Amin Sabbagh 2 Feb 17, 2022