A fast hierarchical dimensionality reduction algorithm.

Related tags

Text Data & NLPh-nne
Overview

h-NNE: Hierarchical Nearest Neighbor Embedding

A fast hierarchical dimensionality reduction algorithm.

h-NNE is a general purpose dimensionality reduction algorithm such as t-SNE and UMAP. It stands out for its speed, simplicity and the fact that it provides a hierarchy of clusterings as part of its projection process. The algorithm is inspired by the FINCH clustering algorithm. For more information on the structure of the algorithm, please look at our corresponding paper in ArXiv:

M. Saquib Sarfraz*, Marios Koulakis*, Constantin Seibold, Rainer Stiefelhagen. Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction. CVPR 2022.

More details are available in the project documentation.

Installation

The project is available in PyPI. To install run:

pip install hnne

How to use h-NNE

The HNNE class implements the common methods of the sklearn interface.

Simple projection example

import numpy as np
from hnne import HNNE

data = np.random.random(size=(1000, 256))

hnne = HNNE(dim=2)
projection = hnne.fit_transform(data)

Projecting on new points

hnne = HNNE()
projection = hnne.fit_transform(data)

new_data_projection = hnne.transform(new_data)

Demos

The following demo notebooks are available:

  1. Basic Usage
  2. Multiple Projections
  3. Clustering for Free
  4. Monitor Quality of Network Embeddings

Citation

If you make use of this project in your work, it would be appreciated if you cite the hnne paper:

@article{hnne,
  title={Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction},
  author={M. Saquib Sarfraz, Marios Koulakis, Constantin Seibold, Rainer Stiefelhagen},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022}
}

If you make use of the clustering properties of the algorithm please also cite:

 @inproceedings{finch,
   author    = {M. Saquib Sarfraz and Vivek Sharma and Rainer Stiefelhagen},
   title     = {Efficient Parameter-free Clustering Using First Neighbor Relations},
   booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   pages = {8934--8943},
   year  = {2019}
}
Owner
Marios Koulakis
My latest work is in deep learning, computer vision and mathematics.
Marios Koulakis
The guide to tackle with the Text Summarization

The guide to tackle with the Text Summarization

Takahiro Kubo 1.2k Dec 30, 2022
Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

simple_diarizer Simplified diarization pipeline using some pretrained models. Made to be a simple as possible to go from an input audio file to diariz

Chau 65 Dec 30, 2022
Korean extractive summarization. 2021 AI 텍스트 요약 온라인 해커톤 화성갈끄니까팀 코드

korean extractive summarization 2021 AI 텍스트 요약 온라인 해커톤 화성갈끄니까팀 코드 Leaderboard Notice Text Summarization with Pretrained Encoders에 나오는 bertsumext모델(ext

3 Aug 10, 2022
DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time

DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time. While it efficiently searches the answers out of 60 billion phrases in Wikipedia, it is also v

Jinhyuk Lee 543 Jan 08, 2023
An open source library for deep learning end-to-end dialog systems and chatbots.

DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re

Neural Networks and Deep Learning lab, MIPT 6k Dec 31, 2022
An open source framework for seq2seq models in PyTorch.

pytorch-seq2seq Documentation This is a framework for sequence-to-sequence (seq2seq) models implemented in PyTorch. The framework has modularized and

International Business Machines 1.4k Jan 02, 2023
A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

flair 12.3k Dec 31, 2022
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 408 Dec 29, 2022
Nystromformer: A Nystrom-based Algorithm for Approximating Self-Attention

Nystromformer: A Nystrom-based Algorithm for Approximating Self-Attention April 6, 2021 We extended segment-means to compute landmarks without requiri

Zhanpeng Zeng 322 Jan 01, 2023
Levenshtein and Hamming distance computation

distance - Utilities for comparing sequences This package provides helpers for computing similarities between arbitrary sequences. Included metrics ar

112 Dec 22, 2022
Auto_code_complete is a auto word-completetion program which allows you to customize it on your needs

auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the model for this program is one of the deep-learning NLP(Natural Language Process) model struc

RUO 2 Feb 22, 2022
CCF BDCI 2020 房产行业聊天问答匹配赛道 A榜47/2985

CCF BDCI 2020 房产行业聊天问答匹配 A榜47/2985 赛题描述详见:https://www.datafountain.cn/competitions/474 文件说明 data: 存放训练数据和测试数据以及预处理代码 model_bert.py: 网络模型结构定义 adv_train

shuo 40 Sep 28, 2022
Example code for "Real-World Natural Language Processing"

Real-World Natural Language Processing This repository contains example code for the book "Real-World Natural Language Processing." AllenNLP (2.5.0 or

Masato Hagiwara 303 Dec 17, 2022
Wind Speed Prediction using LSTMs in PyTorch

Implementation of Deep-Forecast using PyTorch Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting Adapted from original implementation Setu

Onur Kaplan 151 Dec 14, 2022
Translate U is capable of translating the text present in an image from one language to the other.

Translate U is capable of translating the text present in an image from one language to the other. The app uses OCR and Google translate to identify and translate across 80+ languages.

Neelanjan Manna 1 Dec 22, 2021
The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank

Main Idea The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank Semantic Search Re

Sergio Arnaud Gomez 2 Jan 28, 2022
A PyTorch Implementation of End-to-End Models for Speech-to-Text

speech Speech is an open-source package to build end-to-end models for automatic speech recognition. Sequence-to-sequence models with attention, Conne

Awni Hannun 647 Dec 25, 2022
Natural language computational chemistry command line interface.

nlcc Install pip install nlcc Must have Open-AI Codex key: export OPENAI_API_KEY=your key here then nlcc key bindings ctrl-w copy to clipboard (Note

Andrew White 37 Dec 14, 2022
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognit

SpeechBrain 5.1k Jan 09, 2023