Interactive dimensionality reduction for large datasets

Related tags

Deep Learningblossom
Overview

BlosSOM 🌼

BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimensional datasets, and produce great-looking 2-dimensional visualizations.

WARNING: BlosSOM is still under development, some stuff may not work right, but things will magically improve without notice. Feel free to open an issue if something looks wrong.

screenshot

BlosSOM was developed at the MFF UK Prague, in cooperation with IOCB Prague.

MFF logoIOCB logo

Overview

BlosSOM creates a landmark-based model of the dataset, and dynamically projects all dataset point to your screen (using EmbedSOM). Several other algorithms and tools are provided to manage the landmarks; a quick overview follows:

  • High-dimensional landmark positioning:
    • Self-organizing maps
    • k-Means
  • 2D landmark positioning
    • k-NN graph generation (only adds edges, not vertices)
    • force-based graph layouting
    • dynamic t-SNE
  • Dimensionality reduction
    • EmbedSOM
    • CUDA EmbedSOM (with roughly 500x speedup, enabling smooth display of a few millions of points)
  • Manual landmark position optimization
  • Visualization settings (colors, transparencies, cluster coloring, ...)
  • Dataset transformations and dimension scaling
  • Import from matrix-like data files
    • FCS3.0 (Flow Cytometry Standard files)
    • TSV (Tab-separated CSV)
  • Export of the data for plotting

Compiling and running BlosSOM

You will need cmake build system and SDL2.

For CUDA EmbedSOM to work, you need the NVIDIA CUDA toolkit. Append -DBUILD_CUDA=1 to cmake options to enable the CUDA version.

Windows (Visual Studio 2019)

Dependencies

The project requires SDL2 as an external dependency:

  1. install vcpkg tool and remember your vcpkg directory
  2. install SDL: vcpkg install SDL2:x64-windows

Compilation

git submodule init
git submodule update

mkdir build
cd build

# You need to fix the path to vcpkg in the following command:
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX=./inst -DCMAKE_TOOLCHAIN_FILE=your-vcpkg-clone-directory/scripts/buildsystems/vcpkg.cmake

cmake --build . --config Release
cmake --install . --config Release

Running

Open Visual Studio solution BlosSOM.sln, set blossom as startup project, set configuration to Release and run the project.

Linux (and possibly other unix-like systems)

Dependencies

The project requires SDL2 as an external dependency. Install libsdl2-dev (on Debian-based systems) or SDL2-devel (on Red Hat-based systems), or similar (depending on the Linux distribution). You should be able to install cmake package the same way.

Compilation

git submodule init
git submodule update

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=./inst    # or any other directory
make install                              # use -j option to speed up the build

Running

./inst/bin/blossom

Documentation

Quickstart

  1. Click on the "plus" button on the bottom right side of the window
  2. Choose Open file (the first button from the top) and open a file from the demo_data/ directory
  3. You can now add and delete landmarks using ctrl+mouse click, and drag them around.
  4. Use the tools and settings available under the "plus" button to optimize the landmark positions and get a better visualization.

See the HOWTO for more details and hints.

Performance and CUDA

If you pass -DBUILD_CUDA=1 to the cmake commands, you will get extra executable called blossom_cuda (or blossom_cuda.exe, on Windows).

The 2 versions of BlosSOM executable differ mainly in the performance of EmbedSOM projection, which is more than 100× faster on GPUs than on CPUs. If the dataset gets large, only a fixed-size slice of the dataset gets processed each frame (e.g., at most 1000 points in case of CPU) to keep the framerate in a usable range. The defaults in BlosSOM should work smoothly for many use-cases (defaulting at 1k points per frame on CPU and 50k points per frame on GPU).

If required (e.g., if you have a really fast GPU), you may modify the constants in the corresponding source files, around the call sites of clean_range(), which is the function that manages the round-robin refreshing of the data. Functionality that dynamically chooses the best data-crunching rate is being implemented and should be available soon.

License

BlosSOM is licensed under GPLv3 or later. Several small libraries bundled in the repository are licensed with MIT-style licenses.

Multi-View Radar Semantic Segmentation

Multi-View Radar Semantic Segmentation Paper Multi-View Radar Semantic Segmentation, ICCV 2021. Arthur Ouaknine, Alasdair Newson, Patrick Pérez, Flore

valeo.ai 37 Oct 25, 2022
Code for Multiple Instance Active Learning for Object Detection, CVPR 2021

MI-AOD Language: 简体中文 | English Introduction This is the code for Multiple Instance Active Learning for Object Detection (The PDF is not available tem

Tianning Yuan 269 Dec 21, 2022
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields.

This repository contains the code release for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. This implementation is written in JAX, and is a fork of Google's JaxNeRF

Google 625 Dec 30, 2022
A PyTorch Toolbox for Face Recognition

FaceX-Zoo FaceX-Zoo is a PyTorch toolbox for face recognition. It provides a training module with various supervisory heads and backbones towards stat

JDAI-CV 1.6k Jan 06, 2023
Complete U-net Implementation with keras

U Net Lowered with Keras Complete U-net Implementation with keras Original Paper Link : https://arxiv.org/abs/1505.04597 Special Implementations : The

Sagnik Roy 14 Oct 10, 2022
[ICME 2021 Oral] CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning This repository is the official PyTorch implementation of CORE-Text, a

Jingyang Lin 18 Aug 11, 2022
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

ENet in Caffe Execution times and hardware requirements Network 1024x512 1280x720 Parameters Model size (fp32) ENet 20.4 ms 32.9 ms 0.36 M 1.5 MB SegN

Timo Sämann 561 Jan 04, 2023
Instance Semantic Segmentation List

Instance Semantic Segmentation List This repository contains lists of state-or-art instance semantic segmentation works. Papers and resources are list

bighead 87 Mar 06, 2022
A toolset of Python programs for signal modeling and indentification via sparse semilinear autoregressors.

SPAAR Description A toolset of Python programs for signal modeling via sparse semilinear autoregressors. References Vides, F. (2021). Computing Semili

Fredy Vides 0 Oct 30, 2021
Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.

KK Blender Shader Pack A plugin and a shader to get you started with setting up an exported Koikatsu character in Blender. The plugin is a Blender add

166 Jan 01, 2023
This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC

LEAP Lab 2 Sep 15, 2022
ANEA: Distant Supervision for Low-Resource Named Entity Recognition

ANEA: Distant Supervision for Low-Resource Named Entity Recognition ANEA is a tool to automatically annotate named entities in unlabeled text based on

Saarland University Spoken Language Systems Group 15 Mar 30, 2022
Source code for The Power of Many: A Physarum Swarm Steiner Tree Algorithm

Physarum-Swarm-Steiner-Algo Source code for The Power of Many: A Physarum Steiner Tree Algorithm Code implements ideas from the following papers: Sher

Sheryl Hsu 2 Mar 28, 2022
Exploring Image Deblurring via Blur Kernel Space (CVPR'21)

Exploring Image Deblurring via Encoded Blur Kernel Space About the project We introduce a method to encode the blur operators of an arbitrary dataset

VinAI Research 118 Dec 19, 2022
The end-to-end platform for building voice products at scale

Picovoice Made in Vancouver, Canada by Picovoice Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Goog

Picovoice 318 Jan 07, 2023
Use tensorflow to implement a Deep Neural Network for real time lane detection

LaneNet-Lane-Detection Use tensorflow to implement a Deep Neural Network for real time lane detection mainly based on the IEEE IV conference paper "To

MaybeShewill-CV 1.9k Jan 08, 2023
Pytorch code for "Text-Independent Speaker Verification Using 3D Convolutional Neural Networks".

:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

Amirsina Torfi 114 Dec 18, 2022
implement of SwiftNet:Real-time Video Object Segmentation

SwiftNet The official PyTorch implementation of SwiftNet:Real-time Video Object Segmentation, which has been accepted by CVPR2021. Requirements Python

haochen wang 64 Dec 14, 2022
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022
YOLTv4 builds upon YOLT and SIMRDWN, and updates these frameworks to use the most performant version of YOLO, YOLOv4

YOLTv4 builds upon YOLT and SIMRDWN, and updates these frameworks to use the most performant version of YOLO, YOLOv4. YOLTv4 is designed to detect objects in aerial or satellite imagery in arbitraril

Adam Van Etten 161 Jan 06, 2023