The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

Overview

Joint t-sne

This is the implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

abstract:

We present Joint t-Stochastic Neighbor Embedding (Joint t-SNE), a technique to generate comparable projections of multiple high-dimensional datasets. Although t-SNE has been widely employed to visualize high-dimensional datasets from various domains, it is limited to projecting a single dataset. When a series of high-dimensional datasets, such as datasets changing over time, is projected independently using t-SNE, misaligned layouts are obtained. Even items with identical features across datasets are projected to different locations, making the technique unsuitable for comparison tasks. To tackle this problem, we introduce edge similarity, which captures the similarities between two adjacent time frames based on the Graphlet Frequency Distribution (GFD). We then integrate a novel loss term into the t-SNE loss function, which we call vector constraints, to preserve the vectors between projected points across the projections, allowing these points to serve as visual landmarks for direct comparisons between projections. Using synthetic datasets whose ground-truth structures are known, we show that Joint t-SNE outperforms existing techniques, including Dynamic t-SNE, in terms of local coherence error, Kullback-Leibler divergence, and neighborhood preservation. We also showcase a real-world use case to visualize and compare the activation of different layers of a neural network.

Environment:

How to use:

  1. Put the directory of your data sequence, e.g. "YOUR_DATA" in ./data. There are several requirements on the format and organization of your data:

    • Each data frame is named as f_i.txt, where i is the time step/index of this data frame in the sequence.
    • The j th row of the data frame contains both the feature vector and label of the j th item, which is seperated by \tab. The label is at the last position.
    • All data frames must have the same number of rows, and the the same item is at the same row in different data frames to compute the node similarities one by one.
  2. Create a configuration file, e.g. "YOUR_DATA.json" in ./config, which is organized as a json structure.

{
  "algo": {
    "k_closest_count": 3,
    "perplexity": 70,
    "bfs_level": 1,
    "gamma": 0.1
  },
  "thesne": {
    "data_name": "YOUR_DATA",
    "pts_size": 2000,
    "norm": false,
    "data_ids": [1, 3, 6, 9],
    "data_dims": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
    "data_titles": [
      "t=0",
      "t=1",
      "t=2",
      "t=3",
      "t=4",
      "t=5",
      "t=6",
      "t=7",
      "t=8",
      "t=9"
    ]
  }
}

In this file, algo represents the hyperparamters of our algorithm except for bfs_level, which always equals to 1. thesne contains the information of the input data. Please remember that data_name must be consistent with the directory name in the previous step.

  1. Create a shell script, e.g. "YOUR_DATA.sh" in ./scripts as below:
# !/bin/bash
# 1. specify the path of the configuration file
config_path="config/YOUR_DATA.json"

workdir=$(pwd)

# 2. build knn graph for each data frame
python3 codes/graphBuild/run.py $config_path

# 3. compute edge similarities between each two adjacent data frames
buildDir="codes/graphSim/build"
if [ ! -d $buildDir ]; then
    mkdir $buildDir
    echo "create directory ${buildDir}"
else
    echo "directory ${buildDir} already exists."
fi
cd $buildDir
qmake ../
make

cd $workdir

# bin is dependent on your operating system
bin=$buildDir/graphSim.app/Contents/MacOS/graphSim
$bin $config_path


# 4. run t-sne optimization
python3 codes/thesne/run.py $config_path

There are several places you should pay attention to.

  • Again, config_path must be consitent with the name of configuration file in the previous step

  • bin is dependent on your operating system. If you use linux, you probably should change it to

      bin=$buildDir/graphSim
    
  1. In root directory, type
sh scripts/YOUR_DATA.sh

The final embeddings will be generated in ./results/YOUR_DATA.

  1. Optionally, you can use codes/draw/run.py to plot the embeddings.

Example:

You can find an example in ./scripts/10_cluster_contract.sh.

Owner
IDEAS Lab
Our mission is to enhance people's ability to understand and communicate data through the design of automated visualization and visual analytics systems.
IDEAS Lab
Synthetic Scene Text from 3D Engines

Introduction UnrealText is a project that synthesizes scene text images using 3D graphics engine. This repository accompanies our paper: UnrealText: S

Shangbang Long 215 Dec 29, 2022
The Deep Learning with Julia book, using Flux.jl.

Deep Learning with Julia DL with Julia is a book about how to do various deep learning tasks using the Julia programming language and specifically the

Logan Kilpatrick 67 Dec 25, 2022
[CVPR'20] TTSR: Learning Texture Transformer Network for Image Super-Resolution

TTSR Official PyTorch implementation of the paper Learning Texture Transformer Network for Image Super-Resolution accepted in CVPR 2020. Contents Intr

Multimedia Research 689 Dec 28, 2022
PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

Saim Wani 4 May 08, 2022
State of the art Semantic Sentence Embeddings

Contrastive Tension State of the art Semantic Sentence Embeddings Published Paper · Huggingface Models · Report Bug Overview This is the official code

Fredrik Carlsson 88 Dec 30, 2022
Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

Mathis Petrovich 248 Dec 23, 2022
Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.

Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re

Microsoft 197 Jan 06, 2023
Video-based open-world segmentation

UVO_Challenge Team Alpes_runner Solutions This is an official repo for our UVO Challenge solutions for Image/Video-based open-world segmentation. Our

Yuming Du 84 Dec 22, 2022
We propose a new method for effective shadow removal by regarding it as an exposure fusion problem.

Auto-exposure fusion for single-image shadow removal We propose a new method for effective shadow removal by regarding it as an exposure fusion proble

Qing Guo 146 Dec 31, 2022
"Learning Free Gait Transition for Quadruped Robots vis Phase-Guided Controller"

PhaseGuidedControl The current version is developed based on the old version of RaiSim series, and possibly requires further modification. It will be

X-Mechanics 12 Oct 21, 2022
An auto discord account and token generator. Automatically verifies the phone number. Works without proxy. Bypasses captcha.

JOIN DISCORD SERVER https://discord.gg/uAc3agBY FREE HCAPTCHA SOLVING API Discord-Token-Gen An auto discord token generator. Auto verifies phone numbe

3kp 271 Jan 01, 2023
Official tensorflow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”

Tensorflow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”.

3.7k Dec 31, 2022
PyTorch implementation of "A Simple Baseline for Low-Budget Active Learning".

A Simple Baseline for Low-Budget Active Learning This repository is the implementation of A Simple Baseline for Low-Budget Active Learning. In this pa

10 Nov 14, 2022
Composable transformations of Python+NumPy programsComposable transformations of Python+NumPy programs

Chex Chex is a library of utilities for helping to write reliable JAX code. This includes utils to help: Instrument your code (e.g. assertions) Debug

DeepMind 506 Jan 08, 2023
The Malware Open-source Threat Intelligence Family dataset contains 3,095 disarmed PE malware samples from 454 families

MOTIF Dataset The Malware Open-source Threat Intelligence Family (MOTIF) dataset contains 3,095 disarmed PE malware samples from 454 families, labeled

Booz Allen Hamilton 112 Dec 13, 2022
LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

Xuan Hieu Duong 7 Jan 12, 2022
Code for Multimodal Neural SLAM for Interactive Instruction Following

Code for Multimodal Neural SLAM for Interactive Instruction Following Code structure The code is adapted from E.T. and most training as well as data p

7 Dec 07, 2022
This repository contains a pytorch implementation of "StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision".

StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision | Project Page | Paper | This repository contains a pytorch implementation of "St

87 Dec 09, 2022
MPI-IS Mesh Processing Library

Perceiving Systems Mesh Package This package contains core functions for manipulating meshes and visualizing them. It requires Python 3.5+ and is supp

Max Planck Institute for Intelligent Systems 494 Jan 06, 2023
Understanding and Overcoming the Challenges of Efficient Transformer Quantization

Transformer Quantization This repository contains the implementation and experiments for the paper presented in Yelysei Bondarenko1, Markus Nagel1, Ti

83 Dec 30, 2022