This repository contains the DendroMap implementation for scalable and interactive exploration of image datasets in machine learning.

Overview


DendroMap is an interactive tool to explore large-scale image datasets used for machine learning.

A deep understanding of your data can be vital to train or debug your model effectively. However, due to the lack of structure and little-to-no metadata, it can be difficult to gain any insight into large-scale image datasets.

DendroMap adds structure to the data by hierarchically clustering together similar images. Then, the clusters are displayed in a modified treemap visualization that supports zooming.

Check out the live demo of DendroMap and explore a few different datasets for yourself. If you're interested in

  • the DendroMap motivations
  • how we created the DendroMap visualization
  • DendroMap's effectiveness: a user study comparing DendroMap with a t-SNE grid for exploration

be sure to also check out our research paper:

Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps.
Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng.
arXiv preprint arXiv:2205.06935, 2022.

Use Your Own Data

In the public deployment, we hosted our data in the DendroMap Data repository. You can use your own data by following the instructions and example in the DendroMap Data README.md, and you can use our Python functions found in the clustering folder of this repo. There, you will find specific examples and instructions for generating the clustering files.

After generating those files, add another option to the src/dataOptions.js file as an object that specifies how to read your data in the correct format. This is also detailed in the DendroMap Data README.md, and is as simple as adding an option like this:

{
	dataset: "YOUR DATASET NAME",
	model: "YOUR MODEL NAME",
	cluster_filepath: "CLUSTER_FILEPATH",
	class_cluster_filepath: "CLASS_CLUSTER_FILEPATH", // optional
	image_filepath: "IMAGE_FILEPATH",
}

in the src/dataOptions.js options array. Paths start from the public folder, so put your data there. For more information, see the README.md in the clustering folder; the notebooks that computed the data in DendroMap Data are located there.
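
For example, a new entry appended to the options array could look like the sketch below. The dataset name, model name, and file paths are placeholders, every path is resolved relative to the public folder, and the exact variable/export name should be checked against src/dataOptions.js itself.

// src/dataOptions.js (sketch; placeholder names and paths)
export const options = [
	// ...the existing dataset options...
	{
		dataset: "My Dataset",
		model: "my-model",
		cluster_filepath: "clustering-files/my-dataset/cluster.json",
		class_cluster_filepath: "clustering-files/my-dataset/class_cluster.json", // optional
		image_filepath: "images/my-dataset",
	},
];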

DendroMap Component

The DendroMap treemap visualization itself (not the whole project) relies only on d3.js and the accompanying JavaScript files in the src/components/dendroMap directory. You can reuse that Svelte component by importing from src/components/dendroMap/DendroMap.svelte.

The component is used in src/App.svelte, which serves as an example of the props it takes. Here is a rundown of a simple example: at the bare minimum, you can create the DendroMap component with these props (propName: type).

<DendroMap
	dendrogramData:dendrogramNode // (root node as nested JSON from dendrogram-data repo)
	imageFilepath:string // relative path from public dir
	imageWidth:number
	imageHeight:number
	width:number
	height:number
	numClustersShowing:number // > 1
/>
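
For reference, a minimal usage from a Svelte file in src/ could look like the sketch below. The JSON path and the fetch-based loading are assumptions for illustration, not the exact code in src/App.svelte.

<script>
	// Sketch: load the dendrogram JSON (a file under public/), then pass it as a prop.
	import DendroMap from "./components/dendroMap/DendroMap.svelte";

	let dendrogramData = null; // root node as nested JSON
	fetch("clustering-files/my-dataset/cluster.json") // placeholder path
		.then((res) => res.json())
		.then((json) => (dendrogramData = json));
</script>

{#if dendrogramData !== null}
	<DendroMap
		{dendrogramData}
		imageFilepath="images/my-dataset"
		imageWidth={32}
		imageHeight={32}
		width={800}
		height={600}
		numClustersShowing={8}
	/>
{/if}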

A more comprehensive list of props is below, but please look in the src/components/dendroMap/DendroMap.svelte file for more details: there are many default arguments.

<DendroMap
	dendrogramData: dendrogramNode // (root node as nested JSON from dendrogram-data repo)
	imageFilepath: string // relative path from public dir
	imageWidth: number
	imageHeight: number
	width: number
	height: number
	numClustersShowing: number // > 1

	// the very long list of optional props that you can use to customize the DendroMap
	// ? is not in the actual name, just indicates optional
	highlightedOpacity?: number // between [0.0, 1.0]
	hiddenOpacity?: number // between [0.0, 1.0]
	transitionSpeed?: number // milliseconds for the animation of zooming
	clusterColorInterpolateCallback?: (normalized: number) => string // by default uses d3.interpolateGreys
	labelColorCallback?: (d: d3.HierarchyNode) => string
	labelSizeCallback?: (d: d3.HierarchyNode) => string
	misclassificationColor?: string
	outlineStrokeWidth?: string
	outerPadding?: number // the outer perimeter space around the rects
	innerPadding?: number // the touching inside space between rects
	topPadding?: number // additional top padding on the top of rects
	labelYSpace?: number // shifts the image grid down to make room for label on top

	currentParentCluster?: d3.HierarchyNode // intended for use with bind: in Svelte, not really a prop
	// breadth is the default and renders nodes left to right breadth first traversal
	// min_merging_distance is the common way to get dendrogram clusters from a dendrogram
	// max_node_count traverses and splits the next largest sized node, resulting in an even rendering
	renderingMethod?: "breadth" | "min_merging_distance" | "max_node_count" | "custom_sort"
	// only in effect if renderingMethod is "custom_sort"; nodes at the end of the sort are popped and rendered first
	customSort?: (a: dendrogramNode, b: dendrogramNode) => number // see example in code
	imagesToFocus?: number[] // instance index of the ones to highlight
	outlineMisclassified?: boolean
	focusMisclassified?: boolean
	clusterLabelCallback?: (d: d3.HierarchyNode) => string
	imageTitleCallback?: (d: d3.HierarchyNode) => string

	// will fire based on user interaction
	// detail contains <T> {data: T, element: HTMLElement, event}
	on:imageClick?: ({detail}) => void
	on:imageMouseEnter?: ({detail}) => void
	on:imageMouseLeave?: ({detail}) => void
	on:clusterClick?: ({detail}) => void
	on:clusterMouseEnter?: ({detail}) => void
	on:clusterMouseLeave?: ({detail}) => void
/>
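
As an example of wiring up the event props, the sketch below logs the clicked node's data; the handler bodies are illustrative, and detail has the {data, element, event} shape described above.

<DendroMap
	{dendrogramData}
	imageFilepath="images/my-dataset"
	imageWidth={32}
	imageHeight={32}
	width={800}
	height={600}
	numClustersShowing={8}
	on:imageClick={({ detail }) => {
		// detail.data is the clicked node, detail.element the DOM element, detail.event the raw event
		console.log("clicked image:", detail.data);
	}}
	on:clusterMouseEnter={({ detail }) => {
		console.log("entered cluster:", detail.data);
	}}
/>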

Run Locally!

This project uses Svelte. You can run the code on your local machine in one of two ways: development or build.

Development

cd dendromap      # inside the dendromap directory
npm install       # install packages if you haven't
npm run dev       # live-reloading server on port 8080

then navigate to port 8080 for a development server that live-reloads on file changes.

Build

cd dendromap      # inside the dendromap directory
npm install       # install packages if you haven't
npm run build     # build project
npm run start     # run on port 8080

then navigate to port 8080 for the static build server.

Links

Owner
DIV Lab (Data Interaction and Visualization Lab at Oregon State University)