This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

Related tags

Deep Learningslate
Overview

SLATE

This is the official source code for SLATE. We provide the code for the model, the training code and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

Arxiv: https://arxiv.org/pdf/2110.11405.pdf
Project Page: https://sites.google.com/view/slate-autoencoder

Dataset

The current release provides a boilerplate code to train the model on the 3D Shapes dataset. The dataset class is provided in shapes_3d.py. You can edit or replace this class if you need to run the code on a different dataset. The 3D Shapes dataset can be downloaded from the official URL https://console.cloud.google.com/storage/browser/3d-shapes. This should produce a dataset file 3dshapes.h5. During training, the path to this dataset file needs to be provided using the argument --data_path.

Training

To train the model, simply execute:

python train.py

Check train.py to see the full list of training arguments.

Outputs

The training code produces Tensorboard logs. To see these logs, run Tensorboard on the logging directory that was provided in the training argument --log_path. These logs contain the training loss curves and visualizations of reconstructions and object attention maps.

Hyperparameters of Interest

  • Learning Rate can be tuned using the training argument --lr_main and different choices can affect the characteristics of the object attention maps.
  • Number of Slots can be tuned using the training argument --num_slots. Number of slots should be set higher than the number of objects you expect to see in the images.
  • Number of Slot Attention Iterations can be tuned using the training argument --num_iterations. In general, keep the number of iterations as small as possible because too many iterations can prevent slots from learning to diversify and attach to different objects.

Code Files

This repository provides the following files.

  • train.py contains the main code for running the training.
  • slate.py provides the model class for SLATE.
  • shapes_3d.py contains the dataset class for 3D Shapes dataset.
  • dvae.py provides the encoder and the decoder for Discrete VAE.
  • slot_attn.py provides the model class for Slot Attention encoder.
  • transformer.py provides the model classes for Transformer.
  • utils.py provides helper classes and functions for the implementation.
Owner
Gautam Singh
PhD student at Rutgers CS
Gautam Singh
Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure.

Event Queue Dialect Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure. Motivation The m

Cornell Capra 23 Dec 08, 2022
magiCARP: Contrastive Authoring+Reviewing Pretraining

magiCARP: Contrastive Authoring+Reviewing Pretraining Welcome to the magiCARP API, the test bed used by EleutherAI for performing text/text bi-encoder

EleutherAI 43 Dec 29, 2022
Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study.

APR The repo for the paper Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study. Environment setu

ielab 8 Nov 26, 2022
Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn?

Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn? Repository Structure: DSAN |└───amazon |    └── dataset (Amazo

DMIRLAB 17 Jan 04, 2023
World Models with TensorFlow 2

World Models This repo reproduces the original implementation of World Models. This implementation uses TensorFlow 2.2. Docker The easiest way to hand

Zac Wellmer 234 Nov 30, 2022
Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

This repository is used to suspend the results of our paper "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement"

ScorpioMiku 19 Sep 30, 2022
ZEBRA: Zero Evidence Biometric Recognition Assessment

ZEBRA: Zero Evidence Biometric Recognition Assessment license: LGPLv3 - please reference our paper version: 2020-06-11 author: Andreas Nautsch (EURECO

Voice Privacy Challenge 2 Dec 12, 2021
Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline

vqvae_dwt_distiller.pytorch Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline. It allows to generate 512x512 ima

Sergei Belousov 25 Jul 19, 2022
Official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence".

The DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergen

281 Dec 30, 2022
Turi Create simplifies the development of custom machine learning models.

Quick Links: Installation | Documentation | WWDC 2019 | WWDC 2018 Turi Create Check out our talks at WWDC 2019 and at WWDC 2018! Turi Create simplifie

Apple 10.9k Jan 01, 2023
Face recognition with trained classifiers for detecting objects using OpenCV

Face_Detector Face recognition with trained classifiers for detecting objects using OpenCV Libraries required to be installed using pip Command: cv2 n

Chumui Tripura 0 Oct 31, 2021
Reference implementation for Structured Prediction with Deep Value Networks

Deep Value Network (DVN) This code is a python reference implementation of DVNs introduced in Deep Value Networks Learn to Evaluate and Iteratively Re

Michael Gygli 55 Feb 02, 2022
Code for paper entitled "Improving Novelty Detection using the Reconstructions of Nearest Neighbours"

NLN: Nearest-Latent-Neighbours A repository containing the implementation of the paper entitled Improving Novelty Detection using the Reconstructions

Michael (Misha) Mesarcik 4 Dec 14, 2022
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning

Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning This is the official repository of "Camera Distortion-

Hanbyel Cho 12 Oct 06, 2022
Lipschitz-constrained Unsupervised Skill Discovery

Lipschitz-constrained Unsupervised Skill Discovery This repository is the official implementation of Seohong Park, Jongwook Choi*, Jaekyeom Kim*, Hong

Seohong Park 17 Dec 18, 2022
Companion code for the paper Theoretical characterization of uncertainty in high-dimensional linear classification

Companion code for the paper Theoretical characterization of uncertainty in high-dimensional linear classification Usage The required packages are lis

0 Feb 07, 2022
Fastquant - Backtest and optimize your trading strategies with only 3 lines of code!

fastquant 🤓 Bringing backtesting to the mainstream fastquant allows you to easily backtest investment strategies with as few as 3 lines of python cod

Lorenzo Ampil 1k Dec 29, 2022
MaskTrackRCNN for video instance segmentation based on mmdetection

MaskTrackRCNN for video instance segmentation Introduction This repo serves as the official code release of the MaskTrackRCNN model for video instance

411 Jan 05, 2023
This repository is an implementation of paper : Improving the Training of Graph Neural Networks with Consistency Regularization

CRGNN Paper : Improving the Training of Graph Neural Networks with Consistency Regularization Environments Implementing environment: GeForce RTX™ 3090

THUDM 28 Dec 09, 2022
Learning Energy-Based Models by Diffusion Recovery Likelihood

Learning Energy-Based Models by Diffusion Recovery Likelihood Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P. Kingma Paper: https://arxiv.o

Ruiqi Gao 41 Nov 22, 2022