Official code for: A Probabilistic Hard Attention Model For Sequentially Observed Scenes

Overview

"A Probabilistic Hard Attention Model For Sequentially Observed Scenes"

Authors: Samrudhdhi Rangrej, James Clark Accepted to: BMVC'21 framework A recurrent attention model sequentially observes glimpses from an image and predicts a class label. At time t, the model actively observes a glimpse gt and its coordinates lt. Given gt and lt, the feed-forward module F extracts features ft, and the recurrent module R updates a hidden state to ht. Using an updated hidden state ht, the linear classifier C predicts the class distribution p(y|ht). At time t+1, the model assesses various candidate locations l before attending an optimal one. It predicts p(y|g,l,ht) ahead of time and selects the candidate l that maximizes KL[p(y|g,l,ht)||p(y|ht)]. The model synthesizes the features of g using a Partial VAE to approximate p(y|g,l,ht) without attending to the glimpse g. The normalizing flow-based encoder S predicts the approximate posterior q(z|ht). The decoder D uses a sample z~q(z|ht) to synthesize a feature map f~ containing features of all glimpses. The model uses f~(l) as features of a glimpse at location l and evaluates p(y|g,l,ht)=p(y|f~(l),ht). Dashed arrows show a path to compute the lookahead class distribution p(y|f~(l),ht).

Requirements:

torch==1.8.1, torchvision==0.9.1, tensorboard==2.5.0, fire==0.4.0

Datasets:

Training a model

Use main.py to train and evaluate the model.

Arguments

  • dataset: one of 'svhn', 'cifar10', 'cifar100', 'cinic10', 'tinyimagenet'
  • datapath: path to the downloaded datasets
  • lr: learning rate
  • training_phase: one of 'first', 'second', 'third'
  • ccebal: coefficient for cross entropy loss
  • batch: batch-size for training
  • batchv: batch-size for evaluation
  • T: maximum time-step
  • logfolder: path to log directory
  • epochs: number of training epochs
  • pretrain_checkpoint: checkpoint for pretrained model from previous training phase

Example commands to train the model for SVHN dataset are as follows. Training Stage one

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='first' \
    --ccebal=1 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_first' \
    --epochs=1000 \
    --pretrain_checkpoint=None

Training Stage two

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='second' \
    --ccebal=0 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_second' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_first/weights_f_1000.pth'

Training Stage three

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='third' \
    --ccebal=16 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_third' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_second/weights_f_100.pth'

Visualization of attention policy for a CIFAR-10 image

example The top row shows the entire image and the EIG maps for t=1 to 6. The bottom row shows glimpses attended by our model. The model observes the first glimpse at a random location. Our model observes a glimpse of size 8x8. The glimpses overlap with the stride of 4, resulting in a 7x7 grid of glimpses. The EIG maps are of size 7x7 and are upsampled for the display. We display the entire image for reference; our model never observes the whole image.

Acknowledgement

Major parts of neural spline flows implementation are borrowed from Karpathy's pytorch-normalizing-flows.

Deep Q-learning for playing chrome dino game

[PYTORCH] Deep Q-learning for playing Chrome Dino

Viet Nguyen 68 Dec 05, 2022
This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras)

Yogi-Optimizer_Keras This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras) The NeurIPS-Paper can be found here: http://papers.nips.c

14 Sep 13, 2022
Repository of 3D Object Detection with Pointformer (CVPR2021)

3D Object Detection with Pointformer This repository contains the code for the paper 3D Object Detection with Pointformer (CVPR 2021) [arXiv]. This wo

Zhuofan Xia 117 Jan 06, 2023
Generic image compressor for machine learning. Pytorch code for our paper "Lossy compression for lossless prediction".

Lossy Compression for Lossless Prediction Using: Training: This repostiory contains our implementation of the paper: Lossy Compression for Lossless Pr

Yann Dubois 84 Jan 02, 2023
Code artifacts for the submission "Mind the Gap! A Study on the Transferability of Virtual vs Physical-world Testing of Autonomous Driving Systems"

Code Artifacts Code artifacts for the submission "Mind the Gap! A Study on the Transferability of Virtual vs Physical-world Testing of Autonomous Driv

Andrea Stocco 2 Aug 24, 2022
Python Single Object Tracking Evaluation

pysot-toolkit The purpose of this repo is to provide evaluation API of Current Single Object Tracking Dataset, including VOT2016 VOT2018 VOT2018-LT OT

348 Dec 22, 2022
Implementation of SiameseXML (ICML 2021)

SiameseXML Code for SiameseXML: Siamese networks meet extreme classifiers with 100M labels Best Practices for features creation Adding sub-words on to

Extreme Classification 35 Nov 06, 2022
Yolov5-lite - Minimal PyTorch implementation of YOLOv5

Yolov5-Lite: Minimal YOLOv5 + Deep Sort Overview This repo is a shortened versio

Kadir Nar 57 Nov 28, 2022
StyleGAN2-ADA-training-jupyter - Training custom datasets in styleGAN2-ADA by NVIDIA using Jupyter

styleGAN2-ADA-training-jupyter Training custom datasets in styleGAN2-ADA on Jupyter Official StyleGAN2-ADA by NIVIDIA Paper Training Generative Advers

Mang Su Hyun 2 Feb 24, 2022
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

Knut(Ke) Chen 134 Jan 01, 2023
The mini-MusicNet dataset

mini-MusicNet A music-domain dataset for multi-label classification Music transcription is sequence-to-sequence prediction problem: given an audio per

John Thickstun 4 Nov 09, 2022
MMDetection3D is an open source object detection toolbox based on PyTorch

MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project developed by MMLab.

OpenMMLab 3.2k Jan 05, 2023
Baseline of DCASE 2020 task 4

Couple Learning for SED This repository provides the data and source code for sound event detection (SED) task. The improvement of the Couple Learning

21 Oct 18, 2022
Script that attempts to force M1 macs into RGB mode when used with monitors that are defaulting to YPbPr.

fix_m1_rgb Script that attempts to force M1 macs into RGB mode when used with monitors that are defaulting to YPbPr. No warranty provided for using th

Kevin Gao 116 Jan 01, 2023
Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems

Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems This is our experimental code for RecSys 2021 paper "Learning

11 Jul 28, 2022
Space Ship Simulator using python

FlyOver Basic space-ship simulator using python How to run? Just double click run.py What modules do i need? All modules that i currently using is bui

0 Oct 09, 2022
Generate indoor scenes with Transformers

SceneFormer: Indoor Scene Generation with Transformers Initial code release for the Sceneformer paper, contains models, train and test scripts for the

Chandan Yeshwanth 110 Dec 06, 2022
Building Ellee — A GPT-3 and Computer Vision Powered Talking Robotic Teddy Bear With Human Level Conversation Intelligence

Using an object detection and facial recognition system built on MobileNetSSDV2 and Dlib and running on an NVIDIA Jetson Nano, a GPT-3 model, Google Speech Recognition, Amazon Polly and servo motors,

24 Oct 26, 2022
Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.

Learning What To Do by Simulating the Past This repository contains code that implements the Deep Reward Learning by Simulating the Past (Deep RSLP) a

Center for Human-Compatible AI 24 Aug 07, 2021
PromptDet: Expand Your Detector Vocabulary with Uncurated Images

PromptDet: Expand Your Detector Vocabulary with Uncurated Images Paper Website Introduction The goal of this work is to establish a scalable pipeline

103 Dec 20, 2022