Official code for: A Probabilistic Hard Attention Model For Sequentially Observed Scenes

Overview

"A Probabilistic Hard Attention Model For Sequentially Observed Scenes"

Authors: Samrudhdhi Rangrej, James Clark Accepted to: BMVC'21 framework A recurrent attention model sequentially observes glimpses from an image and predicts a class label. At time t, the model actively observes a glimpse gt and its coordinates lt. Given gt and lt, the feed-forward module F extracts features ft, and the recurrent module R updates a hidden state to ht. Using an updated hidden state ht, the linear classifier C predicts the class distribution p(y|ht). At time t+1, the model assesses various candidate locations l before attending an optimal one. It predicts p(y|g,l,ht) ahead of time and selects the candidate l that maximizes KL[p(y|g,l,ht)||p(y|ht)]. The model synthesizes the features of g using a Partial VAE to approximate p(y|g,l,ht) without attending to the glimpse g. The normalizing flow-based encoder S predicts the approximate posterior q(z|ht). The decoder D uses a sample z~q(z|ht) to synthesize a feature map f~ containing features of all glimpses. The model uses f~(l) as features of a glimpse at location l and evaluates p(y|g,l,ht)=p(y|f~(l),ht). Dashed arrows show a path to compute the lookahead class distribution p(y|f~(l),ht).

Requirements:

torch==1.8.1, torchvision==0.9.1, tensorboard==2.5.0, fire==0.4.0

Datasets:

Training a model

Use main.py to train and evaluate the model.

Arguments

  • dataset: one of 'svhn', 'cifar10', 'cifar100', 'cinic10', 'tinyimagenet'
  • datapath: path to the downloaded datasets
  • lr: learning rate
  • training_phase: one of 'first', 'second', 'third'
  • ccebal: coefficient for cross entropy loss
  • batch: batch-size for training
  • batchv: batch-size for evaluation
  • T: maximum time-step
  • logfolder: path to log directory
  • epochs: number of training epochs
  • pretrain_checkpoint: checkpoint for pretrained model from previous training phase

Example commands to train the model for SVHN dataset are as follows. Training Stage one

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='first' \
    --ccebal=1 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_first' \
    --epochs=1000 \
    --pretrain_checkpoint=None

Training Stage two

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='second' \
    --ccebal=0 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_second' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_first/weights_f_1000.pth'

Training Stage three

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='third' \
    --ccebal=16 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_third' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_second/weights_f_100.pth'

Visualization of attention policy for a CIFAR-10 image

example The top row shows the entire image and the EIG maps for t=1 to 6. The bottom row shows glimpses attended by our model. The model observes the first glimpse at a random location. Our model observes a glimpse of size 8x8. The glimpses overlap with the stride of 4, resulting in a 7x7 grid of glimpses. The EIG maps are of size 7x7 and are upsampled for the display. We display the entire image for reference; our model never observes the whole image.

Acknowledgement

Major parts of neural spline flows implementation are borrowed from Karpathy's pytorch-normalizing-flows.

Official Python implementation of the FuzionCoin protocol

PyFuzc Official Python implementation of the FuzionCoin protocol WARNING: Under construction. Use at your own risk. Some functions may not work. Setup

FuzionCoin 3 Jul 07, 2022
[CVPR'21] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

IVOS-W Paper Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild Zhaoyun Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanli

SVIP Lab 38 Dec 12, 2022
1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection tool

yuxzho 94 Dec 25, 2022
Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Vision-Language Transformer and Query Generation for Referring Segmentation Please consider citing our paper in your publications if the project helps

Henghui Ding 143 Dec 23, 2022
The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"

MultiModal-Collaborative (MMC) Learning Framework for integrating RGB and Thermal spectral modalities This is the official code for NeurIPS 2021 Machi

NeurAI 12 Nov 02, 2022
MogFace: Towards a Deeper Appreciation on Face Detection

MogFace: Towards a Deeper Appreciation on Face Detection Introduction In this repo, we propose a promising face detector, termed as MogFace. Our MogFa

48 Dec 20, 2022
Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

dblmahmc Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo" Requirements: https://github.com

1 Dec 17, 2021
A Flow-based Generative Network for Speech Synthesis

WaveGlow: a Flow-based Generative Network for Speech Synthesis Ryan Prenger, Rafael Valle, and Bryan Catanzaro In our recent paper, we propose WaveGlo

NVIDIA Corporation 2k Dec 26, 2022
A proof of concept ai-powered Recaptcha v2 solver

Recaptcha Fullauto I've decided to open source my old Recaptcha v2 solver. My latest version will be opened sourced this summer. I am hoping this proj

Nate 60 Dec 20, 2022
PRTR: Pose Recognition with Cascade Transformers

PRTR: Pose Recognition with Cascade Transformers Introduction This repository is the official implementation for Pose Recognition with Cascade Transfo

mlpc-ucsd 133 Dec 30, 2022
Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles This project is for the paper: Detecting Errors and Estimating

Jiefeng Chen 13 Nov 21, 2022
Implementation of CVPR'2022:Surface Reconstruction from Point Clouds by Learning Predictive Context Priors

Surface Reconstruction from Point Clouds by Learning Predictive Context Priors (CVPR 2022) Personal Web Pages | Paper | Project Page This repository c

136 Dec 12, 2022
Instant neural graphics primitives: lightning fast NeRF and more

Instant Neural Graphics Primitives Ever wanted to train a NeRF model of a fox in under 5 seconds? Or fly around a scene captured from photos of a fact

NVIDIA Research Projects 10.6k Jan 01, 2023
Gym for multi-agent reinforcement learning

PettingZoo is a Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of Gym. Our website, with

Farama Foundation 1.6k Jan 09, 2023
Calculates carbon footprint based on fuel mix and discharge profile at the utility selected. Can create graphs and tabular output for fuel mix based on input file of series of power drawn over a period of time.

carbon-footprint-calculator Conda distribution ~/anaconda3/bin/conda install anaconda-client conda-build ~/anaconda3/bin/conda config --set anaconda_u

Seattle university Renewable energy research 7 Sep 26, 2022
Codes of the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform.

Deformable Butterfly: A Highly Structured and Sparse Linear Transform DeBut Advantages DeBut generalizes the square power of two butterfly factor matr

Rui LIN 8 Jun 10, 2022
2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

TableMASTER-mmocr Contents About The Project Method Description Dependency Getting Started Prerequisites Installation Usage Data preprocess Train Infe

Jianquan Ye 298 Dec 21, 2022
Official page of Patchwork (RA-L'21 w/ IROS'21)

Patchwork Official page of "Patchwork: Concentric Zone-based Region-wise Ground Segmentation with Ground Likelihood Estimation Using a 3D LiDAR Sensor

Hyungtae Lim 254 Jan 05, 2023
Gems & Holiday Package Prediction

Predictive_Modelling Gems & Holiday Package Prediction This project is based on 2 cases studies : Gems Price Prediction and Holiday Package prediction

Avnika Mehta 1 Jan 27, 2022
My implementation of Image Inpainting - A deep learning Inpainting model

Image Inpainting What is Image Inpainting Image inpainting is a restorative process that allows for the fixing or removal of unwanted parts within ima

Joshua V Evans 1 Dec 12, 2021