SalGAN: Visual Saliency Prediction with Generative Adversarial Networks

Overview

SalGAN: Visual Saliency Prediction with Adversarial Networks

Junting Pan Cristian Canton Ferrer Kevin McGuinness Noel O'Connor Jordi Torres Elisa Sayrol Xavier Giro-i-Nieto
Junting Pan Cristian Canton Ferrer Kevin McGuinness Noel O'Connor Jordi Torres Elisa Sayrol Xavier Giro-i-Nieto

A joint collaboration between:

logo-insight logo-dcu logo-microsoft logo-facebook logo-bsc logo-upc
Insight Centre for Data Analytics Dublin City University (DCU) Microsoft Facebook Barcelona Supercomputing Center Universitat Politecnica de Catalunya (UPC)

Abstract

We introduce SalGAN, a deep convolutional neural network for visual saliency prediction trained with adversarial examples. The first stage of the network consists of a generator model whose weights are learned by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency maps. The resulting prediction is processed by a discriminator network trained to solve a binary classification task between the saliency maps generated by the generative stage and the ground truth ones. Our experiments show how adversarial training allows reaching state-of-the-art performance across different metrics when combined with a widely-used loss function like BCE.

Slides

<iframe src="//www.slideshare.net/slideshow/embed_code/key/5cXl80Fm2c3ksg" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>

Publication

Find the extended pre-print version of our work on arXiv. The shorter extended abstract presented as spotlight in the CVPR 2017 Scene Understanding Workshop (SUNw) is available here.

Image of the paper

Please cite with the following Bibtex code:

@InProceedings{Pan_2017_SalGAN,
author = {Pan, Junting and Canton, Cristian and McGuinness, Kevin and O'Connor, Noel E. and Torres, Jordi and Sayrol, Elisa and Giro-i-Nieto, Xavier and},
title = {SalGAN: Visual Saliency Prediction with Generative Adversarial Networks},
booktitle = {arXiv},
month = {January},
year = {2017}
}

You may also want to refer to our publication with the more human-friendly Chicago style:

Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. "SalGAN: Visual Saliency Prediction with Generative Adversarial Networks." arXiv. 2017.

Architecture

architecture-fig

Model parameters

The parameters to run SalGAN can be downloaded here:

If you wanted to train the model, you will also need this additional file

Visual Results

Qualitative saliency predictions

Datasets

Training

As explained in our paper, our networks were trained on the training and validation data provided by SALICON.

Test

Two different dataset were used for test:

Software frameworks

Our paper presents two convolutional neural networks, one correspends to the Generator (Saliency Prediction Network) and the another is the Discriminator for the adversarial training. To compute saliency maps only the Generator is needed.

SalGAN on Lasagne

SalGAN is implemented in Lasagne, which at its time is developed over Theano.

pip install -r https://raw.githubusercontent.com/imatge-upc/saliency-salgan-2017/master/requirements.txt

SalGAN on a docker

We have prepared this Docker container with all necessary dependencies for computing saliency maps with SalGAN. You will need to use nvidia-docker.

Using the container is like connecting via ssh to a machine. To start an interactive session run:

    >> sudo nvidia-docker run -it --entrypoint='bash' -w /home/ evamohe/salgan

This will open a terminal within the container located in the '/home' folder.

Yo will find Salgan code in "/home/salgan". So if you want to test the installation, within the container, run:

   >> cd /home/salgan/scripts
   >> THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.5,optimizer_including=cudnn python 03-predict.py

That will process the sample images located in "/home/salgan/images" and store them in "/home/salgan/saliency". To exit the container, run:

   >> exit

You migh want to process your own data with your own custom scripts. For that, you can mount different local folders in the container. For example:

>> sudo nvidia-docker run -v $PATH_TO_MY_CODE:/home/code -v $PATH_TO_MY_DATA:/home/data -it --entrypoint='bash' -w /home/

will open a new session in the container, with '/home/code' and '/home/data' folders that will be share with your computer. If you edit your code locally, the changes will be updated automatically in the container. Similarly, all the files generated in '/home/data' will be available in your original data folder.

Usage

To train our model from scrath you need to run the following command:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,lib.cnmem=1,optimizer_including=cudnn python 02-train.py

In order to run the test script to predict saliency maps, you can run the following command after specifying the path to you images and the path to the output saliency maps:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,lib.cnmem=1,optimizer_including=cudnn python 03-predict.py

With the provided model weights you should obtain the follwing result:

Image Stimuli Saliency Map

Download the pretrained VGG-16 weights from: vgg16.pkl

External implementation in PyTorch

Bat-Orgil Batsaikhan and Catherine Qi Zhao from the University of Minnesota released a PyTorch implementation in 2018 as part of their poster "Generative Adversarial Network for Videos and Saliency Map".

Acknowledgements

We would like to especially thank Albert Gil Moreno and Josep Pujal from our technical support team at the Image Processing Group at the UPC.

AlbertGil-photo JosepPujal-photo
Albert Gil Josep Pujal
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeoForce GTX Titan Z and Titan X used in this work. logo-nvidia
The Image ProcessingGroup at the UPC is a SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office. logo-catalonia
This work has been developed in the framework of the projects BigGraph TEC2013-43935-R and Malegra TEC2016-75976-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF). logo-spain
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under grant number SFI/12/RC/2289. logo-ireland

Contact

If you have any general doubt about our work or code which may be of interest for other researchers, please use the public issues section on this github repo. Alternatively, drop us an e-mail at mailto:[email protected].

<script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-7678045-13', 'auto'); ga('send', 'pageview'); </script>
Owner
Image Processing Group - BarcelonaTECH - UPC
Image Processing Group - BarcelonaTECH - UPC
One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Introduction One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing". Users

seq-to-mind 18 Dec 11, 2022
Official repo for our 3DV 2021 paper "Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements".

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy Paper. Pr

Yu Rong 41 Dec 13, 2022
Whisper is a file-based time-series database format for Graphite.

Whisper Overview Whisper is one of three components within the Graphite project: Graphite-Web, a Django-based web application that renders graphs and

Graphite Project 1.2k Dec 25, 2022
Implementation of Change-Based Exploration Transfer (C-BET)

Implementation of Change-Based Exploration Transfer (C-BET), as presented in Interesting Object, Curious Agent: Learning Task-Agnostic Exploration.

Simone Parisi 29 Dec 04, 2022
Implementation of Segnet, FCN, UNet , PSPNet and other models in Keras.

Image Segmentation Keras : Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras. Implementation of various Deep Image Segmentation mo

Divam Gupta 2.6k Jan 05, 2023
HugsVision is a easy to use huggingface wrapper for state-of-the-art computer vision

HugsVision is an open-source and easy to use all-in-one huggingface wrapper for computer vision. The goal is to create a fast, flexible and user-frien

Labrak Yanis 166 Nov 27, 2022
JAX + dataclasses

jax_dataclasses jax_dataclasses provides a wrapper around dataclasses.dataclass for use in JAX, which enables automatic support for: Pytree registrati

Brent Yi 35 Dec 21, 2022
Structural Constraints on Information Content in Human Brain States

Structural Constraints on Information Content in Human Brain States Code accompanying the paper "The information content of brain states is explained

Leon Weninger 3 Sep 07, 2022
Project ArXiv Citation Network

Project ArXiv Citation Network Overview This project involved the analysis of the ArXiv citation network. Usage The complete code of this project is i

Dennis Núñez-Fernández 5 Oct 20, 2022
验证码识别 深度学习 tensorflow 神经网络

captcha_tf2 验证码识别 深度学习 tensorflow 神经网络 使用卷积神经网络,对字符,数字类型验证码进行识别,tensorflow使用2.0以上 目前项目还在更新中,诸多bug,欢迎提出issue和PR, 希望和你一起共同完善项目。 实例demo 训练过程 优化器选择: Adam

5 Apr 28, 2022
TensorFlow implementation of ENet, trained on the Cityscapes dataset.

segmentation TensorFlow implementation of ENet (https://arxiv.org/pdf/1606.02147.pdf) based on the official Torch implementation (https://github.com/e

Fredrik Gustafsson 248 Dec 16, 2022
Fast and accurate optimisation for registration with little learningconvexadam

convexAdam Learn2Reg 2021 Submission Fast and accurate optimisation for registration with little learning Excellent results on Learn2Reg 2021 challeng

17 Dec 06, 2022
Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV

Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV File YOLOv3 weight can be downloaded

Ngoc Quyen Ngo 2 Mar 27, 2022
Code for ECCV 2020 paper "Contacts and Human Dynamics from Monocular Video".

Contact and Human Dynamics from Monocular Video This is the official implementation for the ECCV 2020 spotlight paper by Davis Rempe, Leonidas J. Guib

Davis Rempe 207 Jan 05, 2023
Perfect implement. Model shared. x0.5 (Top1:60.646) and 1.0x (Top1:69.402).

Shufflenet-v2-Pytorch Introduction This is a Pytorch implementation of faceplusplus's ShuffleNet-v2. For details, please read the following papers:

423 Dec 07, 2022
CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Citylearn Challenge This is the PyTorch implementation for PikaPika team, CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energ

bigAIdream projects 10 Oct 10, 2022
Code for paper Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting

Decoupled Spatial-Temporal Graph Neural Networks Code for our paper: Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting.

S22 43 Jan 04, 2023
Scalable Multi-Agent Reinforcement Learning

Scalable Multi-Agent Reinforcement Learning 1. Featured algorithms: Value Function Factorization with Variable Agent Sub-Teams (VAST) [1] 2. Implement

3 Aug 02, 2022
Directed Greybox Fuzzing with AFL

AFLGo: Directed Greybox Fuzzing AFLGo is an extension of American Fuzzy Lop (AFL). Given a set of target locations (e.g., folder/file.c:582), AFLGo ge

380 Nov 24, 2022
Continuous Query Decomposition for Complex Query Answering in Incomplete Knowledge Graphs

Continuous Query Decomposition This repository contains the official implementation for our ICLR 2021 (Oral) paper, Complex Query Answering with Neura

UCL Natural Language Processing 71 Dec 29, 2022