TensorFlow AffordanceNet and AffContext implementations

Overview

AffordanceNet and AffContext

This repository contains TensorFlow implementations of AffordanceNet and AffContext. Both are implemented and tested with TensorFlow 2.3.

The main objective of both architectures is to identify action affordances, so that they can be used in real robotic applications to understand the diverse objects present in the environment.

Both models have been trained on the IIT-AFF and UMD datasets.

Detections on novel image

Novel image

Example of ground truth affordances compared with the affordance detection results by AffordanceNet and AffContext on the IIT-AFF dataset.

IIT results

IIT colours

Example of ground truth affordances compared with the affordance detection results by AffordanceNet and AffContext on the UMD dataset.

UMD results

UMD colours

AffordanceNet simultaneously detects multiple objects with their corresponding classes and affordances. This network mainly consists of two branches: an object detection branch to localise and classify the objects in the image, and an affordance detection branch to predict the most probable affordance label for each pixel in the object.

AffordanceNet
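
The per-pixel labelling step of the affordance branch described above can be illustrated with a minimal sketch. It is not the repository's code: the function name `most_probable_affordance`, the nested-list score layout, and the four-entry label list (including `background`) are assumptions made only for this example.

```python
from typing import List

# Hypothetical label set for illustration; the real label maps depend on the dataset.
AFFORDANCES = ["background", "contain", "cut", "grasp"]


def most_probable_affordance(pixel_scores: List[List[List[float]]]) -> List[List[int]]:
    """Return, for each pixel, the index of the highest-scoring affordance.

    pixel_scores[y][x] holds one score per affordance label.
    """
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in pixel_scores
    ]


# A 2x2 object region with one score per label at every pixel.
scores = [
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]],
    [[0.2, 0.1, 0.1, 0.6], [0.1, 0.1, 0.7, 0.1]],
]
label_map = most_probable_affordance(scores)
print([[AFFORDANCES[i] for i in row] for row in label_map])
```

In a real network the scores would come from the affordance branch for each detected object; here they are hand-written values.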

AffContext predicts the pixel-wise affordances independently of the object class, which allows it to infer the affordances of unseen objects. The structure of this network is similar to AffordanceNet, but the object detection branch only performs binary classification into foreground and background areas, and it includes two new blocks: an auxiliary task that infers the affordances present in the region, and a self-attention mechanism that captures rich contextual dependencies across the region.

AffContext
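
To give an intuition for the self-attention block mentioned above, the following sketch applies plain scaled dot-product self-attention to a few feature vectors from one region. It is a simplification under stated assumptions: it omits the learned query, key, and value projections a trained network would use, and the names `attention_weights` and `self_attention` are hypothetical. How AffContext actually implements its attention block is not shown here.

```python
import math
from typing import List

Vector = List[float]


def softmax(values: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features: List[Vector]) -> List[Vector]:
    """For each position, compute how strongly it attends to every position."""
    dim = len(features[0])
    weights = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights.append(softmax(scores))
    return weights


def self_attention(features: List[Vector]) -> List[Vector]:
    """Replace each feature vector by an attention-weighted mix of all vectors."""
    dim = len(features[0])
    return [
        [sum(w * value[j] for w, value in zip(row, features)) for j in range(dim)]
        for row in attention_weights(features)
    ]


# Three positions in a region: the first two are similar, the third differs.
region = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended = self_attention(region)
print(attended)
```

Each output vector mixes information from every position in the region, weighted by similarity, which is the sense in which attention captures context.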

Results

The results of the TensorFlow implementations are compared with the values reported in the AffordanceNet and AffContext papers. However, the papers may process the network outputs and compute the final metrics differently from this repository. For that reason, the results are also compared with values obtained by running the original trained models and then processing their outputs and calculating the metrics with the code from this repository. These values are marked with * in the comparison tables.

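| Affordances | AffordanceNet (Caffe) | AffordanceNet* | AffordanceNet (tf) |
|-------------|-----------------------|----------------|--------------------|
| contain     | 79.61                 | 73.68          | 74.17              |
| cut         | 75.68                 | 64.71          | 66.97              |
| display     | 77.81                 | 82.81          | 81.84              |
| engine      | 77.50                 | 81.09          | 82.63              |
| grasp       | 68.48                 | 64.13          | 65.49              |
| hit         | 70.75                 | 82.13          | 83.25              |
| pound       | 69.57                 | 65.90          | 65.73              |
| support     | 69.57                 | 74.43          | 75.26              |
| w-grasp     | 70.98                 | 77.63          | 78.45              |
| Average     | 73.35                 | 74.06          | 74.87              |

| Affordances | AffContext (Caffe) | AffContext* | AffContext (tf) |
|-------------|--------------------|-------------|-----------------|
| grasp       | 0.60               | 0.51        | 0.55            |
| cut         | 0.37               | 0.31        | 0.26            |
| scoop       | 0.60               | 0.52        | 0.52            |
| contain     | 0.61               | 0.55        | 0.57            |
| pound       | 0.80               | 0.68        | 0.64            |
| support     | 0.88               | 0.69        | 0.21            |
| w-grasp     | 0.94               | 0.88        | 0.85            |
| Average     | 0.69               | 0.59        | 0.51            |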
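
As a quick sanity check on the values listed above, the Average rows can be recomputed from the per-affordance scores. The sketch assumes the average is the unweighted mean over affordances, rounded to two decimals; the helper name `table_average` is hypothetical and the numbers are copied from the (tf) columns.

```python
from statistics import mean
from typing import Dict

# Per-affordance scores copied from the (tf) columns above.
affordancenet_tf = {
    "contain": 74.17, "cut": 66.97, "display": 81.84, "engine": 82.63,
    "grasp": 65.49, "hit": 83.25, "pound": 65.73, "support": 75.26,
    "w-grasp": 78.45,
}
affcontext_tf = {
    "grasp": 0.55, "cut": 0.26, "scoop": 0.52, "contain": 0.57,
    "pound": 0.64, "support": 0.21, "w-grasp": 0.85,
}


def table_average(scores: Dict[str, float]) -> float:
    """Unweighted mean over affordances, rounded as in the tables."""
    return round(mean(scores.values()), 2)


print(table_average(affordancenet_tf))  # matches the reported 74.87
print(table_average(affcontext_tf))     # matches the reported 0.51
```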

Setup guide

Requirements

  • Python 3
  • CUDA 10.1

Installation

  1. Clone the repository into your $AffordanceNet_ROOT folder.

  2. Install the required Python 3 packages with: pip3 install -r requirements.txt
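
After installing the requirements, it can help to confirm which TensorFlow version is active and whether a GPU is visible. The snippet below is an optional check, not part of the repository; `describe_environment` is a hypothetical helper, and it falls back to empty values when TensorFlow cannot be imported.

```python
from typing import Any, Dict


def describe_environment() -> Dict[str, Any]:
    """Report the installed TensorFlow version and the visible GPUs."""
    info: Dict[str, Any] = {"tensorflow_version": None, "gpus": []}
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this interpreter.
        return info
    info["tensorflow_version"] = tf.__version__
    info["gpus"] = [
        device.name for device in tf.config.list_physical_devices("GPU")
    ]
    return info


if __name__ == "__main__":
    print(describe_environment())
```

A version of 2.3.x and a non-empty GPU list would match the setup this README describes; an empty GPU list usually points to a CUDA installation that TensorFlow cannot find.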

Testing

  1. Download the pretrained weights:

    • AffordanceNet weights trained on the IIT-AFF dataset.
    • AffContext weights trained on the UMD dataset.

  2. Extract the file into the $AffordanceNet_ROOT/weights folder.

  3. Visualize results for AffordanceNet trained on the IIT-AFF dataset:

python3 affordancenet_predictor.py --config_file config_iit_test

  4. Visualize results for AffContext trained on the UMD dataset:

python3 affcontext_predictor.py --config_file config_umd_test

Training

  1. Download the IIT-AFF or UMD dataset in Pascal-VOC format, following the instructions in AffordanceNet (IIT-AFF) and AffContext (UMD).

  2. Extract them into the $AffordanceNet_ROOT/data folder and make sure to have the following folder structure for IIT-AFF dataset:

    • cache/
    • VOCdevkit2012/

The same applies to the UMD dataset, but the folders must be named cache_UMD and VOCdevkit2012_UMD.

  3. Run the command to train AffordanceNet on the IIT-AFF dataset:

python3 affordancenet_trainer.py --config_file config_iit_train

  4. Run the command to train AffContext on the UMD dataset:

python3 affcontext_trainer.py --config_file config_umd_train
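
Before starting a training run, the dataset folder layout described in the steps above can be checked with a small script. This is a sketch, not part of the repository: `missing_dataset_folders` and the `"iit"`/`"umd"` keys are hypothetical, while the folder names are the ones listed in this section.

```python
import tempfile
from pathlib import Path
from typing import List

# Folder names expected under $AffordanceNet_ROOT/data for each dataset.
EXPECTED_FOLDERS = {
    "iit": ["cache", "VOCdevkit2012"],
    "umd": ["cache_UMD", "VOCdevkit2012_UMD"],
}


def missing_dataset_folders(data_root: str, dataset: str) -> List[str]:
    """Return the expected folders that do not exist under data_root."""
    root = Path(data_root)
    return [
        name for name in EXPECTED_FOLDERS[dataset]
        if not (root / name).is_dir()
    ]


# Demonstration on a temporary directory that only contains cache/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cache").mkdir()
    missing = missing_dataset_folders(tmp, "iit")
print(missing)  # the demo folder lacks VOCdevkit2012
```

Run from $AffordanceNet_ROOT, a call such as `missing_dataset_folders("data", "umd")` should return an empty list once the UMD dataset has been extracted correctly.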

Acknowledgements

This repo uses source code from AffordanceNet and Faster-RCNN.

Owner
Beatriz Pérez
MSc student in Computer Science at Universität Bonn, Germany. Computer Engineer from Universidad de Zaragoza, Spain.