Framework for training options with different attention mechanism and using them to solve downstream tasks.

Overview

Using Attention in HRL

Framework for training options with different attention mechanism and using them to solve downstream tasks.

Requirements

GPU required

conda env create -f conda_env.yml

After the instalation ends you can activate your environment and install remaining dependencies. (e.g. sub-module gym_minigrid which is a modified version of MiniGrid )

conda activate affenv
cd gym-minigrid
pip install -e .
cd ../
pip install -e .

Instructions

In order to train options and IC_net follow these steps:

1. Configure desired environment - number of task and objects per task in file config/op_ic_net.yaml. E.g:
  env_args:
    task_size: 3
    num_tasks: 4

2. Configure desired type of attention (between "affordance", "interest", "nan") - in file config/op_ic_net.yaml. E.g. 
main:
  attention: "affordance" 

3. Train by running command
liftoff train_main.py configs/op_ic_net.yaml

Once a pre-trained option checkpoint exists a HRL agent can be trained to solve the downstream task (for the same environment the options were trained on). Follow these steps in order to train an HRL-Agent with different types of attentions:

1. Configure checkpoint (experiment config file and options_model_id) for pre-trained Options and IC_net - in file configs/hrl-agent.yaml. E.g: 

main:
  options_model_cfg: "results/op_aff_4x3/0000_multiobj/0/cfg.yaml"
  options_model_id: -1  # Last checkpoint will be used

2. Configure type of attention for training the HRL-agent (between "affordance", "interest", "nan") - in file configs/hrl-agent.yaml. E.g:
main:
  modulate_policy: affordance

3. Train HRL-agent by running command
liftoff train_mtop_ppo.py configs/hrl-agent.yaml

Both training scrips produce results in the results folder, where all the outputs are going to be stored including train/eval logs, checkpoints. Live plotting is integrated using services from Wandb (plotting has to be enabled in the config file main:plot and user logged in Wandb or user login api key in the file .wandb_key).

The console output is also available in a form:

  • Option Pre-training e.g.:
U 11 | F 022528 | FPS 0024 | D 402 | rR:u, 0.03 | F:u, 41.77 | tL:u 0.00 | tPL:u 6.47 | tNL:u 0.00 | t 52 | aff_loss 0.0570 | aff 2.8628 | NOaff 0.0159 | ic 0.0312 | cnt_ic 1.0000 | oe 2.4464 | oic0 0.0000 | oic1 0.0000 | oic2 0.0000 | oic3 0.0000 | oPic0 0.0000 | oPic1 0.0000 | oPic2 0.0000 | oPic3 0.0000 | icB 0.0208 | PicB 0.1429 | icND 0.0192

Some of the training entries decodes as

F - number of frames (steps in the env)
tL - termination loss
aff_loss - IC_net loss
cnt_ic - Intent completion per training batch 
oicN - Intent completion fraction for each option N out of Total option N sampled
oPicN - Intent completion fraction for each option N out of affordable ones
PicB - Intent completion average over all options out of affordable ones
  • HRL-agent training
U 1 | F 4555192.0 | FPS 21767 | D 209 | rR:u, 0.00 | F:u, 8.11 | e:u, 2.48 | v:u 0.00 | pL:u 0.01 | vL:u 0.00 | g:u 0.01 | TrR:u, 0.00

Some of the training entries decodes as

F - number of frames (steps in the env offseted by the number of pre-training steps)
rR - Accumulated episode reward average
TrR - Average episode success rate

Framework structure

The code is organised as follows:

  • agents/ - implementation of agents (e.g. training options and IC_net multistep_affordance.py; hrl-agent PPO ppo_smdp.py )
  • configs/ - config files for training agents
  • gym-minigrid/ - sub-module - Minigrid envs
  • models/ - Neural network modules (e.g options with IC_net aff_multistep.py and CNN backbone extractor_cnn_v2.py)
  • utils/ - Scripts for e.g.: running envs in parallel, preprocessing observations, gym wrappers, data structures, logging modules
  • train_main.py - Train Options with IC_net
  • train_mtop_ppo.py - Train HRL-agent

Acknowledgements

We used PyTorch as a machine learning framework.

We used liftoff for experiment management.

We used wandb for plotting.

We used PPO adapted for training our agents.

We used MiniGrid to create our environment.

A symbolic-model-guided fuzzer for TLS

tlspuffin TLS Protocol Under FuzzINg A symbolic-model-guided fuzzer for TLS Master Thesis | Thesis Presentation | Documentation Disclaimer: The term "

69 Dec 20, 2022
[CVPR 2021] Exemplar-Based Open-Set Panoptic Segmentation Network (EOPSN)

EOPSN: Exemplar-Based Open-Set Panoptic Segmentation Network (CVPR 2021) PyTorch implementation for EOPSN. We propose open-set panoptic segmentation t

Jaedong Hwang 49 Dec 30, 2022
NeuPy is a Tensorflow based python library for prototyping and building neural networks

NeuPy v0.8.2 NeuPy is a python library for prototyping and building neural networks. NeuPy uses Tensorflow as a computational backend for deep learnin

Yurii Shevchuk 729 Jan 03, 2023
Official code for 'Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentationon Complex Urban Driving Scenes'

PEBAL This repo contains the Pytorch implementation of our paper: Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentation on Complex Urb

Yu Tian 117 Jan 03, 2023
BRepNet: A topological message passing system for solid models

BRepNet: A topological message passing system for solid models This repository contains the an implementation of BRepNet: A topological message passin

Autodesk AI Lab 42 Dec 30, 2022
Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 03, 2023
The 2nd place solution of 2021 google landmark retrieval on kaggle.

Google_Landmark_Retrieval_2021_2nd_Place_Solution The 2nd place solution of 2021 google landmark retrieval on kaggle. Environment We use cuda 11.1/pyt

229 Dec 13, 2022
Paddle-Skeleton-Based-Action-Recognition - DecoupleGCN-DropGraph, ASGCN, AGCN, STGCN

Paddle-Skeleton-Action-Recognition DecoupleGCN-DropGraph, ASGCN, AGCN, STGCN. Yo

Chenxu Peng 3 Nov 02, 2022
Self-Adaptable Point Processes with Nonparametric Time Decays

NPPDecay This is our implementation for the paper Self-Adaptable Point Processes with Nonparametric Time Decays, by Zhimeng Pan, Zheng Wang, Jeff M. P

zpan 2 Sep 24, 2022
GARCH and Multivariate LSTM forecasting models for Bitcoin realized volatility with potential applications in crypto options trading, hedging, portfolio management, and risk management

Bitcoin Realized Volatility Forecasting with GARCH and Multivariate LSTM Author: Chi Bui This Repository Repository Directory ├── README.md

Chi Bui 113 Dec 29, 2022
SigOpt wrappers for scikit-learn methods

SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In

SigOpt 73 Sep 30, 2022
A dead simple python wrapper for darknet that works with OpenCV 4.1, CUDA 10.1

What Dead simple python wrapper for Yolo V3 using AlexyAB's darknet fork. Works with CUDA 10.1 and OpenCV 4.1 or later (I use OpenCV master as of Jun

Pliable Pixels 6 Jan 12, 2022
[ICML 2020] DrRepair: Learning to Repair Programs from Error Messages

DrRepair: Learning to Repair Programs from Error Messages This repo provides the source code & data of our paper: Graph-based, Self-Supervised Program

Michihiro Yasunaga 155 Jan 08, 2023
Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

PRP Introduction This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

yuanyao366 39 Dec 29, 2022
Papers about explainability of GNNs

Papers about explainability of GNNs

Dongsheng Luo 236 Jan 04, 2023
A library for uncertainty representation and training in neural networks.

Epistemic Neural Networks A library for uncertainty representation and training in neural networks. Introduction Many applications in deep learning re

DeepMind 211 Dec 12, 2022
MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research

MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research.The pipeline is based on nn-UNet an

QIMP team 30 Jan 01, 2023
PyTorch implementation of our method for adversarial attacks and defenses in hyperspectral image classification.

Self-Attention Context Network for Hyperspectral Image Classification PyTorch implementation of our method for adversarial attacks and defenses in hyp

22 Dec 02, 2022
A minimal yet resourceful implementation of diffusion models (along with pretrained models + synthetic images for nine datasets)

A minimal yet resourceful implementation of diffusion models (along with pretrained models + synthetic images for nine datasets)

Vikash Sehwag 65 Dec 19, 2022
An LSTM based GAN for Human motion synthesis

GAN-motion-Prediction An LSTM based GAN for motion synthesis has a few issues reading H3.6M data from A.Jain et al , will fix soon. Prediction of the

Amogh Adishesha 9 Jun 17, 2022