A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Overview

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

This repo contains the source code to reproduce the results in the paper A Closer Look at Invalid Action Masking in Policy Gradient Algorithms.

Steps to reproduce the experiments

Our experiments use docker containers to run and Weight and Biases (https://www.wandb.com/) to record the experiments, so the first step is to register a wandb account and get an API key, which we refer to as YOUR_WANDB_KEY

# build the docker container
docker build -t invalid_action_masking:latest -f sharedmemory.Dockerfile .
# build docker run commands. replace `{YOUR_WANDB_KEY}` with your own
WANDB_KEY={YOUR_WANDB_KEY} python docker.py > docker.sh
# run experiments (96 in total)
# if you have limited computational resources, consider not running all of them at a time.
# in addition, notice the commands have --cpuset-cpus="0", --cpuset-cpus="1" for different runs
# to make sure each container is only using one core. By default I assume your machine has 40 cores,
# but feel free to modify the `cores` variable in `docker.py`
bash docker.sh

Steps to reproduce the figures

Record your wandb username, which we will refer to as YOUR_WANDB_ENTITY

cd plots
WANDB_ENTITY={YOUR_WANDB_ENTITY} python episode_reward.py
WANDB_ENTITY={YOUR_WANDB_ENTITY} python approx_kl.py

These command should reproduce the PDFs in plots that are attached to the repo.

Reproduction without WANDB

Although it would be possible, it would require a significant amount of effort to properly log metrics and redo the plotting, so at this time we would not have intructions to do reproduction without WANDB. Note that it is possible to use wandb locally by following https://docs.wandb.com/self-hosted/local.

If you have an issue reproducing the results

We have tested these scripts to reproduce but it is possible that there is a bug and maybe we are assuming something specific regarding the environment. If you couldn't reproduce our results, please file an issue and we will address it as soon as the double-blind review is over.

Owner
Costa Huang
Computer Science Ph.D student at Drexel University researching Game Artificial Intelligence
Costa Huang
NeurIPS 2021 paper 'Representation Learning on Spatial Networks' code

Representation Learning on Spatial Networks This repository is the official implementation of Representation Learning on Spatial Networks. Training Ex

13 Dec 29, 2022
Yet Another Reinforcement Learning Tutorial

This repo contains self-contained RL implementations

Sungjoon 65 Dec 10, 2022
PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection.

Introduction This repo contains the official PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection. Up

133 Dec 29, 2022
Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

ToxiChat Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Install depen

Ashutosh Baheti 11 Jan 01, 2023
On Effective Scheduling of Model-based Reinforcement Learning

On Effective Scheduling of Model-based Reinforcement Learning Code to reproduce the experiments in On Effective Scheduling of Model-based Reinforcemen

laihang 8 Oct 07, 2022
PyTorch Implementation for Fracture Detection in Wrist Bone X-ray Images

wrist-d PyTorch Implementation for Fracture Detection in Wrist Bone X-ray Images note: Paper: Under Review at MPDI Diagnostics Submission Date: Novemb

Fatih UYSAL 5 Oct 12, 2022
El-Gamal on Elliptic Curve (Python)

El-Gamal-on-EC El-Gamal on Elliptic Curve (Python) References: https://docsdrive.com/pdfs/ansinet/itj/2005/299-306.pdf https://arxiv.org/ftp/arxiv/pap

3 May 04, 2022
PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Foley Music: Learning to Generate Music from Videos This repo holds the code for the framework presented on ECCV 2020. Foley Music: Learning to Genera

Chuang Gan 30 Nov 03, 2022
Yolov5-opencv-cpp-python - Example of using ultralytics YOLO V5 with OpenCV 4.5.4, C++ and Python

yolov5-opencv-cpp-python Example of performing inference with ultralytics YOLO V

183 Jan 09, 2023
Cervix ROI Segmentation Using U-NET

Cervix ROI Segmentation Using U-NET Overview This code illustrate how to segment the ROI in cervical images using U-NET. The ROI here meant to include

Scotty Kwok 35 Sep 14, 2022
MAME is a multi-purpose emulation framework.

MAME's purpose is to preserve decades of software history. As electronic technology continues to rush forward, MAME prevents this important "vintage" software from being lost and forgotten.

Michael Murray 6 Oct 25, 2020
Use unsupervised and supervised learning to predict stocks

AIAlpha: Multilayer neural network architecture for stock return prediction This project is meant to be an advanced implementation of stacked neural n

Vivek Palaniappan 1.5k Dec 26, 2022
Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML)

Automated Learning Rate Scheduler for Large-Batch Training The official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th

Kakao Brain 35 Jan 04, 2023
Simple renderer for use with MuJoCo (>=2.1.2) Python Bindings.

Viewer for MuJoCo in Python Interactive renderer to use with the official Python bindings for MuJoCo. Starting with version 2.1.2, MuJoCo comes with n

Rohan P. Singh 62 Dec 30, 2022
Hardware accelerated, batchable and differentiable optimizers in JAX.

JAXopt Installation | Examples | References Hardware accelerated (GPU/TPU), batchable and differentiable optimizers in JAX. Installation JAXopt can be

Google 621 Jan 08, 2023
A curated list of awesome Active Learning

Awesome Active Learning 🤩 A curated list of awesome Active Learning ! 🤩 Background (image source: Settles, Burr) What is Active Learning? Active lea

BAI Fan 431 Jan 03, 2023
Listing arxiv - Personalized list of today's articles from ArXiv

Personalized list of today's articles from ArXiv Print and/or send to your gmail

Lilianne Nakazono 5 Jun 17, 2022
Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Gyeongjae Choi 17 Sep 23, 2021
Supervised & unsupervised machine-learning techniques are applied to the database of weighted P4s which admit Calabi-Yau hypersurfaces.

Weighted Projective Spaces ML Description: The database of 5-vectors describing 4d weighted projective spaces which admit Calabi-Yau hypersurfaces are

Ed Hirst 3 Sep 08, 2022
Point Cloud Denoising input segmentation output raw point-cloud valid/clear fog rain de-noised Abstract Lidar sensors are frequently used in environme

Point Cloud Denoising input segmentation output raw point-cloud valid/clear fog rain de-noised Abstract Lidar sensors are frequently used in environme

75 Nov 24, 2022