Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch

Overview

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

This is a PyTorch implementation of the original Lua code release.

Overview

This codebase implements two approaches to learning discrete communication protocols for playing collaborative games: Reinforced Inter-Agent Learning (RIAL), in which agents learn a factorized deep Q-learning policy across game actions and messages, and Differentiable Inter-Agent Learning (DIAL), in which the message vectors are directly learned by backpropagating errors through a noisy communication channel during training, and discretized to binary vectors during test time. While RIAL and DIAL share the same individual network architecture, one would expect learning to be more efficient under DIAL, which directly backpropagates downstream errors during training, a fact that is verified in comparing the performance of the two approaches.

Execution

$ virtualenv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
$ python main.py -c config/switch_3_dial.json

Results for switch game

DIAL vs. RIAL reward curves

This chart was generated by plotting an exponentially-weighted average across 20 trials for each curve.

More info

More generally, main.py takes multiple arguments:

Arg Short Description Required?
--config_path -c path to JSON configuration file
--results_path -r path to directory in which to save results per trial (as csv) -
--ntrials -n number of trials to run -
--start_index -s start-index used as suffix in result filenames -
--verbose -v prints results per training epoch to stdout if set -
Configuration

JSON configuration files passed to main.py should consist of the following key-value pairs:

Key Description Type
game name of the game, e.g. "switch" string
game_nagents number of agents int
game_action_space number of valid game actions int
game_comm_limited true if only some agents can communicate at each step bool
game_comm_bits number of bits per message int
game_comm_sigma standard deviation of Gaussian noise applied by DRU float
game_comm_hard true if use hard discretization, soft approximation otherwise bool
nsteps maximum number of game steps int
gamma reward discount factor for Q-learning float
model_dial true if agents should use DIAL bool
model_comm_narrow true if DRU should use sigmoid for regularization, softmax otherwise bool
model_target true if learning should use a target Q-network bool
model_bn true if learning should use batch normalization bool
model_know_share true if agents should share parameters bool
model_action_aware true if each agent should know their last action bool
model_rnn_size dimension of rnn hidden state int
bs batch size of episodes, run in parallel per epoch int
learningrate learning rate for optimizer (RMSProp) float
momentum momentum for optimizer (RMSProp) float
eps exploration rate for epsilon-greedy exploration float
nepisodes number of epochs, each consisting of parallel episodes int
step_test perform a test episode every this many steps int
step_target update target network every this many steps int
Visualizing results

You can use analyze_results.py to graph results output by main.py. This script will plot the average results across all csv files per path specified after -r. Further, -a can take an alpha value to plot results as exponentially-weighted moving averages, and -l takes an optional list of labels corresponding to the paths.

$ python util/analyze_results -r <paths to results> -a <weight for EWMA>

Bibtex

@inproceedings{foerster2016learning,
    title={Learning to communicate with deep multi-agent reinforcement learning},
    author={Foerster, Jakob and Assael, Yannis M and de Freitas, Nando and Whiteson, Shimon},
    booktitle={Advances in Neural Information Processing Systems},
    pages={2137--2145},
    year={2016} 
}

License

Code licensed under the Apache License v2.0

Owner
Minqi
Minqi
PyTorch implementation of EigenGAN

PyTorch Implementation of EigenGAN Train python train.py [image_folder_path] --name [experiment name] Test python test.py [ckpt path] --traverse FFH

62 Nov 12, 2022
Multimodal Temporal Context Network (MTCN)

Multimodal Temporal Context Network (MTCN) This repository implements the model proposed in the paper: Evangelos Kazakos, Jaesung Huh, Arsha Nagrani,

Evangelos Kazakos 13 Nov 24, 2022
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

Bottom-Up and Top-Down Attention for Visual Question Answering An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge. The

Hengyuan Hu 731 Jan 03, 2023
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Mozhdeh Gheini 16 Jul 16, 2022
Python package for downloading ECMWF reanalysis data and converting it into a time series format.

ecmwf_models Readers and converters for data from the ECMWF reanalysis models. Written in Python. Works great in combination with pytesmo. Citation If

TU Wien - Department of Geodesy and Geoinformation 31 Dec 26, 2022
Synthesizing Long-Term 3D Human Motion and Interaction in 3D in CVPR2021

Long-term-Motion-in-3D-Scenes This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D". Please ch

Jiashun Wang 76 Dec 13, 2022
Perturb-and-max-product: Sampling and learning in discrete energy-based models

Perturb-and-max-product: Sampling and learning in discrete energy-based models This repo contains code for reproducing the results in the paper Pertur

Vicarious 2 Mar 14, 2022
Fairness Metrics: All you need to know

Fairness Metrics: All you need to know Testing machine learning software for ethical bias has become a pressing current concern. Recent research has p

Anonymous2020 1 Jan 17, 2022
Bio-OFC gym implementation and Gym-Fly environment

Bio-OFC gym implementation and Gym-Fly environment This repository includes the gym compatible implementation of the Bio-OFC algorithm from the paper

Siavash Golkar 1 Nov 16, 2021
Deep Learning pipeline for motor-imagery classification.

BCI-ToolBox 1. Introduction BCI-ToolBox is deep learning pipeline for motor-imagery classification. This repo contains five models: ShallowConvNet, De

DongHee 18 Oct 31, 2022
use machine learning to recognize gesture on raspberrypi

Raspberrypi_Gesture-Recognition use machine learning to recognize gesture on raspberrypi 說明 利用 tensorflow lite 訓練手部辨識模型 分辨 "剪刀"、"石頭"、"布" 之手勢 再將訓練模型匯入

1 Dec 10, 2021
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages

AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages This repository contains the code for the pa

Kelechi 40 Nov 24, 2022
Semantic Image Synthesis with SPADE

Semantic Image Synthesis with SPADE New implementation available at imaginaire repository We have a reimplementation of the SPADE method that is more

NVIDIA Research Projects 7.3k Jan 07, 2023
DTCN SMP Challenge - Sequential prediction learning framework and algorithm

DTCN This is the implementation of our paper "Sequential Prediction of Social Me

Bobby 2 Jan 24, 2022
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

Ce Zheng 363 Dec 28, 2022
Streamlit app demonstrating an image browser for the Udacity self-driving-car dataset with realtime object detection using YOLO.

Streamlit Demo: The Udacity Self-driving Car Image Browser This project demonstrates the Udacity self-driving-car dataset and YOLO object detection in

Streamlit 992 Jan 04, 2023
Easily benchmark PyTorch model FLOPs, latency, throughput, max allocated memory and energy consumption

⏱ pytorch-benchmark Easily benchmark model inference FLOPs, latency, throughput, max allocated memory and energy consumption Install pip install pytor

Lukas Hedegaard 21 Dec 22, 2022
Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Introduction This is a PyTorch implementation of the following research papers: (1) Hierarchical Text Generation and Planning for Strategic Dialogue (

Facebook Research 1.4k Dec 29, 2022
Dataset Condensation with Contrastive Signals

Dataset Condensation with Contrastive Signals This repository is the official implementation of Dataset Condensation with Contrastive Signals (DCC). T

3 May 19, 2022
CSD: Consistency-based Semi-supervised learning for object Detection

CSD: Consistency-based Semi-supervised learning for object Detection (NeurIPS 2019) By Jisoo Jeong, Seungeui Lee, Jee-soo Kim, Nojun Kwak Installation

80 Dec 15, 2022