Multi-objective gym environments for reinforcement learning.

Last update: Jan 03, 2023

Overview

MO-Gym: Multi-Objective Reinforcement Learning Environments

Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard gym's API, but return vectorized rewards as numpy arrays.

For details on multi-objective MDPs (MOMDPs) and other MORL definitions, see "A practical guide to multi-objective reinforcement learning and planning".

Install

git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .

Usage

import gym
import numpy as np  # needed for the weight vector passed to LinearReward below
import mo_gym

env = gym.make('minecart-v0') # It follows the original gym's API ...

obs = env.reset()
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
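
Linear scalarization collapses the reward vector into a single number by taking a weighted sum of its components. The sketch below shows that idea in isolation. `scalarize` is a hypothetical helper written for this example, not part of mo_gym, and the reward values are made up.

```python
def scalarize(vector_reward, weight):
    """Weighted sum of a reward vector (hypothetical helper, not mo_gym's API)."""
    if len(vector_reward) != len(weight):
        raise ValueError("reward and weight must have the same length")
    return sum(w * r for w, r in zip(weight, vector_reward))


# Made-up minecart-style reward: [ore1, ore2, fuel]
vector_reward = [1.0, 0.0, -0.5]
weight = [0.8, 0.2, 0.2]
scalar_reward = scalarize(vector_reward, weight)
```

The helper accepts any pair of equal-length sequences, so it would also work on a numpy reward array.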

Environments

Env	Obs/Action spaces	Objectives	Description
`deep-sea-treasure-v0`	Discrete / Discrete	`[treasure, time_penalty]`	Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values are taken from Yang et al. 2019.
`resource-gathering-v0`	Discrete / Discrete	`[enemy, gold, gem]`	Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From Barrett & Narayanan 2008.
`four-room-v0`	Discrete / Discrete	`[item1, item2, item3]`	Agent must collect three different types of items in the map and reach the goal.
`mo-mountaincar-v0`	Continuous / Discrete	`[time_penalty, reverse_penalty, forward_penalty]`	Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011.
`mo-reacher-v0`	Continuous / Discrete	`[target_1, target_2, target_3, target_4]`	Reacher robot from PyBullet, but there are 4 different target positions.
`minecart-v0`	Continuous or Image / Discrete	`[ore1, ore2, fuel]`	Agent must collect two types of ores and minimize fuel consumption. From Abels et al. 2019.
`mo-supermario-v0`	Image / Discrete	`[x_pos, time, death, coin, enemy]`	Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly to those in Yang et al. 2019.

Citing

If you use this repository in your work, please cite:

@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}

Acknowledgments

The minecart-v0 env is a refactor of https://github.com/axelabels/DynMORL.
The deep-sea-treasure-v0 and mo-supermario-v0 envs are based on https://github.com/RunzheYang/MORL.
The four-room-v0 env is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.

Comments

Adds the breakable bottles environment

Adds the breakable bottles environment which is used in Vamplew et al. 2021 as a toy model for irreversible change in stochastic environments.

I wasn't really planning to create a pull request, so the commit history is a bit messy...

opened by rk1a 4

A few bug fixes

DST:

The bounds of the rewards were hardcoded for the convex map.

The way to fix the seed is deprecated. From what I saw in the official gym envs, the seed is now fixed just using the reset method. (e.g. https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L198)

setup.py:

Gym 0.25.0 introduces breaking changes. So I fixed the version to 0.24.1.
opened by ffelten 2
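
The seeding change mentioned in the DST fix above refers to passing the seed as a keyword argument to `reset` instead of calling a separate `seed()` method. A minimal sketch of that pattern follows. `ToyEnv` is a hypothetical stand-in that uses Python's `random` module rather than Gym's own RNG utilities, and it returns only the observation, since the exact return value of `reset` differs across Gym versions.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an environment; not a real gym.Env."""

    def __init__(self):
        self._rng = random.Random()

    def reset(self, seed=None):
        # Re-seed only when a seed is passed, mirroring the reset(seed=...) convention.
        if seed is not None:
            self._rng = random.Random(seed)
        # The initial observation is drawn from the (possibly re-seeded) RNG.
        return self._rng.randint(0, 9)


env = ToyEnv()
first_obs = env.reset(seed=42)
second_obs = env.reset(seed=42)
```

Resetting twice with the same seed yields the same initial observation, which is what makes runs reproducible.
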
Consider using info field for reward vector

Hello,

Thanks for this repository, it will be very useful to the MORL community :-).

I was just wondering if you think it would be a good idea to enforce gym compatibility by specifying the reward as a scalar and giving the vectorial reward elsewhere. The idea would be to use a field in the info dictionary, as they do in PGMORL. This would allow using existing RL algorithms and logging libraries out of the box (e.g. stable-baselines, tensorboard logs, ...).

For example: In a DST env, if you return the treasure reward only in the reward field, you can use the DQN implementation from baselines and have insights on the average reward, as well as the episode length in the tensorboard logs. Of course, you can extract the full vectorial reward from the info dictionary in order to learn with MORL :-).

With kind regards,

Florian

opened by ffelten 2
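
The proposal above can be sketched as a wrapper that returns one objective as the scalar reward and keeps the full vector in `info`. Everything here is hypothetical and written only for illustration: the `VectorRewardToInfo` class, the `"vector_reward"` key, and the `ToyDST` environment with its made-up rewards. It uses the same four-value `step` signature as the Usage section.

```python
class VectorRewardToInfo:
    """Hypothetical wrapper: scalar reward out, full vector kept in info."""

    def __init__(self, env, index=0, info_key="vector_reward"):
        self.env = env
        self.index = index        # which objective becomes the scalar reward
        self.info_key = info_key  # assumed key name, chosen for this example

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, vector_reward, done, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not modified
        info[self.info_key] = vector_reward
        return obs, vector_reward[self.index], done, info


class ToyDST:
    """Made-up two-objective env: [treasure, time_penalty]."""

    def reset(self):
        return 0

    def step(self, action):
        return 1, [5.0, -1.0], True, {}


env = VectorRewardToInfo(ToyDST(), index=0)
env.reset()
obs, reward, done, info = env.step(0)
```

A single-objective algorithm would see only `reward`, while a MORL algorithm could read `info["vector_reward"]`.
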
Add MO reward wrappers

I added two commonly used wrappers: normalize and clip.

The idea is to provide the index of the reward component you want to normalize or clip, and leave the other components as they are. Of course, wrappers can be wrapped inside others to normalize all rewards (see tests).

opened by ffelten 1
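
The per-index idea described above can be sketched for the clip case as follows. `ClipComponent` and `ToyEnv` are hypothetical names invented for this example and are not the wrappers added in the pull request. Normalization is omitted because it depends on running statistics that are not described here.

```python
class ClipComponent:
    """Hypothetical wrapper: clip one reward component, leave the others as they are."""

    def __init__(self, env, idx, low, high):
        self.env = env
        self.idx = idx
        self.low = low
        self.high = high

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, vector_reward, done, info = self.env.step(action)
        clipped = list(vector_reward)  # copy so the original reward is not mutated
        clipped[self.idx] = max(self.low, min(self.high, clipped[self.idx]))
        return obs, clipped, done, info


class ToyEnv:
    """Made-up env that always returns the same three-component reward."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, [12.0, -7.0, 0.3], False, {}


# Nest two wrappers to clip components 0 and 1; component 2 is untouched.
env = ClipComponent(ClipComponent(ToyEnv(), idx=0, low=-1.0, high=1.0), idx=1, low=-1.0, high=1.0)
_, reward, _, _ = env.step(0)
```

Nesting one wrapper per index is how several components get processed, as the pull request notes.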

Fix notebook

There are still issues with the video recorder :(

/usr/local/lib/python3.9/site-packages/gym/wrappers/monitoring/video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <TimeLimit<OrderEnforcing<MOMountainCar<mo-mountaincar-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
  logger.warn(

opened by ffelten 0

Add fishwood env

Code was provided by Denis Steckelmacher, I did a bit of refactoring and migrated it to 0.26.

I didn't bother implementing the render with the images, but I did upload them in case somebody gets motivated; the env is super simple.

opened by ffelten 0

Add wrapper to help logging episode returns

The implementation is mostly a copy-paste of the original gym wrapper. I had to copy-paste instead of overriding and calling super because the return is a numpy array, which is mutable, and the original implementation resets it to 0. Hence, if we kept the original, the reported return would always be a vector of zeros (because it has been reset).

opened by ffelten 0
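
The aliasing problem described above can be reproduced in a few lines. This is a simplified illustration with invented variable names and made-up rewards, not the wrapper's actual code: storing a reference to the accumulator and then resetting it in place wipes the logged value, while storing a copy preserves it.

```python
import numpy as np

episode_return = np.zeros(2)
episode_return += np.array([1.0, -3.0])  # accumulate one step of vector reward

# Buggy pattern: store a reference to the accumulator.
info_buggy = {"r": episode_return}
# Fixed pattern: store a copy taken before the reset.
info_fixed = {"r": episode_return.copy()}

episode_return[:] = 0.0  # in-place reset also zeroes info_buggy["r"]
```

With a scalar return the reference pattern is harmless, because numbers are immutable. The issue only appears once the return is an array.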

Releases(0.2.1)

0.2.1(Dec 9, 2022)
5 new environments: fishwood-v0 (ESR), mo-MountainCarContinuous-v0, water-reservoir-v0, mo-highway-v0 and mo-highway-fast-v0;

Revamped README file;

Linting and automatic imports optimization;

Updated bib file and citation;

A few bug fixes.

0.2.0(Sep 25, 2022)

Support for new Gym>=0.26 API
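
In the Gym>=0.26 API, `reset` returns `(observation, info)` and `step` returns five values, with `terminated` and `truncated` replacing the single `done` flag. The sketch below shows an interaction loop with a vector reward under that convention. `ToyMOEnv` is a hypothetical class with made-up rewards that only mimics the call signatures, and it is not one of the MO-Gym environments.

```python
class ToyMOEnv:
    """Hypothetical env following the Gym>=0.26 call signatures, with a vector reward."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        vector_reward = [1.0, -0.1]         # made-up [objective_1, objective_2]
        terminated = False                  # this toy has no terminal state
        truncated = self.t >= self.horizon  # time limit reached
        return self.t, vector_reward, terminated, truncated, {}


env = ToyMOEnv()
obs, info = env.reset(seed=0)
total = [0.0, 0.0]
done = False
while not done:
    obs, vector_reward, terminated, truncated, info = env.step(0)
    # Accumulate the return per objective.
    total = [a + b for a, b in zip(total, vector_reward)]
    done = terminated or truncated
```
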
0.1.2(Sep 25, 2022)

0.1.1(Aug 24, 2022)


Owner

Lucas Alegre

PhD student at Institute of Informatics - UFRGS. Interested in reinforcement learning, machine learning and artificial (neuro-inspired) intelligence.

