Multi-objective gym environments for reinforcement learning.

Overview

tests Project Status: Active – The project has reached a stable, usable state and is being actively developed. License

MO-Gym: Multi-Objective Reinforcement Learning Environments

Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard gym's API, but return vectorized rewards as numpy arrays.

For details on multi-objective MPDS (MOMDP's) and other MORL definitions, see A practical guide to multi-objective reinforcement learning and planning.

Install

git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .

Usage

import gym
import mo_gym

env = gym.make('minecart-v0') # It follows the original gym's API ...

obs = env.reset()
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))

Environments

Env Obs/Action spaces Objectives Description
deep-sea-treasure-v0
Discrete / Discrete [treasure, time_penalty] Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from Yang et al. 2019.
resource-gathering-v0
Discrete / Discrete [enemy, gold, gem] Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From Barret & Narayanan 2008.
four-room-v0
Discrete / Discrete [item1, item2, item3] Agent must collect three different types of items in the map and reach the goal.
mo-mountaincar-v0
Continuous / Discrete [time_penalty, reverse_penalty, forward_penalty] Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011.
mo-reacher-v0
Continuous / Discrete [target_1, target_2, target_3, target_4] Reacher robot from PyBullet, but there are 4 different target positions.
minecart-v0
Continuous or Image / Discrete [ore1, ore2, fuel] Agent must collect two types of ores and minimize fuel consumption. From Abels et al. 2019.
mo-supermario-v0
Image / Discrete [x_pos, time, death, coin, enemy] Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly as in Yang et al. 2019.

Citing

If you use this repository in your work, please cite:

@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}

Acknowledgments

Comments
  • Adds the breakable bottles environment

    Adds the breakable bottles environment

    Adds the breakable bottles environment which is used in Vamplew et al. 2021 as a toy model for irreversible change in stochastic environments.

    I wasn't really planning for creating a pull request, so the commit history is a bit messy...

    opened by rk1a 4
  • A few bug fixes

    A few bug fixes

    DST:

    • The bounds of the rewards were hardcoded for the convex map.
    • The way to fix the seed is deprecated. From what I saw in the official gym envs, the seed is now fixed just using the reset method. (e.g. https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L198)

    setup.py:

    • Gym 0.25.0 introduces breaking changes. So I fixed the version to 0.24.1.
    opened by ffelten 2
  • Consider using info field for reward vector

    Consider using info field for reward vector

    Hello,

    Thanks for this repository, it will be very useful to the MORL community :-).

    I was just wondering if you think it would be a good idea to enforce gym compatibility by specifying rewards as scalar and giving the vectorial rewards elsewhere. The idea would be to use a field in the info dictionary as they do in PGMORL. This would allow to use existing RL algorithms and logging libraries out of box (e.g. stable-baselines, tensorboard logs, ...).

    For example: In a DST env, if you return the treasure reward only in the reward field, you can use the DQN implementation from baselines and have insights on the average reward, as well as the episode length in the tensorboard logs. Of course, you can extract the full vectorial reward from the info dictionary in order to learn with MORL :-).

    With kind regards,

    Florian

    opened by ffelten 2
  • Add MO reward wrappers

    Add MO reward wrappers

    I added two wrappers commonly used: normalize and clip.

    The idea is to provide the index of the reward component you want to normalize or clip, and leave the other components as they are. Of course, wrappers can be wrapped inside others to normalize all rewards (see tests).

    opened by ffelten 1
  • Fix notebook

    Fix notebook

    There are still issues with the video recorder :(

    /usr/local/lib/python3.9/site-packages/gym/wrappers/monitoring/video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <TimeLimit<OrderEnforcing<MOMountainCar<mo-mountaincar-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
      logger.warn(
    
    opened by ffelten 0
  • Add fishwood env

    Add fishwood env

    Code was provided by Denis Steckelmacher, I did a bit of refactoring and migrated it to 0.26.

    I didn't bother making the render with the images, but I did upload them in case somebody gets motivated, the env is super simple.

    opened by ffelten 0
  • Add wrapper to help logging episode returns

    Add wrapper to help logging episode returns

    The implementation is mostly a copy paste of the original gym. I had to copy paste instead of override and call to super because the way the return is a numpy array, which is mutable, and the original implementation resets it to 0. Hence, if we kept the original, the return will always be a vector of zeros (because resetted)

    opened by ffelten 0
Releases(0.2.1)
Owner
Lucas Alegre
PhD student at Institute of Informatics - UFRGS. Interested in reinforcement learning, machine learning and artificial (neuro-inspired) intelligence.
Lucas Alegre
Delta Conformity Sociopatterns Analysis - Delta Conformity Sociopatterns Analysis

Delta_Conformity_Sociopatterns_Analysis ∆-Conformity is a local homophily measur

2 Jan 09, 2022
Effective Use of Transformer Networks for Entity Tracking

Effective Use of Transformer Networks for Entity Tracking (EMNLP19) This is a PyTorch implementation of our EMNLP paper on the effectiveness of pre-tr

5 Nov 06, 2021
Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

DFSA Unofficial pytorch implementation of the ICCV 2021 paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution" (p

2 Nov 15, 2021
Session-based Recommendation, CoHHN, price preferences, interest preferences, Heterogeneous Hypergraph, Co-guided Learning, SIGIR2022

This is our implementation for the paper: Price DOES Matter! Modeling Price and Interest Preferences in Session-based Recommendation Xiaokun Zhang, Bo

Xiaokun Zhang 27 Dec 02, 2022
Algorithmic trading with deep learning experiments

Deep-Trading Algorithmic trading with deep learning experiments. Now released part one - simple time series forecasting. I plan to implement more soph

Alex Honchar 1.4k Jan 02, 2023
Neural-fractal - Create Fractals Using Complex-Valued Neural Networks!

Neural Fractal Create Fractals Using Complex-Valued Neural Networks! Home Page Features Define Dynamical Systems Using Complex-Valued Neural Networks

Amirabbas Asadi 10 Dec 17, 2022
PyTorch implementation of SwAV (Swapping Assignments between Views)

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments This code provides a PyTorch implementation and pretrained models for SwAV

Meta Research 1.7k Jan 04, 2023
Image classification for projects and researches

This is a tool to help you quickly solve classification problems including: data analysis, training, report results and model explanation.

Nguyễn Trường Lâu 2 Dec 27, 2021
Pretrained Pytorch face detection (MTCNN) and recognition (InceptionResnet) models

Face Recognition Using Pytorch Python 3.7 3.6 3.5 Status This is a repository for Inception Resnet (V1) models in pytorch, pretrained on VGGFace2 and

Tim Esler 3.3k Jan 04, 2023
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

CoCa - Pytorch Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch. They were able to elegantly fit in contras

Phil Wang 565 Dec 30, 2022
Deep Learning Theory

Deep Learning Theory 整理了一些深度学习的理论相关内容,持续更新。 Overview Recent advances in deep learning theory 总结了目前深度学习理论研究的六个方向的一些结果,概述型,没做深入探讨(2021)。 1.1 complexity

fq 103 Jan 04, 2023
This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Black-Box-Defense This repository contains the code and models necessary to replicate the results of our recent paper: How to Robustify Black-Box ML M

OPTML Group 2 Oct 05, 2022
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX

Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX Foolbox is a Python li

Bethge Lab 2.4k Dec 25, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

GPU implementation of kNN and SNN GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors Supported by numba cuda and faiss library E

Hyeon Jeon 7 Nov 23, 2022
Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

The Stem Cell Hypothesis Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP

Emory NLP 5 Jul 08, 2022
PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018)

1-bit Wide ResNet PyTorch implementation of training 1-bit Wide ResNets from this paper: Training wide residual networks for deployment using a single

Sergey Zagoruyko 122 Dec 07, 2022
AttGAN: Facial Attribute Editing by Only Changing What You Want (IEEE TIP 2019)

News 11 Jan 2020: We clean up the code to make it more readable! The old version is here: v1. AttGAN TIP Nov. 2019, arXiv Nov. 2017 TensorFlow impleme

Zhenliang He 568 Dec 14, 2022
Python implementation of O-OFDMNet, a deep learning-based optical OFDM system,

O-OFDMNet This includes Python implementation of O-OFDMNet, a deep learning-based optical OFDM system, which uses neural networks for signal processin

Thien Luong 4 Sep 09, 2022
Seg-Torch for Image Segmentation with Torch

Seg-Torch for Image Segmentation with Torch This work was sparked by my personal research on simple segmentation methods based on deep learning. It is

Eren Gölge 37 Dec 12, 2022