A parallel framework for population-based multi-agent reinforcement learning.

Last update: Jan 08, 2023

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning

MALib is a parallel framework of population-based learning nested with (multi-agent) reinforcement learning (RL) methods, such as Policy Space Response Oracles, Self-Play and Neural Fictitious Self-Play. MALib provides higher-level abstractions of MARL training paradigms, which enable efficient code reuse and flexible deployment on different distributed computing paradigms. The design of MALib also strives to promote research on other kinds of multi-agent learning, including multi-agent imitation learning and model-based MARL.
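To give a feel for the kind of population-style learning mentioned above, here is a minimal sketch of classic tabular fictitious self-play on rock-paper-scissors. It is not MALib code and not Neural Fictitious Self-Play; the payoff matrix, the function name `fictitious_self_play`, and the iteration count are assumptions chosen for illustration only.

```python
from typing import List

# Row player's payoff in rock-paper-scissors.
# Rows and columns are ordered (rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def fictitious_self_play(num_iters: int) -> List[float]:
    """Repeatedly best-respond to the empirical mix of one's own past actions."""
    counts = [1, 0, 0]  # arbitrary starting history: rock was played once
    for _ in range(num_iters):
        # Unnormalised expected payoff of each action against the past action counts.
        values = [
            sum(PAYOFF[a][b] * counts[b] for b in range(3)) for a in range(3)
        ]
        # Best response; ties are broken in favour of the lowest action index.
        best = max(range(3), key=values.__getitem__)
        counts[best] += 1
    total = sum(counts)
    return [c / total for c in counts]


strategy = fictitious_self_play(10_000)
print([round(p, 3) for p in strategy])
```

The empirical action frequencies drift toward the uniform mix, which is the equilibrium of this game. Frameworks such as MALib replace the tabular best response with an RL learner and distribute the work, but the outer loop has a similar shape.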

Installation

Installing MALib is straightforward. We have tested MALib on Python 3.6 and 3.7. This guide assumes Ubuntu 18.04 or later. We strongly recommend using conda to manage your dependencies and avoid version conflicts. The example below builds a conda environment based on Python 3.7.

conda create -n malib python==3.7 -y
conda activate malib

# install dependencies
./install_deps.sh

# install malib
pip install -e .

MALib integrates external environments such as StarCraft II and ViZDoom; you can install them via pip install -e .[envs]. Users who want to contribute to the repository should run pip install -e .[dev] to install the development dependencies.

Optional: to use alpha-rank to solve the meta-game, install OpenSpiel by following its installation guide.
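After installing, a quick sanity check of the environment can save some debugging time. The sketch below is not part of MALib; the helper names are hypothetical, and it assumes that the package imports as `malib` (as in the Quick Start) and that OpenSpiel's Python module is named `pyspiel`.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Python versions the installation notes above report as tested.
TESTED_PYTHON_VERSIONS = {(3, 6), (3, 7)}


def is_tested_python(version: Tuple[int, int] = tuple(sys.version_info[:2])) -> bool:
    """Return True if (major, minor) is one of the tested Python versions."""
    return tuple(version) in TESTED_PYTHON_VERSIONS


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if not is_tested_python():
    print("Warning: this Python version is not among the tested ones (3.6, 3.7).")

# "malib" is required; "pyspiel" is only needed for the optional alpha-rank solver.
print("Missing modules:", missing_modules(["malib", "pyspiel"]))
```

An empty list in the last line means both packages are importable from the active conda environment.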

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


# Instantiate the environment once to read its agent list and observation/action spaces.
env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)
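The `agent_mapping_func` argument above is an identity function, so each environment agent maps to itself. The sketch below illustrates, under the assumption that the returned value acts as a key grouping environment agents into learners, how an identity mapping differs from a shared one. The helper `group_agents` and the agent ids are hypothetical and are not MALib internals.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def group_agents(
    possible_agents: List[str],
    agent_mapping_func: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Group environment agent ids by the key the mapping function returns."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for agent_id in possible_agents:
        groups[agent_mapping_func(agent_id)].append(agent_id)
    return dict(groups)


# Hypothetical agent ids for a two-player game.
agents = ["player_0", "player_1"]

# Identity mapping, as in the Quick Start: one group per environment agent.
independent = group_agents(agents, lambda agent_id: agent_id)

# Constant mapping: every environment agent falls into the same group.
shared = group_agents(agents, lambda agent_id: "shared")

print(independent)
print(shared)
```

With the identity mapping each player is handled separately, which matches the "independent" interface type used in the Quick Start; a constant mapping would instead pool all agents under one key.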

Citing MALib

If you use MALib in your work, please cite the accompanying paper.

@misc{zhou2021malib,
      title={MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning}, 
      author={Ming Zhou and Ziyu Wan and Hanjing Wang and Muning Wen and Runzhe Wu and Ying Wen and Yaodong Yang and Weinan Zhang and Jun Wang},
      year={2021},
      eprint={2106.07551},
      archivePrefix={arXiv},
      primaryClass={cs.MA}
}

Owner

MARL @ SJTU
