Deep reinforcement learning library built on top of Neural Network Libraries

Last update: Dec 14, 2022

Overview

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL requires Python 3.6 or later and NNabla 1.17 or later. Note that, according to the release notes below, v0.12.0 and later require Python 3.7 or greater.

Enabling GPU acceleration (Optional)

NNablaRL algorithms run on the CPU by default. To run an algorithm on a GPU, first install nnabla-ext-cuda as follows, replacing [cuda-version] with the CUDA version installed on your machine.

$ pip install nnabla-ext-cuda[cuda-version]

# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110
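
The naming pattern in the example above (CUDA 11.0 maps to nnabla-ext-cuda110) can be sketched as a small helper. This is only an illustration: cuda_package_name is a hypothetical function, not part of NNablaRL, and it assumes the package suffix is the CUDA major and minor version with the dot removed. Not every CUDA version necessarily has a matching package, so check the package index before installing.

```python
def cuda_package_name(cuda_version: str) -> str:
    """Build the nnabla-ext-cuda package name for a CUDA version string.

    Hypothetical helper: assumes the suffix is "<major><minor>",
    as in the CUDA 11.0 -> nnabla-ext-cuda110 example.
    """
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '11.0', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"nnabla-ext-cuda{major}{minor}"


print(cuda_package_name("11.0"))  # nnabla-ext-cuda110
```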

After installing nnabla-ext-cuda, select the GPU to run on by setting gpu_id in the algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...

Features

Friendly API

NNablaRL has friendly Python APIs that let you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

For more details about NNablaRL, see the documentation and examples.

Many built-in algorithms

Most well-known and state-of-the-art (SOTA) deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are implemented in NNablaRL. The implemented algorithms are carefully tested and evaluated, so you can easily start training your agent using these verified implementations.

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get completely the same results when running the reproduction code on your computer. The result may slightly change depending on your machine, nnabla/nnabla-rl's package version, etc.

Seamless switching of online and offline training

In reinforcement learning, there are two main procedures for training an agent: online and offline. Online training alternates between data collection and network updates. Offline training, by contrast, updates the network using only existing data. With NNablaRL, you can switch between these two procedures seamlessly. For example, as shown below, you can train a robot's controller online in a simulated environment and then fine-tune it offline with a dataset from the real robot.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)
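
To make the distinction concrete, here is a toy sketch of the two procedures in plain Python. It does not use NNablaRL: MeanRewardAgent, train_online and train_offline are made-up names, and the "environment" is just a function that returns a noisy reward. The only point is the control flow: online training alternates data collection with updates, while offline training replays existing data and collects nothing new.

```python
import random


class MeanRewardAgent:
    """Toy agent that tracks a running mean of the rewards it has seen."""

    def __init__(self):
        self.estimate = 0.0
        self.num_updates = 0

    def update(self, reward: float) -> None:
        self.num_updates += 1
        self.estimate += (reward - self.estimate) / self.num_updates


def train_online(agent, sample_reward, total_iterations):
    """Alternate data collection and update; return the collected data."""
    collected = []
    for _ in range(total_iterations):
        reward = sample_reward()  # data collection
        collected.append(reward)
        agent.update(reward)      # update (stands in for a network update)
    return collected


def train_offline(agent, dataset, total_iterations):
    """Update using only existing data; nothing new is collected."""
    for i in range(total_iterations):
        agent.update(dataset[i % len(dataset)])


rng = random.Random(0)
agent = MeanRewardAgent()
# "Simulator" phase: rewards are drawn around 1.0.
simulated = train_online(agent, lambda: rng.gauss(1.0, 0.1), total_iterations=1000)
# "Real data" phase: a fixed dataset with rewards around 2.0.
real_data = [2.0, 2.1, 1.9, 2.0]
train_offline(agent, real_data, total_iterations=1000)
print(round(agent.estimate, 2))
```

After both phases the estimate sits between the simulated and the real reward levels, because the agent has seen an equal number of samples from each.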

Getting started

Try the interactive demos below to get started.
You can run them directly on Colab from the links in the table below.

Title	Notebook	Target RL task
Simple reinforcement learning training to get started		Pendulum
Learn how to use training algorithms		Pendulum
Learn how to use customized network model for training		Mountain car
Learn how to use different network solver for training		Pendulum
Learn how to use different replay buffer for training		Pendulum
Learn how to use your own environment for training		Customized environment
Atari game training example		Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License, Version 2.0.

Comments

Update cem function interface

Updated the interface of the cross entropy method (CEM) functions. The argument pop_size has been renamed to sample_size. In addition, the objective function given to the CEM function is now called with a variable x of shape (batch_size, sample_size, x_dim), which differs from the previous interface. For details, please see the function docs.
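
As an illustration of an objective that receives x with shape (batch_size, sample_size, x_dim), here is a generic cross entropy method loop written with numpy. It is a sketch, not the nnabla-rl function: cem_maximize, num_elites, num_iterations and the quadratic objective are assumptions made for this example, and only sample_size and the shape of x come from the description above.

```python
import numpy as np


def cem_maximize(objective, init_mean, init_var,
                 sample_size=64, num_elites=8, num_iterations=20, seed=0):
    """Generic CEM: refit a diagonal Gaussian to the best samples each round."""
    rng = np.random.default_rng(seed)
    mean = np.array(init_mean, dtype=float)  # (batch_size, x_dim)
    var = np.array(init_var, dtype=float)    # (batch_size, x_dim)
    batch_size, x_dim = mean.shape
    for _ in range(num_iterations):
        noise = rng.standard_normal((batch_size, sample_size, x_dim))
        x = mean[:, None, :] + np.sqrt(var)[:, None, :] * noise
        values = objective(x)                # (batch_size, sample_size)
        # Indices of the num_elites highest-valued samples per batch entry.
        elite_idx = np.argsort(values, axis=1)[:, -num_elites:]
        elites = np.take_along_axis(x, elite_idx[:, :, None], axis=1)
        mean = elites.mean(axis=1)
        var = elites.var(axis=1) + 1e-6      # small floor keeps sampling alive
    return mean


target = np.array([1.0, -2.0])


def objective(x):
    # x has shape (batch_size, sample_size, x_dim); return one value per sample.
    return -np.sum((x - target) ** 2, axis=-1)


solution = cem_maximize(objective, init_mean=np.zeros((2, 2)), init_var=4.0 * np.ones((2, 2)))
print(solution)
```

Each row of solution should end up close to target, since the objective is a negated squared distance to it.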

opened by sbsekiguchi 1
Add implementation for RNN support and DRQN algorithm
Add RNN model support and DRQN algorithm.

The following trainers support RNN models:

Q value-based trainers

Deterministic gradient and Soft policy trainers

Other trainers may support RNN models in the future, but this is not implemented in the initial release.

See this paper for the details of the DRQN algorithm.
opened by ishihara-y 1

Implement SACD

This PR implements SAC-D algorithm. https://arxiv.org/abs/2206.13901

The following changes have been made:

- New environments with factored reward functions have been added:
  - FactoredLunarLanderContinuousV2NNablaRL-v1
  - FactoredAntV4NNablaRL-v1
  - FactoredHopperV4NNablaRL-v1
  - FactoredHalfCheetahV4NNablaRL-v1
  - FactoredWalker2dV4NNablaRL-v1
  - FactoredHumanoidV4NNablaRL-v1
- The SACD algorithm has been added.
- SoftQDTrainer has been added.
- _InfluenceMetricsEvaluator has been added.
- A reproduction script has been added (not benchmarked yet).

visualizing influence metrics

import gym

import numpy as np
import matplotlib.pyplot as plt

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")

evaluation_hook = H.EvaluationHook(
    eval_env,
    EpisodicEvaluator(run_per_evaluation=10),
    timing=5000,
    writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
)
iteration_num_hook = H.IterationNumHook(timing=100)

config = A.SACDConfig(gpu_id=0, reward_dimension=9)
sacd = A.SACD(env, config=config)
sacd.set_hooks([iteration_num_hook, evaluation_hook])
sacd.train_online(env, total_iterations=100000)

influence_history = []

state = env.reset()
while True:
    action = sacd.compute_eval_action(state)
    influence = sacd.compute_influence_metrics(state, action)
    influence_history.append(influence)
    state, _, done, _ = env.step(action)
    if done:
        break

influence_history = np.array(influence_history)
# One label per reward component (reward_dimension=9 above)
labels = ["position", "velocity", "angle", "left_leg", "right_leg",
          "main_engine", "side_engine", "failure", "success"]
for i, label in enumerate(labels):
    plt.plot(influence_history[:, i], label=label)
plt.xlabel("step")
plt.ylabel("influence metrics")
plt.legend()
plt.show()

sample animation

(animated figure from the pull request, not reproduced here)

opened by ishihara-y 0

Add gmm and Update gaussian

Added numpy-model implementations of GMM and Gaussian. In addition, updated the Gaussian distribution's API.

The API has changed as follows.

Previous:

import numpy as np
import nnabla as nn
import nnabla_rl.distributions as D

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
distribution = D.Gaussian(mean, ln_var)
# return nn.Variable
assert isinstance(distribution.sample(), nn.Variable)

Updated:

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
# Pass nn.Variable inputs if you want every method of the class to return nn.Variable.
distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
assert isinstance(distribution.sample(), nn.Variable)

# If you pass np.ndarray, then all class methods return np.ndarray
# Currently, only inputs without a batch dimension are supported (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
assert isinstance(distribution.sample(), np.ndarray)
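
The ln_var value in the snippets above is the natural logarithm of the variance, which is why it is computed as np.log(sigma) * 2.: ln(sigma^2) = 2 ln(sigma). A minimal standalone check of that identity, using only the standard library (the helper names are made up for illustration and are not part of nnabla-rl):

```python
import math


def sigma_to_ln_var(sigma: float) -> float:
    """Convert a standard deviation to log-variance: ln(sigma^2) = 2 ln(sigma)."""
    return 2.0 * math.log(sigma)


def ln_var_to_sigma(ln_var: float) -> float:
    """Invert the conversion: sigma = exp(ln_var / 2)."""
    return math.exp(0.5 * ln_var)


sigma = 5.0
ln_var = sigma_to_ln_var(sigma)
print(ln_var)                   # about 3.2189, i.e. ln(25)
print(ln_var_to_sigma(ln_var))  # recovers 5.0 up to rounding
```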

opened by sbsekiguchi 0

Support nnabla-browser

[x] add MonitorWriter
[x] save computational graph as nntxt

example

import gym

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

# save training computational graph
training_graph_hook = H.TrainingGraphHook(outdir="test")

# evaluation hook with nnabla's Monitor
eval_env = gym.make("Pendulum-v0")
evaluator = EpisodicEvaluator(run_per_evaluation=10)
evaluation_hook = H.EvaluationHook(
    eval_env,
    evaluator,
    timing=10,
    writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
)

env = gym.make("Pendulum-v0")
sac = A.SAC(env)
sac.set_hooks([training_graph_hook, evaluation_hook])

sac.train_online(env, total_iterations=100)

opened by ishihara-y 0

Add iLQR and LQR

Implementation of Linear Quadratic Regulator (LQR) and iterative LQR algorithms.
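
For readers unfamiliar with LQR, the core of the method is a backward Riccati recursion. The sketch below shows it for a scalar system in plain Python. It is a textbook illustration under assumed dynamics and costs, not the nnabla-rl implementation; scalar_lqr_gains and all of its parameters are names chosen for this example.

```python
def scalar_lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for x[t+1] = a*x[t] + b*u[t].

    Minimizes the sum of q*x[t]**2 + r*u[t]**2 over the horizon with
    terminal cost q*x[T]**2. Returns the feedback gains k[t]
    (u[t] = -k[t]*x[t]) and the cost-to-go coefficient p at t = 0.
    """
    p = q  # terminal cost coefficient
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
        gains.append(k)
    gains.reverse()  # the recursion runs backward in time
    return gains, p


a, b = 1.0, 1.0
gains, p0 = scalar_lqr_gains(a=a, b=b, q=1.0, r=1.0, horizon=50)

# Roll the closed loop forward from x = 1.0 using the time-varying gains.
x = 1.0
for k in gains:
    u = -k * x
    x = a * x + b * u
print(p0, gains[0], x)
```

With a = b = q = r = 1 and a long horizon, p0 approaches the positive root of p^2 - p - 1 = 0 (the golden ratio), which is a handy sanity check for the recursion.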

Co-authored-by: Yu Ishihara
Co-authored-by: Shunichi Sekiguchi

opened by ishihara-y 0
Check np_random instance and use correct randint alternative
I am not sure when this change was made, but in some environments env.unwrapped.np_random (where env is a gym environment) returns a Generator instead of a RandomState.

# In case of RandomState, this line works
env.unwrapped.np_random.randint(...)
# In case of Generator, randint does not exist and we must use integers as an alternative
env.unwrapped.np_random.integers(...)

This PR fixes the issue by checking the type of np_random and choosing the correct function for sampling integers.
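
A minimal sketch of this kind of type check, using numpy only. It is not the code from the PR: sample_integers is a hypothetical helper that relies on the fact that numpy's legacy RandomState offers randint while the newer Generator offers integers.

```python
import numpy as np


def sample_integers(np_random, low, high, size=None):
    """Sample integers in [low, high) from either numpy random API."""
    if isinstance(np_random, np.random.Generator):
        # Newer API: Generator has integers() and no randint().
        return np_random.integers(low, high, size=size)
    # Legacy API: RandomState has randint().
    return np_random.randint(low, high, size=size)


legacy = np.random.RandomState(0)
modern = np.random.default_rng(0)
legacy_samples = sample_integers(legacy, 0, 10, size=3)
modern_samples = sample_integers(modern, 0, 10, size=3)
print(legacy_samples, modern_samples)
```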
opened by ishihara-y 0
Add icra2018 qtopt

Add QtOpt algorithm proposed by Deirdre Quillen et al. in the paper Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods.

opened by sbsekiguchi 0

Releases(v0.12.0)

v0.12.0(Oct 7, 2022)
special notes

This version does NOT support OpenAI Gym v0.26.0 or greater.

We are going to support OpenAI Gym v0.26.0 and greater in the next release of nnablaRL. From that release on, nnablaRL will no longer officially support OpenAI Gym versions below v0.26.0.
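
If you want to guard a script against an unsupported gym version, a check along these lines may help. This is a sketch with hypothetical helper names; it only encodes the upper bound stated above (below v0.26.0) and handles plain numeric version strings, so prefer a proper version parser for anything with suffixes such as release candidates.

```python
def parse_version(version: str) -> tuple:
    """Parse a 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_supported_gym_version(version: str) -> bool:
    """Return True if the gym version is below v0.26.0."""
    return parse_version(version) < (0, 26, 0)


for candidate in ["0.21.0", "0.25.2", "0.26.0"]:
    print(candidate, is_supported_gym_version(candidate))
```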

Only Python 3.7 or greater is supported

Python 3.6 is no longer supported as of this release

release-note-bugfix

Fix algos. Properly apply grad clip and weight decay

Correct variable to use during rnn training

Check np_random instance and use correct randint alternative

Fix pendulum-env render

Fix ScreenRenderEnv to support gym 0.25.0

release-note-algorithm

Run PPO on single process when actor num is 1

Add qrsac algorithm

Add REDQ algorithm

Update to support discrete tuple

Add icra2018 qtopt

Add goal_env module

Add PPO tuple state support

Add iLQR and LQR

Add mppi

Add ddp

release-note-distributions

Add gmm and Update gaussian

release-note-utility

Support nnabla-browser

release-note-docs

Fix module path of sac

Improve README with graph visualization feature with nnabla-browser

release-note-build

Extend github build timelimit to 5 minutes

Install the latest nnablaRL by:

pip install nnabla-rl
v0.11.0(Mar 17, 2022)
release-note-bugfix

Fix readme of reproduction

Fix cem test

Fix README samples and add prerequisites for Atari reproduction codes

Fix tutorial-model

Fix add workaround to avoid gym error

release-note-algorithm

Add ATRPO

Add implementation for RNN support and DRQN algorithm, Support RNN models on DQN and DQN inherited algorithms, Follow DRQN author's implementation and update results

Expand RNN support to dist rl algorithms

Add rnn support to actor critic algorithms

Support n-step q learning in ddpg, td3, her, sac and ICML2018SAC

Stop back propagating to target v function

Add MME-SAC algorithm and Sparse/Delayed mujoco environment and Add Disentangled version of MME-SAC

release-note-functions

Add stop gradient function

Add random shooting

Update cem function interface

release-note-distributions

Add Bernoulli distribution

Enable sampling from multidimensional logits

Add one hot softmax

release-note-utility

Support batched states for evaluation

Add convenient episode result env

Add profile function

release-note-docs

Update version in algorithm catalog

Add readthedocs yaml and Fixed yaml file

Add HER and IQN to algorithm catalog

Install the latest nnablaRL by:

pip install nnabla-rl
v0.10.0(Oct 20, 2021)
release-note-bugfix

Fix interactive-demos used in colab and Fix interactive-demos used in colab about gpu id

release-note-algorithm

Add HER

Add Rainbow

Fix algorithm reproduction directory path

Add rank-based prioritized replay

Add Double Dqn

Move algorithms reproduction dir to reproductions/algorithms

Enable injecting explorer to algorithm

Support multi-step Q learning

Add Categorical Double Dqn

Add c51 all atari game results

Support Tuple State and Update compute_v_target_and_advantage to support tuple state

release-note-parametric_functions

Add spatial_softmax function and Add spatial softmax docs

Add noisy net

release-note-functions

Add batch_flatten function

Add triangular_matrix function

release-note-utility

Fix load_snapshot

release-note-docs

Fix docs typo

Fix typo in readme

Display correct version

Fix numpy array typing to np.ndarray

Add function docs

Fix docstring of algorithms

Update NNablaRL to nnablaRL

Fix typo seemless -> seamless

Fix build badge URL

Install the latest nnablaRL by:

pip install nnabla-rl
v0.9.0(Jun 14, 2021)
We are happy to announce the release of nnablaRL, a deep reinforcement learning (RL) library built on top of nnabla. Reinforcement learning is one of the cutting-edge machine learning technologies that achieve superhuman performance in fields such as gaming and robotics. We hope that this new library, nnablaRL, helps both RL experts and non-experts use reinforcement learning algorithms easily within the nnabla ecosystem.

nnablaRL has the following features.

Friendly API

nnablaRL has friendly Python APIs that let you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env) # 2
dqn.train(env) # 3

You can also customize the algorithm's hyper parameters easily. For example, you can change the batch size of training data as follows.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4")
config = A.DQNConfig(batch_size=100)
dqn = A.DQN(env, config=config)
dqn.train(env)

In addition to algorithm hyper parameters, you can also flexibly change the training component such as neural network models and model solvers. For details, see sample codes and API documents.

Many built-in algorithms

Most well-known and state-of-the-art (SoTA) deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are already implemented in nnablaRL. The implemented algorithms are carefully tested and evaluated, so you can easily start training your agent using these verified implementations. Please check the sample code and documentation for detailed usage of each algorithm. You can find the list of implemented algorithms here.

Seamless switching of online and offline training

In reinforcement learning, there are two main procedures for training an agent: online and offline. Online training alternates between data collection and network updates. Offline training, by contrast, updates the network using only existing data. With nnablaRL, you can switch between these two procedures seamlessly. For example, as shown below, you can train a robot's controller online in a simulated environment and then fine-tune it offline with a dataset from the real robot.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
config = A.DQNConfig() # Default hyperparameters. Customize as needed
dqn = A.DQN(simulator, config=config)
dqn.train_online(simulator)

real_data = get_real_data() # This is also an example. Assuming that you have real robot data
dqn.train_offline(real_data)

Getting started

You can find both notebook style interactive demos and raw python scripts as a sample code to get started. If you are unfamiliar with reinforcement learning, we recommend trying the notebook as a starting point. You can immediately launch and start training through google colaboratory! Check the list of notebooks here.

Development of nnablaRL has just started. We will continue adding new reinforcement learning algorithms and SoTA techniques to nnablaRL. Feedback, feature requests and contributions are welcome! Check the contribution guide for details.

Owner

Sony

Sony Group Corporation
