A visualisation tool for Deep Reinforcement Learning


DRLVIS - Visualising Deep Reinforcement Learning


Created by Marios Sirtmatsis with the support of Alex Bäuerle.

DRLVis is an application for visualising deep reinforcement learning. Its goal is to give developers a deeper understanding of algorithms that are broadly used across the deep reinforcement learning landscape. DRLVis is also meant to help researchers and developers find and understand errors in their own algorithm implementations.

Installation

  1. Install the drlvis pip package by running pip install -e drlvis from the directory above the drlvis directory.
  2. After that, simply run drlvis --logdir @PATH_TO_LOGDIR.
  3. Open your browser at http://localhost:8000.

Implementation

Architecture

The application is split into a backend and a frontend, where the backend does most of the data preprocessing. The frontend provides meaningful visualisations that help with understanding what the agent is doing, how rewards, weights and actions develop over time, and how confident the agent is in selecting its actions.

Workflow for using DRLVis

  1. Train agent and log data
  2. Run drlvis
  3. Interpret meaningful visualisations in your browser

Logging

Logging for drlvis is done with logger.py. The file contains documentation on which values should be passed for logging. logger.py contains an individual function for every loggable value or set of values. The most important of these functions are:


def create_logger(logdir)

The create_logger() function has to be used to initialise the logger and to specify the target destination of the logging directory. It is important that the logdir either does not exist yet or is an empty directory.
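A minimal sketch of initialising the logger before training; the path "logs/run-0" is only an example, any not-yet-existing or empty directory works:


from drlvis import logger

# Initialise the logger once, before any other logging call.
# The directory must not exist yet or must be empty.
logger.create_logger(logdir="logs/run-0")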


def log_episode_return(episode_return, episode_count)

With log_episode_return() one is able to log the accumulated reward per episode, with the step being the corresponding current episode count.
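A minimal sketch of logging the accumulated reward at the end of every episode, assuming a Gym-style loop in which env, agent (with an assumed agent.act() method) and episode_range are defined elsewhere:


from drlvis import logger

for episode in range(episode_range):
    observation = env.reset()
    episode_return = 0.0
    done = False

    while not done:
        action = agent.act(observation)  # agent.act() is an assumed helper
        observation, reward, done, info = env.step(action)
        episode_return += reward  # accumulate the reward over the episode

    logger.log_episode_return(episode_return, episode)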


def log_action_divergence(action_probs, action_probs_old, episode_count, apply_softmax)

With log_action_divergence() one can calculate the divergence between the actions of the current episode and the actions of the previous episode. To do so, the action probabilities for each observation per timestep in an episode have to be collected. At the end of an episode, this collection of action probabilities together with the collection from the previous episode can be passed to log_action_divergence(), which then calculates the KL divergence between the action probabilities of the last episode and the current one. Example code snippet with a model that has a softmax activation in the last layer:


from drlvis import logger
import numpy as np

probs_curr = []
probs_old = []

for episode in range(episode_range):

    for timestep in range(optional_timestep_range):

        # collect the action probabilities for the current observation
        probs_curr.append(model(observation[np.newaxis, :]))

        if end_of_current_episode:  # done in OpenAI Gym
            if episode >= 1:
                logger.log_action_divergence(probs_old, probs_curr, episode)
            probs_old = probs_curr
            probs_curr = []


def log_frame(frame, episode_count, step)

Using log_frame() one can log the frame that is currently being observed, i.e. the frame that corresponds to the current timestep. The episode count is the current episode and the step is the timestep within the episode at which the frame is observed.
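A minimal sketch of logging the observed frame inside the timestep loop; obtaining the frame via env.render(mode="rgb_array") is just one assumed way to get an image-like array, and episode and timestep come from the surrounding loop:


from drlvis import logger

# frame is assumed to be an image-like array, here taken from a Gym environment
frame = env.render(mode="rgb_array")
logger.log_frame(frame, episode_count=episode, step=timestep)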

def log_action_probs(predictions, episode_count, step, apply_softmax)

One can use log_action_probs() to log the predictions of one's model for the currently observed timestep in an episode. If the model does not output probabilities, apply_softmax can be set to True to create probabilities from the predictions.
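A minimal sketch of logging the model's predictions for the current timestep; model, observation, episode and timestep are assumed to come from the surrounding training loop:


from drlvis import logger
import numpy as np

# Predictions for the current observation; if the model outputs raw logits,
# apply_softmax=True turns them into probabilities.
predictions = model(observation[np.newaxis, :])
logger.log_action_probs(predictions, episode_count=episode, step=timestep, apply_softmax=True)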


def log_experiment_random_states(random_state_samples, predicted_dists, obs_min, obs_max, episode_num, state_meanings, apply_softmax)

The log_experiment_random_states() function takes a high-dimensional array containing randomly generated states within the bounds of the environment's observation space (obs_min, obs_max). It also needs the episode in which the random-states experiment shall be performed. The function then reduces the dimensions to two with UMAP for visualisation purposes. The state meanings can be passed for simpler environments to reflect what the different states mean. A random-states experiment itself is just a method to evaluate how confident the agent is in selecting certain actions for randomly generated states. Example code snippet:

from drlvis import logger
import numpy as np

def random_states_experiment(model, episode_num):
    # env is assumed to be a Gym-style environment defined in the surrounding scope
    obs_space = env.observation_space
    obs_min = obs_space.low
    obs_max = obs_space.high

    num_samples = 10000  # can be an arbitrary number
    random_state_samples = np.random.uniform(
        low=obs_min, high=obs_max, size=(num_samples, len(obs_min)))

    predicted_dists = model(random_state_samples)

    logger.log_experiment_random_states(
        random_state_samples, predicted_dists, obs_min, obs_max, episode_num, [])

def log_action_distribution(actions, episode_count)

The log_action_distribution() function calculates the distribution of actions in the specified episode. One solely has to pass the actions that were selected in the current episode episode_count.
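A minimal sketch of collecting the selected actions during an episode and logging their distribution at the episode's end; env, agent (with an assumed agent.act() method), episode and optional_timestep_range come from the surrounding loop:


from drlvis import logger

actions_taken = []

for timestep in range(optional_timestep_range):
    action = agent.act(observation)  # agent.act() is an assumed helper
    observation, reward, done, info = env.step(action)
    actions_taken.append(action)

# At the end of the episode, log which actions were selected.
logger.log_action_distribution(actions_taken, episode_count=episode)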


def log_weights(weight_tensor, step, episode_count)

With log_weights() one can log the weights of the last layer of one's model at a given timestep in an episode. This can be done as follows (model is a Keras model here, but that is not essential):

from drlvis import logger

# Weights of the last layer of the agent's Keras model
weights = agent.model.weights[-2].numpy()
logger.log_weights(weight_tensor=weights, step=timestep, episode_count=episode)

Examples

Examples of how to use the logger functions in real DRL implementations can be found in the examples folder, which contains a simple CartPole implementation in dqn_cartpole.ipynb and a more complex DQN implementation for playing Atari Breakout in dqn/.

Bachelor Thesis

For further information on how to use DRLVis and details about the application, please refer to my bachelor thesis located at documents/bachelor_thesis_visdrl.pdf.

License

MIT
