A visualisation tool for Deep Reinforcement Learning

Related tags

Deep Learningdrlvis
Overview

DRLVIS - Visualising Deep Reinforcement Learning


Created by Marios Sirtmatsis with the support of Alex Bäuerle.

DRLVis is an application used for visualising deep reinforcement learning. The goal is to enable developers to get a further understanding of broadly used algorithms across the deep reinforcement learning landscape. Also DRLVis shall provide a tool for researchers and developers to help them understand errors in their implemented algorithms.

Installation

  1. Install the drlvis pip package by using the following command pip install -e drlvis from the directory above the drlvis directory
  2. After that simply run drlvis --logdir @PATH_TO_LOGDIR
  3. Open your browser on http://localhost:8000

Implementation

Architecture

The application is split into a backend and a fronted, where the backend does most of the data preprocessing. The frontend provides meaningful visualisations for further understanding of what the agent is doing, how rewards, weights and actions develop over time and how confident the agent is in selecting its actions.

Workflow for using DRLVis

  1. Train agent and log data
  2. Run drlvis
  3. Interpret meaningful visualisations in your browser

Logging

Logging for the use of drlvis is done by logger.py. The file contains a documentation on which values should be passed for logging. Thlogger.py contains an individual function for every loggable value/values. Some (the most important) of these functions are:


def create_logger(logdir)

The create_logger() function has to be used for initializing the logger and specifying the target destination of the logging directory. It is always important, that the logdir either does not exist yet or is an empty directory.


def log_episode_return(episode_return, episode_count)

With log_episode_return() one is able to log the accumulated reward per episode, with the step being the curresponding current episode count.


def log_action_divergence(action_probs, action_probs_old, episode_count, apply_softmax )

With log_action_divergence() one can calculate the divergence between actions in the current episode and actions in the last episode. Therefore the action_probabilities for each observation per timestep in an episode has to be collected. In the end of an episode this collection of action probabilites and the collection from the episode before can be passed to the log_action_divergence() method, which then calculates the kl divergence between action probabilities of the last episode and the current episode. Example code snippet with a model with softmax activation in the last layer:


def log_frame(frame, episode_count, step)

Using log_frame() one can log the frame which is currently being observed, or which corresponds with the current timestep. The episode count is the current episode and the step is the timestep within the episode on which the frame is being observed or corresponds with.


from drlvis import logger
import numpy as np

probs_curr = []

for episode in range(episode_range):

    for timestep in range(optional_timestep_range):
    
        if end_of_current_episode: #done in openai gym
            if episode >= 1:
                logger.log_action_divergence(probs_old, probs_curr, episode)
            probs_old = probs_curr

        probs_curr.append(model(observation[np.newaxis,:]))

def log_action_probs(predictions, episode_count, step, apply_softmax)

One can use log_action_probs() for logging the predictions of ones model for the currently observed timestep in an episode. If the model does not output probabilites, one can set apply_softmax to True for creating probabilities based on predictions.


def log_experiment_random_states(random_state_samples, predicted_dists, obs_min, obs_max, episode_num, state_meanings, apply_softmax)

The log_experiment_random_states()function takes a highdimensional array containing randomly generated states in bounds of the environments capabilities. (obs_min, obs_max) It also needs the episode in which a random states experiment shall be performed. The function then reduces the dimensions to two dimensions with UMAP for visualisation purposes. The state meanings can be passed for easier environments to reflect what the different states mean. A random state experiment itself is just a method to evaluate the agents confidence in selecting certain actions for randomly generated states. Example code snippet:

from drlvis import logger
import numpy as np

def random_states_experiment(model, episode_num):
   
    obs_space = env.observation_space
    obs_min = obs_space.low
    obs_max = obs_space.high


    num_samples = 10000 # can be an arbitrary number
    random_state_samples = np.random.uniform(
        low=obs_min, high=obs_max, size=(num_samples, len(obs_min)))

    predicted_dists = model(random_state_samples)
   
    logger.log_experiment_random_states(random_state_samples, predicted_dists, obs_min, obs_max, episode_num, [])

def log_action_distribution(actions, episode_count)

The log_action_distribution() function calculates the distribution of actions in the specified episode. Therefore one solely has to pass the actions, which where selected in the current episode episode_count


def log_weights(weight_tensor, step, episode_count)

With log_weights()one can log the weights of the last layer of ones model in a given timestep in an episode. This can be done as follows (model is keras model but not of major importance):

from drlvis import logger

weights = agent.model.weights[-2].numpy()
logger.log_weights(weight_tensor=weights, step=timestep ,episode_count=episode)

Examples

Examples on how to use the logger functions in real DRL implementations can be found in the examples folder that contains simple cartpole implementation in dqn_cartpole.ipynb and a more complex DQN implementation for playing Atari Breakout in dqn/.

Bachelor Thesis

For further information on how to use DRLVis and details about the application, I refer to my bachelor thesis located at documents/bachelor_thesis_visdrl.pdf.

License

MIT

Owner
Marios Sirtmatsis
Marios Sirtmatsis
Learning Versatile Neural Architectures by Propagating Network Codes

Learning Versatile Neural Architectures by Propagating Network Codes Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang,

Mingyu Ding 36 Dec 06, 2022
A Deep Learning based project for creating line art portraits.

ArtLine The main aim of the project is to create amazing line art portraits. Sounds Intresting,let's get to the pictures!! Model-(Smooth) Model-(Quali

Vijish Madhavan 3.3k Jan 07, 2023
Code to reproduce the results for Statistically Robust Neural Network Classification, published in UAI 2021

Code to reproduce the results for Statistically Robust Neural Network Classification, published in UAI 2021

1 Jun 02, 2022
Unofficial implement with paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Introduction This repository is about paper SpeakerGAN , and is unofficially implemented by Mingming Huang ( 7 Jan 03, 2023

Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Make-A-Scene - PyTorch Pytorch implementation (inofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/

Casual GAN Papers 259 Dec 28, 2022
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting This is the origin Pytorch implementation of Informer in the followin

Haoyi 3.1k Dec 29, 2022
Face Library is an open source package for accurate and real-time face detection and recognition

Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us

52 Nov 09, 2022
DIRL: Domain-Invariant Representation Learning

DIRL: Domain-Invariant Representation Learning Domain-Invariant Representation Learning (DIRL) is a novel algorithm that semantically aligns both the

Ajay Tanwani 30 Nov 07, 2022
Training DiffWave using variational method from Variational Diffusion Models.

Variational DiffWave Training DiffWave using variational method from Variational Diffusion Models. Quick Start python train_distributed.py discrete_10

Chin-Yun Yu 26 Dec 13, 2022
这是一个利用facenet和retinaface实现人脸识别的库,可以进行在线的人脸识别。

Facenet+Retinaface:人脸识别模型在Keras当中的实现 目录 注意事项 Attention 所需环境 Environment 文件下载 Download 预测步骤 How2predict 参考资料 Reference 注意事项 该库中包含了两个网络,分别是retinaface和fa

Bubbliiiing 31 Nov 15, 2022
Diffusion Normalizing Flow (DiffFlow) Neurips2021

Diffusion Normalizing Flow (DiffFlow) Reproduce setup environment The repo heavily depends on jam, a personal toolbox developed by Qsh.zh. The API may

76 Jan 01, 2023
PyTorch implementation of Munchausen Reinforcement Learning based on DQN and SAC. Handles discrete and continuous action spaces

Exploring Munchausen Reinforcement Learning This is the project repository of my team in the "Advanced Deep Learning for Robotics" course at TUM. Our

Mohamed Amine Ketata 10 Mar 10, 2022
(CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

ClassSR (CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic Paper Authors: Xiangtao Kong, Hengyuan

Xiangtao Kong 308 Jan 05, 2023
Official implementation of "Generating 3D Molecules for Target Protein Binding"

Generating 3D Molecules for Target Protein Binding This is the official implementation of the GraphBP method proposed in the following paper. Meng Liu

DIVE Lab, Texas A&M University 74 Dec 07, 2022
Official code release for: EditGAN: High-Precision Semantic Image Editing

Official code release for: EditGAN: High-Precision Semantic Image Editing

565 Jan 05, 2023
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 06, 2022
The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

PIC4SeRCentre 20 Jan 03, 2023
Code from PropMix, accepted at BMVC'21

PropMix: Hard Sample Filtering and Proportional MixUp for Learning with Noisy Labels This repository is the official implementation of Hard Sample Fil

6 Dec 21, 2022
Search and filter videos based on objects that appear in them using convolutional neural networks

Thingscoop: Utility for searching and filtering videos based on their content Description Thingscoop is a command-line utility for analyzing videos se

Anastasis Germanidis 354 Dec 04, 2022
Compressed Video Action Recognition

Compressed Video Action Recognition Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl. In CVPR, 2018. [Proj

Chao-Yuan Wu 479 Dec 26, 2022