A visualisation tool for Deep Reinforcement Learning


DRLVIS - Visualising Deep Reinforcement Learning


Created by Marios Sirtmatsis with the support of Alex Bäuerle.

DRLVis is an application for visualising deep reinforcement learning. Its goal is to help developers gain a deeper understanding of widely used algorithms across the deep reinforcement learning landscape. DRLVis also aims to give researchers and developers a tool for understanding errors in the algorithms they have implemented.

Installation

  1. Install the drlvis pip package by running pip install -e drlvis from the directory that contains the drlvis directory.
  2. Then run drlvis --logdir @PATH_TO_LOGDIR
  3. Open http://localhost:8000 in your browser.

Implementation

Architecture

The application is split into a backend and a frontend, where the backend does most of the data preprocessing. The frontend provides visualisations that help to understand what the agent is doing, how rewards, weights and actions develop over time, and how confident the agent is in selecting its actions.

Workflow for using DRLVis

  1. Train agent and log data
  2. Run drlvis
  3. Interpret meaningful visualisations in your browser

Logging

Logging for drlvis is done with logger.py. The file documents which values should be passed for logging. logger.py contains a separate function for every loggable value or group of values. Some of the most important ones are:


def create_logger(logdir)

create_logger() must be called to initialise the logger and to specify the logging directory. The logdir must either not exist yet or be an empty directory.
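Because create_logger() expects the logdir either not to exist yet or to be an empty directory, it can help to check this before training starts. The following is a minimal sketch using only the standard library; the helper name is_valid_logdir is hypothetical and not part of drlvis.

```python
import os


def is_valid_logdir(logdir: str) -> bool:
    """Hypothetical helper (not part of drlvis).

    Returns True if logdir does not exist yet or is an empty directory,
    i.e. the two cases create_logger() accepts.
    """
    if not os.path.exists(logdir):
        return True
    # An existing path is only acceptable if it is a directory with no entries
    return os.path.isdir(logdir) and len(os.listdir(logdir)) == 0
```

In a training script one might then call logger.create_logger(logdir) only if is_valid_logdir(logdir) returns True, and otherwise pick a fresh directory name.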


def log_episode_return(episode_return, episode_count)

log_episode_return() logs the accumulated reward of an episode; the step is the corresponding current episode count.
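The episode return is the sum of the rewards collected during one episode. A minimal sketch, assuming the rewards of each episode are already available as a list; log_fn stands in for logger.log_episode_return so that the snippet runs without drlvis installed, and the name collect_episode_returns is hypothetical.

```python
from typing import Callable, Iterable, List


def collect_episode_returns(
    episodes: Iterable[Iterable[float]],
    log_fn: Callable[[float, int], None],
) -> List[float]:
    """Sum the rewards of every episode and log (episode_return, episode_count).

    In a real training loop, log_fn would be drlvis' logger.log_episode_return.
    """
    returns = []
    for episode_count, rewards in enumerate(episodes):
        episode_return = float(sum(rewards))  # accumulated reward of this episode
        log_fn(episode_return, episode_count)
        returns.append(episode_return)
    return returns
```

With drlvis installed, this would be called as collect_episode_returns(all_rewards, logger.log_episode_return), where all_rewards is a hypothetical list of per-episode reward lists.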


def log_frame(frame, episode_count, step)

log_frame() logs the frame that is currently being observed, or that corresponds to the current timestep. episode_count is the current episode and step is the timestep within that episode at which the frame is observed or to which it corresponds.


def log_action_divergence(action_probs, action_probs_old, episode_count, apply_softmax)

log_action_divergence() computes the divergence between the actions of the current episode and those of the previous episode. To do so, collect the action probabilities for the observation at each timestep of an episode. At the end of the episode, pass this collection together with the collection from the previous episode to log_action_divergence(), which then computes the KL divergence between the action probabilities of the previous episode and those of the current episode. Example code snippet for a model with a softmax activation in its last layer:


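from drlvis import logger
import numpy as np

# episode_range, optional_timestep_range, model, observation and
# end_of_current_episode are placeholders for your own training loop
probs_old = None

for episode in range(episode_range):
    probs_curr = []  # action probabilities collected in the current episode

    for timestep in range(optional_timestep_range):
        # collect the action probabilities for the current observation
        probs_curr.append(model(observation[np.newaxis, :]))

        if end_of_current_episode:  # done in OpenAI Gym
            if probs_old is not None:  # a previous episode exists to compare against
                logger.log_action_divergence(action_probs=probs_curr,
                                             action_probs_old=probs_old,
                                             episode_count=episode)
            probs_old = probs_curr
            break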
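For intuition about the quantity being logged: the KL divergence between two discrete action distributions p (current episode) and q (previous episode) is the sum over actions a of p(a) log(p(a) / q(a)). The NumPy sketch below computes it per timestep and averages over the timesteps both episodes share. The direction KL(current || previous), the averaging and the truncation to the shorter episode are assumptions made for illustration; how drlvis aggregates over timesteps is not described here. The name mean_kl_divergence and the eps parameter are hypothetical.

```python
import numpy as np


def mean_kl_divergence(action_probs, action_probs_old, eps=1e-8):
    """Illustrative sketch, not drlvis code: mean per-timestep KL(current || previous).

    Both arguments are sequences of per-timestep action distributions, e.g. a list
    of model outputs of shape (1, n_actions) or an array of shape (timesteps, n_actions).
    """
    p = np.asarray(action_probs, dtype=float).reshape(len(action_probs), -1)
    q = np.asarray(action_probs_old, dtype=float).reshape(len(action_probs_old), -1)
    # Assumption for this sketch: compare only the timesteps both episodes share
    n = min(len(p), len(q))
    p, q = p[:n], q[:n]
    # KL(p || q) = sum_a p(a) * log(p(a) / q(a)); eps guards against log(0) and division by 0
    kl_per_step = np.sum(p * np.log((p + eps) / (q + eps)), axis=1)
    return float(np.mean(kl_per_step))
```

Identical episodes give a divergence of 0, and the value grows as the policy changes between episodes.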

def log_action_probs(predictions, episode_count, step, apply_softmax)

log_action_probs() logs the predictions of your model for the currently observed timestep of an episode. If the model does not output probabilities, set apply_softmax to True to turn the predictions into probabilities.
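Models such as a DQN output Q-values rather than probabilities, which is the case apply_softmax=True is meant for. The following NumPy sketch shows a numerically stable softmax of the kind such an option typically applies; it is an illustration only, and drlvis' own implementation may differ.

```python
import numpy as np


def softmax(predictions, axis=-1):
    """Numerically stable softmax over the given axis (illustrative, not drlvis code)."""
    z = np.asarray(predictions, dtype=float)
    # Subtracting the maximum does not change the result but prevents overflow in exp
    z = z - np.max(z, axis=axis, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=axis, keepdims=True)
```

The outputs are positive, sum to 1 along the chosen axis and keep the ordering of the raw predictions, so the most preferred action stays the most probable one.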


def log_experiment_random_states(random_state_samples, predicted_dists, obs_min, obs_max, episode_num, state_meanings, apply_softmax)

The log_experiment_random_states() function takes a high-dimensional array of randomly generated states that lie within the bounds of the environment's observation space (obs_min, obs_max), together with the action distributions the model predicts for these states (predicted_dists). It also needs the episode in which the random states experiment is performed. The function then reduces the states to two dimensions with UMAP for visualisation. For simple environments, state_meanings can be passed to describe what the different states mean. A random states experiment is simply a way of evaluating the agent's confidence in selecting actions for randomly generated states. Example code snippet:

from drlvis import logger
import numpy as np

def random_states_experiment(model, env, episode_num):
    # bounds of the environment's observation space
    obs_space = env.observation_space
    obs_min = obs_space.low
    obs_max = obs_space.high

    # assumes finite bounds for every observation dimension
    num_samples = 10000  # can be an arbitrary number
    random_state_samples = np.random.uniform(
        low=obs_min, high=obs_max, size=(num_samples, len(obs_min)))

    # action distributions predicted by the model for every random state
    predicted_dists = model(random_state_samples)

    # [] means that no state meanings are passed
    logger.log_experiment_random_states(random_state_samples, predicted_dists,
                                        obs_min, obs_max, episode_num, [])
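Uniform sampling only produces meaningful states if obs_min and obs_max are finite and of a sensible size, but some Gym observation spaces use infinite or extremely large bounds for some dimensions (CartPole's velocity dimensions, for example). A minimal sketch, assuming it is acceptable to clip such bounds to a finite range before sampling; the helper sample_random_states and the clip_value default are hypothetical choices, not part of drlvis.

```python
import numpy as np


def sample_random_states(obs_min, obs_max, num_samples, clip_value=10.0, seed=None):
    """Sample states uniformly inside the observation bounds (illustrative sketch).

    Bounds larger than clip_value in magnitude (including +/-inf) are clipped to
    [-clip_value, clip_value]; clip_value is a hypothetical choice to tune per environment.
    """
    low = np.clip(np.asarray(obs_min, dtype=float), -clip_value, clip_value)
    high = np.clip(np.asarray(obs_max, dtype=float), -clip_value, clip_value)
    rng = np.random.default_rng(seed)
    # One row per sampled state, one column per observation dimension
    return rng.uniform(low=low, high=high, size=(num_samples, low.shape[0]))
```

The result could replace the np.random.uniform call in the snippet above, e.g. random_state_samples = sample_random_states(obs_min, obs_max, num_samples), with the clipped bounds then passed on as obs_min and obs_max.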

def log_action_distribution(actions, episode_count)

The log_action_distribution() function computes the distribution of the actions selected in the given episode. You only have to pass the actions that were selected in the current episode, episode_count.
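For a discrete action space, the distribution of actions within an episode amounts to the relative frequency of each action. A minimal NumPy sketch for intuition; the name action_distribution and the n_actions parameter are hypothetical, and drlvis may compute or present the distribution differently.

```python
import numpy as np


def action_distribution(actions, n_actions):
    """Relative frequency of each discrete action in one episode (illustrative sketch)."""
    counts = np.bincount(np.asarray(actions, dtype=int), minlength=n_actions)
    total = counts.sum()
    # An empty episode has no actions, so return all zeros instead of dividing by 0
    return counts / total if total > 0 else counts.astype(float)
```

For example, the actions [0, 1, 1, 1] in an environment with three actions give the frequencies 0.25, 0.75 and 0.0.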


def log_weights(weight_tensor, step, episode_count)

log_weights() logs the weights of the last layer of your model at a given timestep of an episode. For example (the model here is a Keras model, but the framework is not of major importance):

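from drlvis import logger

# agent, timestep and episode come from your own training loop.
# If the last layer is a Dense layer with a bias, weights[-1] is its bias
# and weights[-2] its kernel (weight matrix).
weights = agent.model.weights[-2].numpy()
logger.log_weights(weight_tensor=weights, step=timestep, episode_count=episode)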

Examples

Examples of how to use the logger functions in real DRL implementations can be found in the examples folder, which contains a simple CartPole implementation in dqn_cartpole.ipynb and a more complex DQN implementation for playing Atari Breakout in dqn/.

Bachelor Thesis

For further information on how to use DRLVis and for details about the application, see my bachelor thesis at documents/bachelor_thesis_visdrl.pdf.

License

MIT

Owner
Marios Sirtmatsis