Decision Transformer: A brand new Offline RL Pattern

Overview

DecisionTransformer_StepbyStep

Intro

Decision Transformer: A brand new Offline RL Pattern.

这是关于NeurIPS 2021 热门论文Decision Transformer的复现。

👍 原文地址: Decision Transformer: Reinforcement Learning via Sequence Modeling

👍 官方的Git仓库: decision-transformer(official)

Decision Transformer

Decision Transformer属于Offline RL,所谓Offline RL,即从次优数据中学习策略来分配Agent,即从固定、有限的经验中产生最大有效的行为。

👀️ Motivation

DT将RL看成一个序列建模问题(Sequence Modeling Problem ),不用传统RL方法,而使用网络直接输出动作进行决策。传统RL方法存在一些问题,比如估计未来Return过程中Bootstrapping过程会导致Overestimate; 马尔可夫假设;

DT借助了Transformer的强大表征能力和时序建模能力。

  • Decision Transformer的表现达到甚至超过了目前最好的基于dynamic programming的主流方法;
  • 在一些需要long-term credit assignment的task【例如sparse reward或者delayed reward等】,Decision Transformer的表现远超过了最好的主流方法.

🚀️ DT的核心思想

image.png

Decision Transformer的核心思想; States、Actions、Returns被Fed into Modality-Specific的线性Embedding;并添加了带有时间步信息的positional episodic timestep; 这些Tokens被输入一个GPT架构,使用a causal self-attention mask来预测actions。

🎉️ DT的优势

  1. 无需Markov假设;
  2. 没有使用一个可学习的Value Function作为Training Target;
  3. 利用Transformer的特性,绕过长期信用分配进行“自举bootstrapping”的需要,避免了时序差分学习的“短视”行为;
  4. 可以通过self-attention直接执行信度分配。这与缓慢传播奖励并容易产生干扰信号的 Bellman Backup 相反,可以使 Transformer 在奖励稀少或分散注意力的情况下仍然有效地工作.

Dependencies

1. D4RL ( Dataset for Deep Data-Driven Reinforcement Learning )

2. MUJOCO 210

# 安装之前先安装absl-py和matplotlib 
pip install absl-py 
pip install matplotlib 

"""
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
pip install -e . # 这种方法不好使 !! 
"""

#首先在https://github.com/deepmind/dm_control这个库git clone
# cd
pip install -r requirement.txt 
# 然后 
pip install matplotlib 
# 然后 https://github.com/takuseno/d3rlpy 
pip install d3rlpy 
# 然后安装mujoco 210  
# 直接安装,然后添加环境变量 
# 装完之后进d4rl文件夹下
python setup.py install 
# 成功安装 d4rl 1.1 

3. GPT-2


pip install transformers

Experiments

Group1: Decision Transformer — Hopper-v3-Medium-Dataset

参数Config

class Config:
    env = "hopper"
    dataset = "medium"
    mode = "normal" # "delayed" : all rewards moved to end of trajectory
    device = 'cuda'
    log_dir = 'TB_log/'
    record_algo = 'DT_Hopper_v1'
    test_cycles = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')

    # 模型
    model_type = "DT"
    activation_function = 'relu'

    # Scalar
    max_length = 20 # max_len # K
    pct_traj = 1.
    batch_size = 64
    embed_dim = 128
    n_layer = 3
    n_head = 1
    dropout = 0.1
    lr = 1e-4
    wd = 1e-4
    warmup_steps = 1000
    num_eval_episodes = 100
    max_iters = 50
    num_steps_per_iter = 1000

    # Bool
    log_to_tb = True

效果

image.png

Owner
Irving
Irving
Extremely easy multi instancing software for minecraft speedrunning.

Easy Multi Extremely easy multi/single instancing software for minecraft speedrunning. A couple of goals of this project: Setup multi in minutes No fi

Duncan 8 Jul 16, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 7 May 29, 2022
The codebase for our paper "Generative Occupancy Fields for 3D Surface-Aware Image Synthesis" (NeurIPS 2021)

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis (NeurIPS 2021) Project Page | Paper Xudong Xu, Xingang Pan, Dahua Lin and Bo Dai GOF

xuxudong 97 Nov 10, 2022
PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

MAE for Self-supervised ViT Introduction This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-sup

36 Oct 30, 2022
A collection of educational notebooks on multi-view geometry and computer vision.

Multiview notebooks This is a collection of educational notebooks on multi-view geometry and computer vision. Subjects covered in these notebooks incl

Max 65 Dec 09, 2022
Codebase for the self-supervised goal reaching benchmark introduced in the LEXA paper

LEXA Benchmark Codebase for the self-supervised goal reaching benchmark introduced in the LEXA paper (Discovering and Achieving Goals via World Models

Oleg Rybkin 36 Dec 22, 2022
DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment This repository is related to the paper DEEPAGÉ: Answering Questions in Por

0 Dec 10, 2021
The official codes for the ICCV2021 presentation "Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting"

UEPNet (ICCV2021 Poster Presentation) This repository contains codes for the official implementation in PyTorch of UEPNet as described in Uniformity i

Tencent YouTu Research 15 Dec 14, 2022
Sub-tomogram-Detection - Deep learning based model for Cyro ET Sub-tomogram-Detection

Deep learning based model for Cyro ET Sub-tomogram-Detection High degree of stru

Siddhant Kumar 2 Feb 04, 2022
Ratatoskr: Worcester Tech's conference scheduling system

Ratatoskr: Worcester Tech's conference scheduling system In Norse mythology, Ratatoskr is a squirrel who runs up and down the world tree Yggdrasil to

4 Dec 22, 2022
VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations

VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations 3D-aware Image Synthesis via Learning Structural and Textura

GenForce: May Generative Force Be with You 116 Dec 26, 2022
Code for "Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks", CVPR 2021

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks This repository contains the code that accompanies our CVPR 20

Despoina Paschalidou 161 Dec 20, 2022
COIN the currently largest dataset for comprehensive instruction video analysis.

COIN Dataset COIN is the currently largest dataset for comprehensive instruction video analysis. It contains 11,827 videos of 180 different tasks (i.e

86 Dec 28, 2022
GestureSSD CBAM - A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js

GestureSSD_CBAM A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js SSD implementation is based on https://github

xue_senhua1999 2 Jan 06, 2022
Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Nvdiffrast – Modular Primitives for High-Performance Differentiable Rendering Modular Primitives for High-Performance Differentiable Rendering Samuli

NVIDIA Research Projects 675 Jan 06, 2023
[ICCV 2021 Oral] NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

NerfingMVS Project Page | Paper | Video | Data NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo Yi Wei, Shaohui

Yi Wei 369 Dec 24, 2022
PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE).

GRACE The official PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE). For a thorough resource collection of self-superv

Big Data and Multi-modal Computing Group, CRIPAC 186 Dec 27, 2022
Affine / perspective transformation in Pose Estimation with Tensorflow 2

Pose Transformation Affine / Perspective transformation in Pose Estimation with Tensorflow 2 Introduction 이 repo는 pose estimation을 연구하고 개발하는 데 도움이 되기

Kim Junho 1 Dec 22, 2021
existing and custom freqtrade strategies supporting the new hyperstrategy format.

freqtrade-strategies Description Existing and self-developed strategies, rewritten to support the new HyperStrategy format from the freqtrade-develop

39 Aug 20, 2021
CrossNorm and SelfNorm for Generalization under Distribution Shifts (ICCV 2021)

CrossNorm (CN) and SelfNorm (SN) (Accepted at ICCV 2021) This is the official PyTorch implementation of our CNSN paper, in which we propose CrossNorm

100 Dec 28, 2022