Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

Overview

CARLA-Roach

This is the official code release of the paper
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
by Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu and Luc van Gool, accepted at ICCV 2021.

It contains the code for benchmark, off-policy data collection, on-policy data collection, RL training and IL training with DAGGER. It also contains trained models of RL experts and IL agents. The supplementary videos can be found at the paper's homepage.

Installation

Please refer to INSTALL.md for installation. We use AWS EC2, but you can also install and run all experiments on your computer or cluster.

Quick Start: Collect an expert dataset using Roach

Roach is an end-to-end trained agent that drives better and more naturally than hand-crafted CARLA experts. To collect a dataset from Roach, use run/data_collect_bc.sh and modify the following arguments:

  • save_to_wandb: set to False if you don't want to upload the dataset to W&B.
  • dataset_root: local directory for saving the dataset.
  • test_suites: default is eu_data which collects data in Town01 for the NoCrash-dense benchmark. Available configurations are found here. You can also create your own configuration.
  • n_episodes: how many episodes to collect, each episode will be saved to a separate h5 file.
  • agent/cilrs/obs_configs: observation (i.e. sensor) configuration, default is central_rgb_wide. Available configurations are found here. You can also create your own configuration.
  • inject_noise: default is True. As introduced in CILRS, triangular noise is injected to steering and throttle such that the ego-vehicle does not always follow the lane center. Very useful for imitation learning.
  • actors.hero.terminal.kwargs.max_time: Maximum duration of an episode, in seconds.
  • Early stop the episode if traffic rule is violated, such that the collected dataset is error-free.
    • actors.hero.terminal.kwargs.no_collision: default is True.
    • actors.hero.terminal.kwargs.no_run_rl: default is False.
    • actors.hero.terminal.kwargs.no_run_stop: default is False.

Benchmark

To benchmark checkpoints, use run/benchmark.sh and modify the arguments to select different settings. We recommend g4dn.xlarge with 50 GB free disk space for video recording. Use screen if you want to run it in the background

screen -L -Logfile ~/screen.log -d -m run/benchmark.sh

Trained Models

The trained models are hosted here on W&B. Given the corresponding W&B run path, our code will automatically download and load the checkpoint with the configuration yaml file.

The following checkpoints are used to produce the results reported in our paper.

  • To benchmark the Autopilot, use benchmark() with agent="roaming".
  • To benchmark the RL experts, use benchmark() with agent="ppo" and set agent.ppo.wb_run_path to one of the following.
    • iccv21-roach/trained-models/1929isj0: Roach
    • iccv21-roach/trained-models/1ch63m76: PPO+beta
    • iccv21-roach/trained-models/10pscpih: PPO+exp
  • To benchmark the IL agents, use benchmark() with agent="cilrs" and set agent.cilrs.wb_run_path to one of the following.
    • Checkpoints trained for the NoCrash benchmark, at DAGGER iteration 5:
      • iccv21-roach/trained-models/39o1h862: L_A(AP)
      • iccv21-roach/trained-models/v5kqxe3i: L_A
      • iccv21-roach/trained-models/t3x557tv: L_K
      • iccv21-roach/trained-models/1w888p5d: L_K+L_V
      • iccv21-roach/trained-models/2tfhqohp: L_K+L_F
      • iccv21-roach/trained-models/3vudxj38: L_K+L_V+L_F
      • iccv21-roach/trained-models/31u9tki7: L_K+L_F(c)
      • iccv21-roach/trained-models/aovrm1fs: L_K+L_V+L_F(c)
    • Checkpoints trained for the LeaderBoard benchmark, at DAGGER iteration 5:
      • iccv21-roach/trained-models/1myvm4mw: L_A(AP)
      • iccv21-roach/trained-models/nw226h5h: L_A
      • iccv21-roach/trained-models/12uzu2lu: L_K
      • iccv21-roach/trained-models/3ar2gyqw: L_K+L_V
      • iccv21-roach/trained-models/9rcwt5fh: L_K+L_F
      • iccv21-roach/trained-models/2qq2rmr1: L_K+L_V+L_F
      • iccv21-roach/trained-models/zwadqx9z: L_K+L_F(c)
      • iccv21-roach/trained-models/21trg553: L_K+L_V+L_F(c)

Available Test Suites

Set argument test_suites to one of the following.

  • NoCrash-busy
    • eu_test_tt: NoCrash, busy traffic, train town & train weather
    • eu_test_tn: NoCrash, busy traffic, train town & new weather
    • eu_test_nt: NoCrash, busy traffic, new town & train weather
    • eu_test_nn: NoCrash, busy traffic, new town & new weather
    • eu_test: eu_test_tt/tn/nt/nn, all 4 conditions in one file
  • NoCrash-dense
    • nocrash_dense: NoCrash, dense traffic, all 4 conditions
  • LeaderBoard:
    • lb_test_tt: LeaderBoard, busy traffic, train town & train weather
    • lb_test_tn: LeaderBoard, busy traffic, train town & new weather
    • lb_test_nt: LeaderBoard, busy traffic, new town & train weather
    • lb_test_nn: LeaderBoard, busy traffic, new town & new weather
    • lb_test: lb_test_tt/tn/nt/nn all, 4 conditions in one file
  • LeaderBoard-all
    • cc_test: LeaderBoard, busy traffic, all 76 routes, dynamic weather

Collect Datasets

We recommend g4dn.xlarge for dataset collecting. Make sure you have enough disk space attached to the instance.

Collect Off-Policy Datasets

To collect off-policy datasets, use run/data_collect_bc.sh and modify the arguments to select different settings. You can use Roach (given a checkpoint) or the Autopilot to collect off-policy datasets. In our paper, before the DAGGER training the IL agents are initialized via behavior cloning (BC) using an off-policy dataset collected in this way.

Some arguments you may want to modify:

  • Set save_to_wandb=False if you don't want to upload the dataset to W&B.
  • Select the environment for collecting data by setting the argument test_suites to one of the following
    • eu_data: NoCrash, train town & train weather. We collect n_episodes=80 for BC dataset on NoCrash, that is around 75 GB and 6 hours of data.
    • lb_data: LeaderBoard, train town & train weather. We collect n_episodes=160 for BC dataset on LeaderBoard, that is around 150 GB and 12 hours of data.
    • cc_data: CARLA Challenge, all six maps (Town1-6), dynamic weather. We collect n_episodes=240 for BC dataset on CARLA Challenge, that is around 150 GB and 18 hours of data.
  • For RL experts, the used checkpoint is set via agent.ppo.wb_run_path and agent.ppo.wb_ckpt_step.
    • agent.ppo.wb_run_path is the W&B run path where the RL training is logged and the checkpoints are saved.
    • agent.ppo.wb_ckpt_step is the step of the checkpoint you want to use. If it's an integer, the script will find the checkpoint closest to that step. If it's null, the latest checkpoint will be used.

Collect On-Policy Datasets

To collect on-policy datasets, use run/data_collect_dagger.sh and modify the arguments to select different settings. You can use Roach or the Autopilot to label on-policy (DAGGER) datasets generated by an IL agent (given a checkpoint). This is done by running the data_collect.py using an IL agent as the driver, and Roach/Autopilot as the coach. So the expert supervisions are generated and recorded on the fly.

Most things are the same as collecting off-policy BC datasets. Here are some changes:

  • Set agent.cilrs.wb_run_path to the W&B run path where the IL training is logged and the checkpoints are saved.
  • By adjusting n_episodes we make sure the size of the DAGGER dataset at each iteration to be around 20% of the BC dataset size.
    • For RL experts we use an n_episodes which is the half of n_episodes of the BC dataset.
    • For the Autopilot we use an n_episodes which is the same as n_episodes of the BC dataset.

Train RL Experts

To train RL experts, use run/train_rl.sh and modify the arguments to select different settings. We recommend to use g4dn.4xlarge for training the RL experts, you will need around 50 GB free disk space for videos and checkpoints. We train RL experts on CARLA 0.9.10.1 because 0.9.11 crashes more often for unknown reasons.

Train IL Agents

To train IL agents, use run/train_il.sh and modify the arguments to select different settings. Training IL agents does not require CARLA and it's a GPU-heavy task. Therefore, we recommend to use AWS p-instances or your cluster to run the IL training. Our implementation follows DA-RB (paper, repo), which trains a CILRS (paper, repo) agent using DAGGER.

The training starts with training the basic CILRS via behavior cloning using an off-policy dataset.

  1. Collect off-policy DAGGER dataset.
  2. Train the IL model.
  3. Benchmark the trained model.

Then repeat the following DAGGER steps until the model achieves decent results.

  1. Collect on-policy DAGGER dataset.
  2. Train the IL model.
  3. Benchmark the trained model.

For the BC training,the following arguments have to be set.

  • Datasets
    • dagger_datasets: a vector of strings, for BC training it should only contain the path (local or W&B) to the BC dataset.
  • Measurement vector
    • agent.cilrs.env_wrapper.kwargs.input_states can be a subset of [speed,vec,cmd]
    • speed: scalar ego_vehicle speed
    • vec: 2D vector pointing to the next GNSS waypoint
    • cmd: one-hot vector of high-level command
  • Branching
    • For 6 branches:
      • agent.cilrs.policy.kwargs.number_of_branches=6
      • agent.cilrs.training.kwargs.branch_weights=[1.0,1.0,1.0,1.0,1.0,1.0]
    • For 1 branch:
      • agent.cilrs.policy.kwargs.number_of_branches=1
      • agent.cilrs.training.kwargs.branch_weights=[1.0]
  • Action Loss
    • L1 action loss
      • agent.cilrs.env_wrapper.kwargs.action_distribution=null
      • agent.cilrs.training.kwargs.action_kl=false
    • KL loss
      • agent.cilrs.env_wrapper.kwargs.action_distribution="beta_shared"
      • agent.cilrs.training.kwargs.action_kl=true
  • Value Loss
    • Disable
      • agent.cilrs.env_wrapper.kwargs.value_as_supervision=false
      • agent.cilrs.training.kwargs.value_weight=0.0
    • Enable
      • agent.cilrs.env_wrapper.kwargs.value_as_supervision=true
      • agent.cilrs.training.kwargs.value_weight=0.001
  • Pre-trained action/value head
    • agent.cilrs.rl_run_path and agent.cilrs.rl_ckpt_step are used to initialize the IL agent's action/value heads with Roach's action/value head.
  • Feature Loss
    • Disable
      • agent.cilrs.env_wrapper.kwargs.dim_features_supervision=0
      • agent.cilrs.training.kwargs.features_weight=0.0
    • Enable
      • agent.cilrs.env_wrapper.kwargs.dim_features_supervision=256
      • agent.cilrs.training.kwargs.features_weight=0.05

During the DAGGER training, a trained IL agent will be loaded and you cannot change the configuration any more. You will have to set

  • agent.cilrs.wb_run_path: the W&B run path where the previous IL training was logged and the checkpoints are saved.
  • agent.cilrs.wb_ckpt_step: the step of the checkpoint you want to use. Leave it as null will load the latest checkpoint.
  • dagger_datasets: vector of strings, W&B run path or local path to DAGGER datasets and the BC dataset in time-reversed order, for example [PATH_DAGGER_DATA_2, PATH_DAGGER_DATA_1, PATH_DAGGER_DATA_0, BC_DATA]
  • train_epochs: optionally you can change it if you want to train for more epochs.

Citation

Please cite our work if you found it useful:

@inproceedings{zhang2021roach,
  title = {End-to-End Urban Driving by Imitating a Reinforcement Learning Coach},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  author = {Zhang, Zhejun and Liniger, Alexander and Dai, Dengxin and Yu, Fisher and Van Gool, Luc},
  year = {2021},
}

License

This software is released under a CC-BY-NC 4.0 license, which allows personal and research use only. For a commercial license, please contact the authors. You can view a license summary here.

Portions of source code taken from external sources are annotated with links to original files and their corresponding licenses.

Acknowledgements

This work was supported by Toyota Motor Europe and was carried out at the TRACE Lab at ETH Zurich (Toyota Research on Automated Cars in Europe - Zurich).

Owner
Zhejun Zhang
PhD Candidate at CVL, ETH Zurich
Zhejun Zhang
Use deep learning, genetic programming and other methods to predict stock and market movements

StockPredictions Use classic tricks, neural networks, deep learning, genetic programming and other methods to predict stock and market movements. Both

Linda MacPhee-Cobb 386 Jan 03, 2023
DFM: A Performance Baseline for Deep Feature Matching

DFM: A Performance Baseline for Deep Feature Matching Python (Pytorch) and Matlab (MatConvNet) implementations of our paper DFM: A Performance Baselin

143 Jan 02, 2023
Code for NeurIPS 2021 paper "Curriculum Offline Imitation Learning"

README The code is based on the ILswiss. To run the code, use python run_experiment.py --nosrun -e your YAML file -g gpu id Generally, run_experim

ApexRL 12 Mar 19, 2022
Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Oral)

CMT Code for paper Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award) [Paper] [Site] Directory Struc

Zhaokai Wang 198 Dec 27, 2022
An end-to-end library for editing and rendering motion of 3D characters with deep learning [SIGGRAPH 2020]

Deep-motion-editing This library provides fundamental and advanced functions to work with 3D character animation in deep learning with Pytorch. The co

1.2k Dec 29, 2022
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
2021:"Bridging Global Context Interactions for High-Fidelity Image Completion"

TFill arXiv | Project This repository implements the training, testing and editing tools for "Bridging Global Context Interactions for High-Fidelity I

Chuanxia Zheng 111 Jan 08, 2023
Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting

Official code of APHYNITY Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting (ICLR 2021, Oral) Yuan Yin*, Vincent Le Guen*

Yuan Yin 24 Oct 24, 2022
Automatic Differentiation Multipole Moment Molecular Forcefield

Automatic Differentiation Multipole Moment Molecular Forcefield Performance notes On a single gpu, using waterbox_31ang.pdb example from MPIDplugin wh

4 Jan 07, 2022
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

MoEBERT This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). Installation Create an

Simiao Zuo 34 Dec 24, 2022
PyTorch implementation of the ExORL: Exploratory Data for Offline Reinforcement Learning

ExORL: Exploratory Data for Offline Reinforcement Learning This is an original PyTorch implementation of the ExORL framework from Don't Change the Alg

Denis Yarats 52 Jan 01, 2023
Convert Python 3 code to CUDA code.

Py2CUDA Convert python code to CUDA. Usage To convert a python file say named py_file.py to CUDA, run python generate_cuda.py --file py_file.py --arch

Yuval Rosen 3 Jul 14, 2021
Retinal Vessel Segmentation with Pixel-wise Adaptive Filters (ISBI 2022)

Retinal Vessel Segmentation with Pixel-wise Adaptive Filters (ISBI 2022) Introdu

anonymous 14 Oct 27, 2022
Source Code for Simulations in the Publication "Can the brain use waves to solve planning problems?"

Code for Simulations in the Publication Can the brain use waves to solve planning problems? Installing Required Python Packages Please use Python vers

EMD Group 2 Jul 01, 2022
Data & Code for ACCENTOR Adding Chit-Chat to Enhance Task-Oriented Dialogues

ACCENTOR: Adding Chit-Chat to Enhance Task-Oriented Dialogues Overview ACCENTOR consists of the human-annotated chit-chat additions to the 23.8K dialo

Facebook Research 69 Dec 29, 2022
Source Code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

Description The source code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chin

Zhengxiang Wang 3 Jun 28, 2022
Julia package for multiway (inverse) covariance estimation.

TensorGraphicalModels TensorGraphicalModels.jl is a suite of Julia tools for estimating high-dimensional multiway (tensor-variate) covariance and inve

Wayne Wang 3 Sep 23, 2022
Clean and readable code for Decision Transformer: Reinforcement Learning via Sequence Modeling

Minimal implementation of Decision Transformer: Reinforcement Learning via Sequence Modeling in PyTorch for mujoco control tasks in OpenAI gym

Nikhil Barhate 104 Jan 06, 2023
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
WiFi-based Multi-task Sensing

WiFi-based Multi-task Sensing Introduction WiFi-based sensing has aroused immense attention as numerous studies have made significant advances over re

zhangx289 6 Nov 24, 2022