Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

Overview

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation

[Project website] [Paper]

This project is a PyTorch implementation of Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation, published in CoRL 2021.

Learning complex manipulation tasks in realistic, obstructed environments is a challenging problem due to hard exploration in the presence of obstacles and high-dimensional visual observations. Prior work tackles the exploration problem by integrating motion planning and reinforcement learning. However, the motion planner augmented policy requires access to state information, which is often not available in the real-world settings. To this end, we propose to distill the state-based motion planner augmented policy to a visual control policy via (1) visual behavioral cloning to remove the motion planner dependency along with its jittery motion, and (2) vision-based reinforcement learning with the guidance of the smoothed trajectories from the behavioral cloning agent. We validate our proposed approach on three manipulation tasks in obstructed environments and show its high sample-efficiency, outperforming state-of-the-art algorithms for visual policy learning.

Prerequisites

Installation

  1. Install Mujoco 2.0 and add the following environment variables into ~/.bashrc or ~/.zshrc.
# Download mujoco 2.0
$ wget https://www.roboti.us/download/mujoco200_linux.zip -O mujoco.zip
$ unzip mujoco.zip -d ~/.mujoco
$ mv ~/.mujoco/mujoco200_linux ~/.mujoco/mujoco200

# Copy mujoco license key `mjkey.txt` to `~/.mujoco`

# Add mujoco to LD_LIBRARY_PATH
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin

# For GPU rendering (replace 418 with your nvidia driver version)
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-418

# Only for a headless server
$ export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-418/libGL.so
  1. Download this repository and install python dependencies
# Install system packages
sudo apt-get install libgl1-mesa-dev libgl1-mesa-glx libosmesa6-dev patchelf libopenmpi-dev libglew-dev python3-pip python3-numpy python3-scipy

# Create out/ folder for saving/loading RL checkpoints
cd mopa-pd
mkdir out

# Create checkpoints/ folder for saving/loading BC-Visual checkpoints
mkdir checkpoints

# Install required python packages in your new env
pip install -r requirements.txt
  1. Install ompl
# Linux
sudo apt install libyaml-cpp-dev
sh ./scripts/misc/installEigen.sh #from the home directory # install Eigen
sudo apt-get install libboost-all-dev # install Boost C++ for ompl

# Mac OS
brew install libyaml yaml-cpp
brew install eigen

# Build ompl
git clone [email protected]:ompl/ompl.git ../ompl
cd ../ompl
cmake .
sudo make install

# if ompl-x.x (x.x is the version) is installed in /usr/local/include, you need to rename it to ompl
mv /usr/local/include/ompl-x.x /usr/local/include/ompl
  1. Build motion planner python wrapper
cd ./mopa-pd/motion_planners
python setup.py build_ext --inplace
  1. Configure wandb for tracking experiments (optional)
  • Sign up for a free account at https://app.wandb.ai/login?signup=true.
  • Open this file: config/__init__.py
  • Set wandb argument to True.
  • Add your username to the entity argument.
  • Add your project name to the project argument.
  1. Servers without a monitor (optional)

You may use the following code to create a virtual monitor for rendering.

# Run the next line for Ubuntu
$ sudo apt-get install xserver-xorg libglu1-mesa-dev freeglut3-dev mesa-common-dev libxmu-dev libxi-dev

# Configure nvidia-x
$ sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

# Launch a virtual display
$ sudo /usr/bin/X :1 &

# Run a command with DISPLAY=:1
DISPLAY=:1 <command>

Available environments

SawyerPushObstacle-v0 SawyerLiftObstacle-v0 SawyerAssemblyObstacle-v0
Sawyer Push Sawyer Lift Sawyer Assembly

How to run experiments

Launch a virtual display (only for a headless server)

sudo /usr/bin/X :1 &

MoPA-RL

# train MoPA-RL policy
sh ./scripts/3d/assembly/mopa.sh 0 1234
sh ./scripts/3d/lift/mopa.sh 0 1234
sh ./scripts/3d/push/mopa.sh 0 1234

# evaluate MoPA-RL policy
sh ./scripts/3d/assembly/mopa_eval.sh 0 1234
sh ./scripts/3d/lift/mopa_eval.sh 0 1234
sh ./scripts/3d/push/mopa_eval.sh 0 1234

# generate MoPA-RL data for BC-Visual using trained MoPA-RL's checkpoint
sh ./scripts/3d/assembly/run_multiple_sh.sh
sh ./scripts/3d/lift/run_multiple_sh.sh
sh ./scripts/3d/push/run_multiple_sh.sh

BC-Visual

# pre-process MoPA-RL data
python util/state_img_preprocessing.py

cd rl # must be inside rl folder to execute the following commands

# bc_visual_args.py is the config for training and evaluating BC-Visual

# train BC-Visual 
python behavioral_cloning_visual.py

# evaluate BC-Visual
python evaluate_bc_visual.py

Baselines and Ours

  • Sawyer Push
###### Training
sh ./scripts/3d/push/bcrl_stochastic_two_buffers.sh 0 1234 # Ours
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_mopa.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_dr.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/push/bcrl_mopa_sota.sh 0 1234 # CoL 
sh ./scripts/3d/push/bcrl_sota.sh 0 1234 # CoL (w BC Smoothing) 
sh ./scripts/3d/push/mopa_asym.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/push/bcrl_stochastic_randweights.sh 0 1234 # Asym. SAC

###### Evaluation
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_eval.sh 0 1234 # Ours
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_mopa_eval.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_dr_eval.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/push/bcrl_mopa_sota_eval.sh 0 1234 # CoL
sh ./scripts/3d/push/bcrl_sota_eval.sh 0 1234 # CoL (w BC Smoothing)
sh ./scripts/3d/push/mopa_asym_eval.sh 0 1234  # MoPA Asym. SAC
sh ./scripts/3d/push/bcrl_stochastic_randweights_eval.sh 0 1234 # Asym. SAC
  • Sawyer Lift
###### Training
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers.sh 0 1234 # Ours
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_mopa.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_dr.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/lift/bcrl_mopa_sota.sh 0 1234 # CoL
sh ./scripts/3d/lift/bcrl_sota.sh 0 1234 # CoL (w BC Smoothing) 
sh ./scripts/3d/lift/mopa_asym.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/lift/bcrl_stochastic_randweights.sh 0 1234 # Asym. SAC

###### Evaluation
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_eval.sh 0 1234 # Ours
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_mopa_eval.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_dr_eval.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/lift/bcrl_mopa_sota_eval.sh 0 1234 # CoL
sh ./scripts/3d/lift/bcrl_sota_eval.sh 0 1234 # CoL (w BC Smoothing)
sh ./scripts/3d/lift/mopa_asym_eval.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/lift/bcrl_stochastic_randweights_eval.sh 0 1234 # Asym. SAC
  • Sawyer Assembly
###### Training
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers.sh 0 1234 # Ours
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_mopa.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_dr.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/assembly/bcrl_mopa_sota.sh 0 1234 # CoL
sh ./scripts/3d/assembly/bcrl_sota.sh 0 1234 # CoL (w BC Smoothing)
sh ./scripts/3d/assembly/mopa_asym.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/assembly/bcrl_stochastic_randweights.sh 0 1234 # Asym. SAC

###### Evaluation
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_eval.sh 0 1234 # Ours
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_mopa_eval.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_dr_eval.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/assembly/bcrl_mopa_sota_eval.sh 0 1234 # CoL
sh ./scripts/3d/assembly/bcrl_sota_eval.sh 0 1234 # CoL (w BC Smoothing) 
sh ./scripts/3d/assembly/mopa_asym_eval.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/assembly/bcrl_stochastic_randweights_eval.sh 0 1234 # Asym. SAC

Domain Randomization

To run experiments with domain randomized simulation, the following parameters can be set in config:

  • dr: set to True to train the model with domain randomization
  • dr_params_set: choose as per the training environment - ["sawyer_push, sawyer_lift, sawyer_assembly]
  • dr_eval: set to True for evaluating the domain randomization model

Directories

The structure of the repository:

  • rl: Reinforcement learning code
  • env: Environment code for simulated experiments (2D Push and all Sawyer tasks)
  • config: Configuration files
  • util: Utility code
  • motion_planners: Motion planner code from MoPA-RL
  • scripts: Scripts for all experiments

Log directories:

  • logs/rl.ENV.DATE.PREFIX.SEED:
    • cmd.sh: A command used for running a job
    • git.txt: Log gitdiff
    • prarms.json: Summary of parameters
    • video: Generated evaulation videos (every evalute_interval)
    • wandb: Training summary of W&B, like tensorboard summary
    • ckpt_*.pt: Stored checkpoints (every ckpt_interval)
    • replay_*.pt: Stored replay buffers (every ckpt_interval)

Trouble shooting

Mujoco GPU rendering

To use GPU rendering for mujoco, you need to add /usr/lib/nvidia-000 (000 should be replaced with your NVIDIA driver version) to LD_LIBRARY_PATH before installing mujoco-py. Then, during mujoco-py compilation, it will show you linuxgpuextension instead of linuxcpuextension. In Ubuntu 18.04, you may encounter an GL-related error while building mujoco-py, open venv/lib/python3.7/site-packages/mujoco_py/gl/eglshim.c and comment line 5 #include <GL/gl.h> and line 7 #include <GL/glext.h>.

Virtual display on headless machines

On servers, you don’t have a monitor. Use this to get a virtual monitor for rendering and put DISPLAY=:1 in front of a command.

# Run the next line for Ubuntu
$ sudo apt-get install xserver-xorg libglu1-mesa-dev freeglut3-dev mesa-common-dev libxmu-dev libxi-dev

# Configure nvidia-x
$ sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

# Launch a virtual display
$ sudo /usr/bin/X :1 &

# Run a command with DISPLAY=:1
DISPLAY=:1 <command>

pybind11-dev not found

wget http://archive.ubuntu.com/ubuntu/pool/universe/p/pybind11/pybind11-dev_2.2.4-2_all.deb
sudo apt install ./pybind11-dev_2.2.4-2_all.deb

ERROR: GLEW initalization error: Missing GL version

This issue is most likely due to running on a headless server.

Solution 1:

sudo mkdir -p /usr/lib/nvidia-000

Then add this line to ~/.bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-000

Solution 2:

1. First import and call mujocopy_render_hack in main.py
2. Follow the instructions in "Virtual display on headless machines" section
3. When running a script, remember to add DISPLAY:=1 <command> 

/usr/bin/ld: cannot find -lGL

Source: https://stackoverflow.com/questions/33447653/usr-bin-ld-cannot-find-lgl-ubuntu-14-04

sudo rm /usr/lib/x86_64-linux-gnu/libGL.so 
sudo ln -s /usr/lib/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so 

References

Citation

If you find this useful, please cite

@inproceedings{liu2021mopa,
  title={Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation},
  author={I-Chun Arthur Liu and Shagun Uppal and Gaurav S. Sukhatme and Joseph J. Lim and Peter Englert and Youngwoon Lee},
  booktitle={Conference on Robot Learning},
  year={2021}
}

Authors

I-Chun (Arthur) Liu*, Shagun Uppal*, Gaurav S. Sukhatme, Joseph J. Lim, Peter Englert, and Youngwoon Lee at USC CLVR and USC RESL (*Equal contribution)

Owner
Cognitive Learning for Vision and Robotics (CLVR) lab @ USC
Learning and Reasoning for Artificial Intelligence, especially focused on perception and action. Led by Professor Joseph J. Lim @ USC
Cognitive Learning for Vision and Robotics (CLVR) lab @ USC
LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models. Developers can reproduce these SOTA methods and

TuZheng 405 Jan 04, 2023
Drone Task1 - Drone Task1 With Python

Drone_Task1 Matching Results 3.mp4 1.mp4

MLV Lab (Machine Learning and Vision Lab at Korea University) 11 Nov 14, 2022
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
Tensorflow implementation of soft-attention mechanism for video caption generation.

SA-tensorflow Tensorflow implementation of soft-attention mechanism for video caption generation. An example of soft-attention mechanism. The attentio

Paul Chen 153 Nov 14, 2022
Introducing neural networks to predict stock prices

IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o

Vivek Palaniappan 637 Jan 04, 2023
Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation"

DSP Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation". Accepted by ACM Multimedia 2021. Authors

20 Oct 24, 2022
Sub-Cluster AdaCos: Learning Representations for Anomalous Sound Detection.

Accompanying code for the paper Sub-Cluster AdaCos: Learning Representations for Anomalous Sound Detection.

Kevin Wilkinghoff 6 Dec 01, 2022
9th place solution

AllDataAreExt-Galixir-Kaggle-HPA-2021-Solution Team Members Qishen Ha is Master of Engineering from the University of Tokyo. Machine Learning Engineer

daishu 5 Nov 18, 2021
Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

Segformer - Pytorch Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch. Install $ pip install segformer-pytorch

Phil Wang 208 Dec 25, 2022
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

DeeBERT This is the code base for the paper DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. Code in this repository is also available

Castorini 132 Nov 14, 2022
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021]

piglet PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021] This repo contains code and data for PIGLeT. If you like

Rowan Zellers 51 Oct 08, 2022
SoK: Vehicle Orientation Representations for Deep Rotation Estimation

SoK: Vehicle Orientation Representations for Deep Rotation Estimation Raymond H. Tu, Siyuan Peng, Valdimir Leung, Richard Gao, Jerry Lan This is the o

FIRE Capital One Machine Learning of the University of Maryland 12 Oct 07, 2022
On Effective Scheduling of Model-based Reinforcement Learning

On Effective Scheduling of Model-based Reinforcement Learning Code to reproduce the experiments in On Effective Scheduling of Model-based Reinforcemen

laihang 8 Oct 07, 2022
Official implementation of the paper 'High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network' in CVPR 2021

LPTN Paper | Supplementary Material | Poster High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network Ji

372 Dec 26, 2022
Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

TuSimple 385 Dec 28, 2022
This project is based on our SIGGRAPH 2021 paper, ROSEFusion: Random Optimization for Online DenSE Reconstruction under Fast Camera Motion .

ROSEFusion 🌹 This project is based on our SIGGRAPH 2021 paper, ROSEFusion: Random Optimization for Online DenSE Reconstruction under Fast Camera Moti

219 Dec 27, 2022
Official PyTorch Implementation of Convolutional Hough Matching Networks, CVPR 2021 (oral)

Convolutional Hough Matching Networks This is the implementation of the paper "Convolutional Hough Matching Network" by J. Min and M. Cho. Implemented

Juhong Min 70 Nov 22, 2022
[ECCV 2020] XingGAN for Person Image Generation

Contents XingGAN or CrossingGAN Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Evaluation Acknowl

Hao Tang 218 Oct 29, 2022
Dieser Scanner findet Websites, die nicht direkt in Suchmaschinen auftauchen, aber trotzdem erreichbar sind.

Deep Web Scanner Dieses Script findet Websites, die per IPv4-Adresse erreichbar sind und speichert deren Metadaten. Die Ausgabe im Terminal wird nach

Alex K. 30 Nov 18, 2022