Pretraining Representations For Data-Efficient Reinforcement Learning

Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, Devon Hjelm, Philip Bachman & Aaron Courville

This repo provides code for implementing SGI.

Install

To install the requirements, follow these steps:

# Use a UTF-8 locale
export LANG=C.UTF-8
# Install PyTorch and torchvision (CUDA 11.1 builds)
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
# Install the remaining requirements
pip install -r requirements.txt

# Finally, install the project
pip install --user -e .

Usage:

The default branch for the latest and stable changes is release.

  • To run SGI:
  1. Download the DQN replay dataset from https://research.google/tools/datasets/dqn-replay/
    • Or substitute your own pre-training data! The codebase expects a series of .gz files, one each for observations, actions and terminals.
  2. To pretrain with SGI:
python -m scripts.run public=True model_folder=./ offline.runner.save_every=2500 \
    env.game=pong seed=1 offline_model_save={your model name} \
    offline.runner.epochs=10 offline.runner.dataloader.games=[Pong] \
    offline.runner.no_eval=1 \
    +offline.algo.goal_weight=1 \
    +offline.algo.inverse_model_weight=1 \
    +offline.algo.spr_weight=1 \
    +offline.algo.target_update_tau=0.01 \
    +offline.agent.model_kwargs.momentum_tau=0.01 \
    do_online=False \
    algo.batch_size=256 \
    +offline.agent.model_kwargs.noisy_nets_std=0 \
    offline.runner.dataloader.dataset_on_disk=True \
    offline.runner.dataloader.samples=1000000 \
    offline.runner.dataloader.checkpoints='{your checkpoints}' \
    offline.runner.dataloader.num_workers=2 \
    offline.runner.dataloader.data_path={your data dir} \
    offline.runner.dataloader.tmp_data_path=./ 
  3. To fine-tune with SGI:
python -m scripts.run public=True env.game=pong seed=1 num_logs=10  \
    model_load={your_model_name} model_folder=./ \
    algo.encoder_lr=0.000001 algo.q_l1_lr=0.00003 algo.clip_grad_norm=-1 algo.clip_model_grad_norm=-1
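
If you substitute your own pre-training data (step 1), it can help to check that every checkpoint has its observations, actions and terminals files before launching a long run. The following is a minimal sketch of such a check, not part of this codebase. The file naming scheme `<kind>_<checkpoint>.gz` and the helper name `complete_checkpoints` are assumptions made for illustration; adjust the pattern to match how your files are actually named.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical naming scheme: "<kind>_<checkpoint>.gz", e.g. "observation_1.gz".
# This is an assumption for illustration; change the pattern to fit your data.
REQUIRED_KINDS = ("observation", "action", "terminal")
FILE_PATTERN = re.compile(r"^(?P<kind>[a-z]+)_(?P<ckpt>\d+)\.gz$")


def complete_checkpoints(data_dir):
    """Return the sorted checkpoint ids that have all three required .gz files."""
    found = defaultdict(set)
    for path in Path(data_dir).glob("*.gz"):
        match = FILE_PATTERN.match(path.name)
        if match and match["kind"] in REQUIRED_KINDS:
            found[int(match["ckpt"])].add(match["kind"])
    required = set(REQUIRED_KINDS)
    return sorted(ckpt for ckpt, kinds in found.items() if kinds == required)


def demo():
    """Build a throwaway directory where checkpoint 2 lacks its terminals file."""
    names = (
        "observation_1.gz", "action_1.gz", "terminal_1.gz",
        "observation_2.gz", "action_2.gz",
    )
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            (Path(tmp) / name).touch()
        return complete_checkpoints(tmp)
```

Only checkpoints that pass this kind of check would be worth listing in `offline.runner.dataloader.checkpoints`.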

When reporting scores, we average across 10 fine-tuning seeds.
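
One way to reproduce this protocol is to generate the fine-tuning command once per seed and average the resulting scores afterwards. The sketch below only builds the command lines from the fine-tuning example above and does not run anything. The seed range 1 to 10, the helper names, and the `scores_by_seed` dictionary are assumptions for illustration; how final scores are read from the logs is left to you.

```python
from statistics import mean


def finetune_command(seed, game="pong", model_name="{your_model_name}"):
    """Build the fine-tuning command from the example above for one seed."""
    return [
        "python", "-m", "scripts.run", "public=True",
        f"env.game={game}", f"seed={seed}", "num_logs=10",
        f"model_load={model_name}", "model_folder=./",
        "algo.encoder_lr=0.000001", "algo.q_l1_lr=0.00003",
        "algo.clip_grad_norm=-1", "algo.clip_model_grad_norm=-1",
    ]


def average_score(scores_by_seed):
    """Average final scores over seeds; expects a {seed: score} mapping."""
    return mean(scores_by_seed.values())


# Assumption: seeds 1..10. The text only states that 10 seeds are used.
commands = [finetune_command(seed) for seed in range(1, 11)]
```

Each entry of `commands` can be passed to `subprocess.run` or joined with spaces and pasted into a job script.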

./scripts/experiments contains a number of example configurations, including for SGI-M, SGI-M/L and SGI-W, for both pre-training and fine-tuning. Each of these scripts can be launched by providing a game and seed, e.g., ./scripts/experiments/sgim_pretrain.sh pong 1. These scripts are provided primarily to illustrate the hyperparameters used for different experiments; you will likely need to modify the arguments in these scripts to point to your data and model directories.

Data for SGI-R and SGI-E is not included due to its size, but can be re-generated locally. Contact us for details.

What does each file do?

.
├── scripts
│   ├── run.py                # The main runner script to launch jobs.
│   ├── config.yaml           # The hydra configuration file, listing hyperparameters and options.
│   └── experiments           # Configurations for various experiments done by SGI.
│
├── src
│   ├── agent.py              # Implements the Agent API for action selection
│   ├── algos.py              # Distributional RL loss and optimization
│   ├── models.py             # Forward passes, network initialization.
│   ├── networks.py           # Network architecture and forward passes.
│   ├── offline_dataset.py    # Dataloader for offline data.
│   ├── gcrl.py               # Utils for SGI's goal-conditioned RL objective.
│   ├── rlpyt_atari_env.py    # Slightly modified Atari env from rlpyt
│   ├── rlpyt_utils.py        # Utility methods that we use to extend rlpyt's functionality
│   └── utils.py              # Command line arguments and helper functions
│
└── requirements.txt          # Dependencies
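
The commands in the Usage section mix two Hydra override forms: `key=value` changes a value that already exists in scripts/config.yaml, while `+key=value` adds a key that the config does not define. The sketch below imitates that distinction on a plain nested dictionary to make the difference concrete. It is not Hydra's implementation, it ignores other forms such as `++key=value`, and the starting config in the usage note is a hypothetical stand-in, not the contents of config.yaml.

```python
import ast


def parse_value(text):
    """Turn 'True', '1', '0.01' into Python values; leave other text as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, overrides):
    """Apply 'a.b=value' (must exist) and '+a.b=value' (must be new) in place."""
    for item in overrides:
        key, _, raw = item.partition("=")
        adding = key.startswith("+")
        if adding:
            key = key[1:]
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # '+' may create missing parent sections; a plain override may not.
            node = node.setdefault(name, {}) if adding else node[name]
        if adding and leaf in node:
            raise KeyError(f"'{key}' already exists; drop the '+' to override it")
        if not adding and leaf not in node:
            raise KeyError(f"'{key}' is not in the config; prefix it with '+' to add it")
        node[leaf] = parse_value(raw)
    return config
```

For example, starting from the hypothetical config `{"env": {"game": "breakout"}, "offline": {"algo": {}}}`, applying `["env.game=pong", "+offline.algo.spr_weight=1"]` replaces the game and adds the new `spr_weight` key, whereas `offline.algo.spr_weight=1` without the `+` raises a `KeyError`.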

Owner: Mila (Quebec Artificial Intelligence Institute)