Code of paper: A Recurrent Vision-and-Language BERT for Navigation

Last update: Dec 21, 2022

Overview

Recurrent VLN-BERT

Code of the Recurrent-VLN-BERT paper: A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould

[Paper & Appendices | GitHub]

Prerequisites

Installation

Install the Matterport3D Simulator. The package versions used in our environment are listed here.

Install Pytorch-Transformers. In particular, we use this version (the same one as OSCAR) in our experiments.

Data Preparation

Please follow the instructions below to prepare the data in the following directories:

MP3D navigability graphs: connectivity
- Download the connectivity maps [23.8MB].
R2R data: data
- Download the R2R data [5.8MB].
Augmented data: data/prevalent
- Download the collected triplets in PREVALENT [1.5GB] (pre-processed for easy use).
MP3D image features: img_features
- Download the Scene features [4.2GB] (ResNet-152-Places365).
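Before running anything, it can help to confirm that the directories listed above exist. The following is a minimal sketch, assuming the directories sit relative to the repository root (an assumption, not stated above). The helper `missing_data_dirs` is hypothetical and not part of the repository, and it only checks that the directories exist, not what files they contain.

```python
from pathlib import Path

# Directory names taken from the data preparation list above.
# Assumption: they live directly under the repository root.
REQUIRED_DIRS = [
    "connectivity",    # MP3D navigability graphs
    "data",            # R2R data
    "data/prevalent",  # PREVALENT augmented triplets
    "img_features",    # ResNet-152-Places365 scene features
]


def missing_data_dirs(root="."):
    """Return the required data directories that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All data directories found.")
```

Running it from the repository root prints any directory that still needs to be downloaded and unpacked.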

Initial OSCAR and PREVALENT weights

Please refer to vlnbert_init.py to set up the directories.

Pre-trained OSCAR weights
- Download base-no-labels by following this guide.
Pre-trained PREVALENT weights
- Download the pytorch_model.bin from here.

Trained Network Weights

Recurrent-VLN-BERT: snap
- Download the trained network weights [2.5GB] for our OSCAR-based and PREVALENT-based models.

R2R Navigation

Please read Peter Anderson's VLN paper for details of the R2R Navigation task.

Reproduce Testing Results

To replicate the performance reported in our paper, load the trained network weights and run validation:

bash run/test_agent.bash

You can switch between the OSCAR-based and the PREVALENT-based VLN models by changing the arguments vlnbert (oscar or prevalent) and load (path to the trained model).
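To make the two arguments concrete, here is a hypothetical sketch of how they could be modelled with Python's `argparse`. The `--vlnbert` and `--load` flag spellings, the `build_parser` helper, and the example path are assumptions for illustration only; the actual argument definitions and defaults live in the repository's own scripts.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the vlnbert and load arguments described above."""
    parser = argparse.ArgumentParser(description="Recurrent VLN-BERT evaluation (sketch)")
    # Which pre-trained backbone the model was initialised from.
    parser.add_argument("--vlnbert", choices=["oscar", "prevalent"], required=True,
                        help="OSCAR-based or PREVALENT-based model")
    # Path to the trained network weights (hypothetical; e.g. somewhere under snap/).
    parser.add_argument("--load", required=True,
                        help="path to the trained model weights")
    return parser


if __name__ == "__main__":
    # Example: select the OSCAR-based model with a placeholder weights path.
    args = build_parser().parse_args(["--vlnbert", "oscar", "--load", "path/to/trained_model"])
    print(args.vlnbert, args.load)
```

Restricting `vlnbert` with `choices` means a typo such as `--vlnbert bert` fails immediately instead of silently loading the wrong backbone.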

Training

Navigator

To train the network from scratch, simply run:

bash run/train_agent.bash

The trained Navigator will be saved under snap/.

Citation

If you use or discuss our Recurrent VLN-BERT, please cite our paper:

@article{hong2020recurrent,
  title={A Recurrent Vision-and-Language BERT for Navigation},
  author={Hong, Yicong and Wu, Qi and Qi, Yuankai and Rodriguez-Opazo, Cristian and Gould, Stephen},
  journal={arXiv preprint arXiv:2011.13922},
  year={2020}
}


Owner

YicongHong
