Code for Multimodal Neural SLAM for Interactive Instruction Following

Overview

Code for Multimodal Neural SLAM for Interactive Instruction Following

Code structure

The code is adapted from E.T. and most training as well as data processing files are in currently in the ET/notebooks folder and the et_train folder.

Dependency

Inherited from the E.T. repo, the package is depending on:

  • numpy
  • pandas
  • opencv-python
  • tqdm
  • vocab
  • revtok
  • numpy
  • Pillow
  • sacred
  • etaprogress
  • scikit-video
  • lmdb
  • gtimer
  • filelock
  • networkx
  • termcolor
  • torch==1.7.1
  • torchvision==0.8.2
  • tensorboardX==1.8
  • ai2thor==2.1.0
  • E.T. (https://github.com/alexpashevich/E.T.)

MaskRCNN Fine-tuning

To fine-tune the MaskRCNN module used in solving the Alfred challenge, we provide the code adapted from the official PyTorch tutorial.

Setup

We assume the environment and the code structure as in the E.T. model is set up, with this repo served as an extension. Although the fine-tuning code should be a standalone unit.

Training Data Geneation

Given a traj_data.json file (e.g., the 45K one used in E.T. joint-training here), run python -m alfred.gen.render_trajs as in E.T. to render the training inputs (raw images) and the ground truth labels (instance segmentation masks) for all the frames recorded in the traj_data.json files. Make sure the flag for generating instance level segmentation masks is set to True.

Pre-processing Instance Segmentation Masks

The rendered instance segmentation masks need to be preprocessed so that the data format is aligned with the one used in the official PyTorch tutorial. In specific, each generated mask is of a different RGB color per instance, which is mapped to the unique instance index in the frame as well as a label index for its semantic class. The mapping is constructed by looking up the traj['scene']['color_to_object_type'] in each of the json dictionaries. The code also supports the functionality to only collect training data from certain subgoal data (such as for PickupObject in Alfred). Notice that there are some bugs in the rendering process of the masks which creates some artifacts (small regions in the ground truth labels that correspond to no actual objects). This can be fixed by only selecting instance masks that are larger than certain area (e.g., > 10 as in alfred/data/maskrcnn.py).

Training

Run python -m alfred.maskrcnn.train which first loads the pre-trained model provided by E.T. and then fine-tunes it on the pre-processed data mentioned above.

Evaluation

We follow the MSCOCO evaluation protocal which is widely used for object detection and instance segmentation, which output average precision and recall at multiple scales. The evaluation function call evaluate(model, data_loader_test, device=device) in alfred/maskrcnn/train.py serves as an example.

The source code of "SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation", accepted to WACV 2022.

SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation The source code of our work "SIDE: Center-based Stereo 3D Detecto

10 Dec 18, 2022
Official PyTorch implementation of RobustNet (CVPR 2021 Oral)

RobustNet (CVPR 2021 Oral): Official Project Webpage Codes and pretrained models will be released soon. This repository provides the official PyTorch

Sungha Choi 173 Dec 21, 2022
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency[ECCV 2020]

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency(ECCV 2020) This is an official python implementati

304 Jan 03, 2023
[ICCV'21] NEAT: Neural Attention Fields for End-to-End Autonomous Driving

NEAT: Neural Attention Fields for End-to-End Autonomous Driving Paper | Supplementary | Video | Poster | Blog This repository is for the ICCV 2021 pap

254 Jan 02, 2023
Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets

Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets (including obl

Azavea 1.7k Dec 22, 2022
Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 short.

Session-aware BERT4Rec Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 shor

Jamie J. Seol 22 Dec 13, 2022
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields.

This repository contains the code release for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. This implementation is written in JAX, and is a fork of Google's JaxNeRF

Google 625 Dec 30, 2022
Official Pytorch implementation of "DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network" (CVPR'21)

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network Pytorch implementation for our DivCo. We propose a simple ye

64 Nov 22, 2022
Torch Implementation of "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network"

Photo-Realistic-Super-Resoluton Torch Implementation of "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network" [Paper]

Harry Yang 199 Dec 01, 2022
Impelmentation for paper Feature Generation and Hypothesis Verification for Reliable Face Anti-Spoofing

FGHV Impelmentation for paper Feature Generation and Hypothesis Verification for Reliable Face Anti-Spoofing Requirements Python 3.6 Pytorch 1.5.0 Cud

5 Jun 02, 2022
Real-time Joint Semantic Reasoning for Autonomous Driving

MultiNet MultiNet is able to jointly perform road segmentation, car detection and street classification. The model achieves real-time speed and state-

Marvin Teichmann 518 Dec 12, 2022
Language Models for the legal domain in Spanish done @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish legal domain Language Model ⚖️ This repository contains the page for two main resources for the Spanish legal domain: A RoBERTa model: https:/

Plan de Tecnologías del Lenguaje - Gobierno de España 12 Nov 14, 2022
The source code of the paper "Understanding Graph Neural Networks from Graph Signal Denoising Perspectives"

GSDN-F and GSDN-EF This repository provides a reference implementation of GSDN-F and GSDN-EF as described in the paper "Understanding Graph Neural Net

Guoji Fu 18 Nov 14, 2022
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

CLIP-Guided-Diffusion Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. Original colab notebooks by Ka

Nerdy Rodent 336 Dec 09, 2022
Vrcwatch - Supply the local time to VRChat as Avatar Parameters through OSC

English: README-EN.md VRCWatch VRCWatch は、VRChat 内のアバター向けに現在時刻を送信するためのプログラムです。 使

Kosaki Mezumona 17 Nov 30, 2022
Implementation of the GBST block from the Charformer paper, in Pytorch

Charformer - Pytorch Implementation of the GBST (gradient-based subword tokenization) module from the Charformer paper, in Pytorch. The paper proposes

Phil Wang 105 Dec 26, 2022
An Industrial Grade Federated Learning Framework

DOC | Quick Start | 中文 FATE (Federated AI Technology Enabler) is an open-source project initiated by Webank's AI Department to provide a secure comput

Federated AI Ecosystem 4.8k Jan 09, 2023
Official Implementation for the paper DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification

DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification Official Implementation for the pape

Anh M. Nguyen 36 Dec 28, 2022
A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up/down.

HandTrackingBrightnessControl A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up

Teemu Laurila 19 Feb 12, 2022
Official implementation for the paper "SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization".

SAPE Project page Paper Official implementation for the paper "SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization". Environment Cre

36 Dec 09, 2022