YouRefIt: Embodied Reference Understanding with Language and Gesture

Last update: Jul 11, 2022

Related tags

Deep Learning YouRefIt_ERU

Overview

YouRefIt: Embodied Reference Understanding with Language and Gesture

by Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu and Siyuan Huang

The IEEE International Conference on Computer Vision (ICCV), 2021

Introduction

We study the machine's understanding of embodied reference: One agent uses both language and gesture to refer to an object to another agent in a shared physical environment. To tackle this problem, we introduce YouRefIt, a new crowd-sourced, real-world dataset of embodied reference.

For more details, please refer to our paper.

Checklist

Image ERU
Video ERU

Installation

The code was tested with the following environment: Ubuntu 18.04/20.04, python 3.7/3.8, pytorch 1.9.1. Run

    git clone https://github.com/yixchen/YouRefIt_ERU
    pip install -r requirements.txt

Dataset

Download the YouRefIt dataset from Dataset Request Page and put under ./ln_data

Model weights

Yolov3: download the pretrained model and place the file in ./saved_models by
```
sh saved_models/yolov3_weights.sh
```
More pretrained models are availble Google drive, and should also be placed in ./saved_models.

Make sure to put the files in the following structure:

|-- ROOT
|	|-- ln_data
|		|-- yourefit
|			|-- images
|			|-- paf
|			|-- saliency
|	|-- saved_modeks
|		|-- final_model_full.tar
|		|-- final_resc.tar

Training

Train the model, run the code under main folder.

python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id

Evaluation

Evaluate the model, run the code under main folder. Using flag --test to access test mode.

python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
 --resume saved_models/model.pth.tar \
 --test

Evaluate Image ERU on our released model

Evaluate our full model with PAF and saliency feature, run

python train.py --data_root ./ln_data/ --dataset yourefit  --gpu gpu_id \
 --resume saved_models/final_model_full.tar --use_paf --use_sal --large --test

Evaluate baseline model that only takes images as input, run

python train.py --data_root ./ln_data/ --dataset yourefit  --gpu gpu_id \
 --resume saved_models/final_resc.tar --large --test

Evalute the inference results on test set on different IOU levels by changing the path accordingly,

 python evaluate_results.py

Citation

@inProceedings{chen2021yourefit,
 title={YouRefIt: Embodied Reference Understanding with Language and Gesture},
 author = {Chen, Yixin and Li, Qing and Kong, Deqian and Kei, Yik Lun and Zhu, Song-Chun and Gao, Tao and Zhu, Yixin and Huang, Siyuan},
 booktitle={The IEEE International Conference on Computer Vision (ICCV),
 year={2021}
 }

Acknowledgement

Our code is built on ReSC and we thank the authors for their hard work.

YouRefIt: Embodied Reference Understanding with Language and Gesture

Related tags

Overview

YouRefIt: Embodied Reference Understanding with Language and Gesture

Introduction

Checklist

Installation

Dataset

Model weights

Training

Evaluation

Evaluate Image ERU on our released model

Citation

Acknowledgement

Owner

The repository contain code for building compiler using puthon.

Mortgage-loan-prediction - Show how to perform advanced Analytics and Machine Learning in Python using a full complement of PyData utilities

A PyTorch Implementation of "SINE: Scalable Incomplete Network Embedding" (ICDM 2018).

DrWhy is the collection of tools for eXplainable AI (XAI). It's based on shared principles and simple grammar for exploration, explanation and visualisation of predictive models.

A repo for Causal Imitation Learning under Temporally Correlated Noise

Implementation for HFGI: High-Fidelity GAN Inversion for Image Attribute Editing

A project studying the influence of communication in multi-objective normal-form games

A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.

CvT-ASSD: Convolutional vision-Transformerbased Attentive Single Shot MultiBox Detector (ICTAI 2021 CCF-C 会议)The 33rd IEEE International Conference on Tools with Artificial Intelligence

Train Scene Graph Generation for Visual Genome and GQA in PyTorch >= 1.2 with improved zero and few-shot generalization.

Data and code for ICCV 2021 paper Distant Supervision for Scene Graph Generation.

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

Assessing syntactic abilities of BERT

Use unsupervised and supervised learning to predict stocks

This is the implementation of GGHL (A General Gaussian Heatmap Labeling for Arbitrary-Oriented Object Detection)

Using deep actor-critic model to learn best strategies in pair trading

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Personal implementation of paper "Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval"

[CVPR22] Official codebase of Semantic Segmentation by Early Region Proxy.

Distributing reference energies for SMIRNOFF implementations