Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Last update: Nov 21, 2022

Overview

[AAAI2022] Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Overall pipeline of OCN.

Paper Link: [arXiv] [AAAI official paper]

If you find our work or this codebase inspiring and useful for your research, please cite:

@article{yuan2022OCN_HOI,
  title={Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics},
  author={Yuan, Hangjie and Wang, Mang and Ni, Dong and Xu, Liangpeng},
  journal={arXiv preprint arXiv:2202.00259},
  year={2022}
}

Dataset preparation

1. HICO-DET

The HICO-DET dataset can be downloaded here. After the download finishes, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. Place the downloaded annotation files as follows.

qpic
 ├─ data
 │   └─ hico_20160224_det
 │       ├─ annotations
 │       │   ├─ trainval_hico.json
 │       │   ├─ test_hico.json
 │       │   └─ corre_hico.npy
 :       :
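Before training, it can help to confirm that the annotation files sit where the tree above expects them. The following is a minimal sketch that is not part of the codebase; the helper name missing_hico_files is hypothetical, and the paths simply mirror the tree above, relative to the repository root.

```python
from pathlib import Path

# Files expected under data/hico_20160224_det/annotations, mirroring the tree above.
HICO_ANNOTATION_FILES = ("trainval_hico.json", "test_hico.json", "corre_hico.npy")


def missing_hico_files(repo_root="."):
    """Return the expected HICO-DET annotation paths that do not exist yet."""
    ann_dir = Path(repo_root) / "data" / "hico_20160224_det" / "annotations"
    return [str(ann_dir / name) for name in HICO_ANNOTATION_FILES
            if not (ann_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_hico_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("HICO-DET annotations are in place.")
```

Run it from the repository root; an empty result means the three annotation files are in place.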

2. V-COCO

First, clone the V-COCO repository from here and follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Then create the directories and place the files as follows.

qpic
 ├─ data
 │   └─ v-coco
 │       ├─ data
 │       │   ├─ instances_vcoco_all_2014.json
 │       │   :
 │       ├─ prior.pickle
 │       ├─ images
 │       │   ├─ train2014
 │       │   │   ├─ COCO_train2014_000000000009.jpg
 │       │   │   :
 │       │   └─ val2014
 │       │       ├─ COCO_val2014_000000000042.jpg
 │       │       :
 │       ├─ annotations
 :       :
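Similarly, here is a minimal sketch for checking the V-COCO inputs from the tree above before running the conversion. The helper check_vcoco_layout is hypothetical and not part of the codebase; it only tests whether each listed path exists.

```python
from pathlib import Path

# Paths (relative to the repository root) taken from the V-COCO tree above.
VCOCO_REQUIRED = (
    "data/v-coco/data/instances_vcoco_all_2014.json",
    "data/v-coco/prior.pickle",
    "data/v-coco/images/train2014",
    "data/v-coco/images/val2014",
    "data/v-coco/annotations",
)


def check_vcoco_layout(repo_root="."):
    """Map each required V-COCO path to whether it currently exists."""
    root = Path(repo_root)
    return {rel: (root / rel).exists() for rel in VCOCO_REQUIRED}


if __name__ == "__main__":
    for rel, ok in check_vcoco_layout().items():
        print(("ok      " if ok else "MISSING ") + rel)
```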

Our implementation requires the annotation files to be converted to the HOIA format. The conversion can be done as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that this conversion only works with Python 2, because vsrl_utils.py in the v-coco repository raises an error under Python 3.

The V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
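To sanity-check the generated JSON files, the sketch below counts images, boxes, and HOI triplets in one annotation file. It assumes, without having verified this against the conversion script, that each JSON file is a list of per-image records with "annotations" and "hoi_annotation" keys, as in the PPDM-style HOIA annotations; the function name summarize_hoia is hypothetical.

```python
import json
import sys
from collections import Counter


def summarize_hoia(path):
    """Summarize an HOIA-style annotation file.

    Assumed layout: a JSON list of per-image records, each with "annotations"
    (boxes with "bbox" and "category_id") and "hoi_annotation" (triplets with
    "subject_id", "object_id" and a verb "category_id").
    """
    with open(path) as f:
        records = json.load(f)
    verb_counts = Counter()
    num_boxes = 0
    for rec in records:
        num_boxes += len(rec.get("annotations", []))
        for hoi in rec.get("hoi_annotation", []):
            verb_counts[hoi["category_id"]] += 1
    return {
        "images": len(records),
        "boxes": num_boxes,
        "hoi_triplets": sum(verb_counts.values()),
        "verbs_used": len(verb_counts),
    }


if __name__ == "__main__":
    # Pass annotation paths on the command line,
    # e.g. data/v-coco/annotations/trainval_vcoco.json
    for p in sys.argv[1:]:
        print(p, summarize_hoia(p))
```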

Dependencies and Training

To simplify setup, we combine the installation of external dependencies and training into a single '.sh' file per dataset. You can run these scripts directly once the datasets are prepared correctly.

# Training on HICO-DET
bash train_hico.sh
# Training on V-COCO
bash train_vcoco.sh

For preparing the two datasets, you can also refer to the publicly available codebase.

Pre-trained parameters

OCN uses COCO-pretrained models for a fair comparison with previous methods. The pretrained models can be downloaded from the DETR repository.

For HICO-DET, you can convert the pre-trained parameters with the following command.

python convert_parameters.py \
        --load_path /PATH/TO/PRETRAIN \
        --save_path /PATH/TO/SAVE

For V-COCO, you can convert the pre-trained parameters with the following command.

python convert_parameters.py \
        --load_path /PATH/TO/PRETRAIN \
        --save_path /PATH/TO/SAVE \
        --dataset vcoco
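The internals of convert_parameters.py are not described here. As a rough, hypothetical illustration of why a conversion step is needed: DETR's COCO classifier predicts 92 logits (category ids 0 to 90 plus a trailing "no object" slot at index 91), whereas HOI models of this kind typically predict only the 80 object categories actually used in COCO plus "no object". The sketch below shows that remapping on plain Python lists; the names and the assumption that the script works this way are ours, not the authors'.

```python
# Hypothetical illustration: remapping DETR's COCO classifier rows to
# 80 object classes + "no object". Not taken from convert_parameters.py.

# COCO category ids run from 1 to 90, but these 10 ids are unused.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
VALID_COCO_IDS = [i for i in range(1, 91) if i not in UNUSED_COCO_IDS]  # 80 ids

# DETR's class_embed has 91 + 1 outputs; index 91 is the "no object" class.
NO_OBJECT_INDEX = 91
KEEP_INDICES = VALID_COCO_IDS + [NO_OBJECT_INDEX]  # 81 indices


def remap_class_rows(rows):
    """Keep the rows for the 80 valid COCO ids plus the no-object row.

    `rows` is any sequence indexed by DETR output index (length 92), such as
    the rows of class_embed.weight or the entries of class_embed.bias.
    """
    if len(rows) != NO_OBJECT_INDEX + 1:
        raise ValueError(f"expected {NO_OBJECT_INDEX + 1} rows, got {len(rows)}")
    return [rows[i] for i in KEEP_INDICES]


if __name__ == "__main__":
    remapped = remap_class_rows(list(range(92)))
    print(len(remapped), remapped[:3], remapped[-1])
```

With a real checkpoint the same index list would be applied to tensors (PyTorch tensors accept a list of indices), but whether the provided script does exactly this, and what it changes for V-COCO, is not covered by this sketch.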

Evaluation

The mAP on HICO-DET for the Full, Rare, and Non-Rare sets is reported during training. Alternatively, you can evaluate a trained model with the command below:

python main.py \
    --pretrained /PATH/TO/PRETRAINED_MODEL \
    --output_dir /PATH/TO/OUTPUT \
    --hoi \
    --dataset_file hico \
    --hoi_path /PATH/TO/data/hico_20160224_det \
    --num_obj_classes 80 \
    --num_verb_classes 117 \
    --backbone resnet101 \
    --num_workers 4 \
    --batch_size 4 \
    --exponential_hyper 1 \
    --exponential_loss \
    --semantic_similar_coef 1 \
    --verb_loss_type focal \
    --semantic_similar \
    --OCN \
    --eval
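For reference, the three HICO-DET numbers are averages of per-category AP over different subsets of the HOI categories: Full uses all of them, Rare uses those with fewer than 10 training instances, and Non-Rare uses the rest. The sketch below illustrates only that aggregation step; it is not the repository's evaluation code, and the function and argument names are hypothetical.

```python
import math


def hico_map_splits(ap_per_hoi, train_counts, rare_threshold=10):
    """Average per-category AP into HICO-DET style Full, Rare and Non-Rare mAP.

    ap_per_hoi:     {hoi_category: AP} for every HOI category evaluated.
    train_counts:   {hoi_category: number of training instances}.
    rare_threshold: categories with fewer training instances than this are Rare.
    """
    rare = [c for c in ap_per_hoi if train_counts.get(c, 0) < rare_threshold]
    non_rare = [c for c in ap_per_hoi if train_counts.get(c, 0) >= rare_threshold]

    def mean_ap(categories):
        if not categories:
            return math.nan
        return sum(ap_per_hoi[c] for c in categories) / len(categories)

    return {
        "full": mean_ap(list(ap_per_hoi)),
        "rare": mean_ap(rare),
        "non_rare": mean_ap(non_rare),
    }
```

Under the standard HICO-DET split, the 600 HOI categories divide into 138 Rare and 462 Non-Rare categories, so the Full mAP is not the simple average of the Rare and Non-Rare numbers.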

For the official V-COCO evaluation, the results must be computed from a pickle file of detection results. Generate this file with the following command:

python generate_vcoco_official.py \
        --param_path /PATH/TO/CHECKPOINT \
        --save_path /PATH/TO/SAVE/vcoco.pickle \
        --hoi_path /PATH/TO/VCOCO/data/v-coco \
        --batch_size 4 \
        --OCN

Then, after modifying the paths in the script, run the following command to obtain the final performance:

python datasets/vsrl_eval.py
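Before running the evaluation, a quick look at the generated pickle can catch an empty or truncated file. This is a minimal sketch under the assumption that the pickle stores a list of per-detection dicts with an "image_id" key, which is the layout the V-COCO evaluation code reads; the function name is hypothetical.

```python
import pickle
import sys
from collections import Counter


def summarize_vcoco_detections(pickle_path):
    """Return (number of detections, number of distinct images) in a result pickle.

    Assumption: the pickle holds a list of dicts, one per detection, each with
    an "image_id" key.
    """
    with open(pickle_path, "rb") as f:
        detections = pickle.load(f)
    per_image = Counter(det["image_id"] for det in detections)
    return len(detections), len(per_image)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        num_dets, num_images = summarize_vcoco_detections(sys.argv[1])
        print(f"{num_dets} detections over {num_images} images")
```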

Results

Below we present the results, along with links for downloading the corresponding parameters and logs. (The checkpoints can produce higher results than those reported in the paper.) We will update this table soon.


