QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper)

Last update: Dec 29, 2022

Requirements

PyTorch >= 1.5.1
torchvision >= 0.6.1

pip install -r requirements.txt
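
A quick way to confirm the minimum versions listed above is to read the installed package metadata without importing PyTorch itself. The sketch below is illustrative only. Its simple parser assumes plain "X.Y.Z" versions with an optional local tag such as "+cu111". For pre-release strings, packaging.version.Version is the more robust choice.

```python
# Minimal sketch: compare installed package versions with the README minimums.
# Handles plain "X.Y.Z" versions with an optional local tag such as "+cu111";
# pre-release versions like "1.10.0a0" are NOT handled by this simple parser.
from importlib.metadata import PackageNotFoundError, version

MINIMUMS = {"torch": "1.5.1", "torchvision": "0.6.1"}


def parse_version(text: str) -> tuple:
    """Turn '1.9.0+cu111' into (1, 9, 0)."""
    release = text.split("+", 1)[0]
    return tuple(int(part) for part in release.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` >= `minimum`, compared component by component."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed")
            continue
        try:
            ok = meets_minimum(installed, minimum)
        except ValueError:
            print(f"{package} {installed}: unrecognized version format")
            continue
        status = "OK" if ok else f"too old, need >= {minimum}"
        print(f"{package} {installed}: {status}")
```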

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# test the compiled operators
python test.py
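
As an extra check after building, you can confirm that Python can find the compiled extension. This README does not name the module. The sketch therefore assumes that models/ops follows Deformable DETR and installs an extension named MultiScaleDeformableAttention. Change EXTENSION if the name differs.

```python
# Minimal sketch: check that a compiled extension is importable after make.sh.
# ASSUMPTION: models/ops follows Deformable DETR, whose build installs an
# extension named "MultiScaleDeformableAttention"; adjust EXTENSION if needed.
import importlib.util

EXTENSION = "MultiScaleDeformableAttention"  # assumed module name


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available(EXTENSION):
        print(f"{EXTENSION} found; the CUDA operators appear to be installed.")
    else:
        print(f"{EXTENSION} not found; re-run sh ./make.sh in models/ops.")
```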

Dataset Preparation

Please follow the HICO-DET dataset preparation instructions of GGNet.

After preparation, the data folder should be organized as follows:

data
├── hico_20160224_det
|   ├── images
|   |   ├── test2015
|   |   └── train2015
|   └── annotations
|       ├── anno_list.json
|       ├── corre_hico.npy
|       ├── file_name_to_obj_cat.json
|       ├── hoi_id_to_num.json
|       ├── hoi_list_new.json
|       ├── test_hico.json
|       └── trainval_hico.json
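
Before training or evaluation, a short script can confirm that every path in the tree above exists. The sketch below is a hypothetical helper and is not part of the QAHOI code. It copies the file list from the tree, checks only that each path exists, and uses data/hico_20160224_det as its default root.

```python
# Minimal sketch: check that the HICO-DET folder matches the layout above.
# Paths are copied from the tree in this README; only existence is checked,
# not file contents. The default root below is an assumption.
import sys
from pathlib import Path

EXPECTED = [
    "images/test2015",
    "images/train2015",
    "annotations/anno_list.json",
    "annotations/corre_hico.npy",
    "annotations/file_name_to_obj_cat.json",
    "annotations/hoi_id_to_num.json",
    "annotations/hoi_list_new.json",
    "annotations/test_hico.json",
    "annotations/trainval_hico.json",
]


def missing_paths(root) -> list:
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "data/hico_20160224_det"
    missing = missing_paths(root)
    if missing:
        print(f"Missing under {root}:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print(f"{root} looks complete.")
```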

Evaluation

Download the trained models to the params folder.

We tested the models on an NVIDIA A6000 GPU with PyTorch 1.9.0, Python 3.8, and CUDA 11.2.

Model	Full (Default)	Rare (Default)	Non-Rare (Default)	Full (Known Object)	Rare (Known Object)	Non-Rare (Known Object)	Download
Swin-Tiny	28.47	22.44	30.27	30.99	24.83	32.84	model
Swin-Base*+	33.58	25.86	35.88	35.34	27.24	37.76	model
Swin-Large*+	35.78	29.80	37.56	37.59	31.36	39.36	model
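
The Full, Rare and Non-Rare columns are linked. HICO-DET has 600 HOI categories. Of these, 138 are Rare (fewer than 10 training instances) and 462 are Non-Rare. Because mAP is the mean of per-category AP, Full mAP should equal the category-weighted mean of Rare and Non-Rare mAP. The short check below applies this to the Default rows of the table. The numbers are copied from the table, and nothing else is assumed.

```python
# Sanity check: Full mAP as the category-weighted mean of Rare and Non-Rare mAP.
# HICO-DET: 600 HOI categories = 138 Rare + 462 Non-Rare.
NUM_RARE = 138
NUM_NON_RARE = 462


def full_map(rare: float, non_rare: float) -> float:
    """Weighted mean of per-split mAP, weighted by number of categories."""
    total = NUM_RARE + NUM_NON_RARE
    return (NUM_RARE * rare + NUM_NON_RARE * non_rare) / total


# (Rare, Non-Rare, reported Full) for the Default setting, copied from the table.
DEFAULT_ROWS = {
    "Swin-Tiny": (22.44, 30.27, 28.47),
    "Swin-Base*+": (25.86, 35.88, 33.58),
    "Swin-Large*+": (29.80, 37.56, 35.78),
}

if __name__ == "__main__":
    for model, (rare, non_rare, reported) in DEFAULT_ROWS.items():
        print(f"{model}: computed {full_map(rare, non_rare):.2f}, reported {reported:.2f}")
```

For all three Default rows, the computed value rounds to the reported Full (Default) score. This makes the check a quick way to catch transcription errors when copying results.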

Evaluate a model by running the corresponding command below.

Add --eval_extra to a command to also evaluate the spatial contribution.

The results are saved to mAP_default.json and mAP_ko.json in the current folder.

Swin-Tiny

python main.py --resume params/QAHOI_swin_tiny_mul3.pth --backbone swin_tiny --num_feature_levels 3 --use_nms --eval

Swin-Base*+

python main.py --resume params/QAHOI_swin_base_384_22k_mul3.pth --backbone swin_base_384 --num_feature_levels 3 --use_nms --eval

Swin-Large*+

python main.py --resume params/QAHOI_swin_large_384_22k_mul3.pth --backbone swin_large_384 --num_feature_levels 3 --use_nms --eval
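
The three evaluation commands differ only in the checkpoint and the backbone name. The sketch below assembles them from a small table and shows where the optional --eval_extra flag goes. It is a hypothetical helper and is not part of the QAHOI code base. All paths and flags are copied from the commands in this section.

```python
# Minimal sketch: build the evaluation commands above as argument lists.
# Backbone names, checkpoint paths and flags are copied from this README;
# the helper itself is hypothetical, not part of the QAHOI code base.
import shlex

CHECKPOINTS = {
    "swin_tiny": "params/QAHOI_swin_tiny_mul3.pth",
    "swin_base_384": "params/QAHOI_swin_base_384_22k_mul3.pth",
    "swin_large_384": "params/QAHOI_swin_large_384_22k_mul3.pth",
}


def eval_command(backbone: str, eval_extra: bool = False) -> list:
    """Return the argv list for evaluating the given backbone's checkpoint."""
    cmd = [
        "python", "main.py",
        "--resume", CHECKPOINTS[backbone],
        "--backbone", backbone,
        "--num_feature_levels", "3",
        "--use_nms",
        "--eval",
    ]
    if eval_extra:
        cmd.append("--eval_extra")  # also evaluate the spatial contribution
    return cmd


if __name__ == "__main__":
    for backbone in CHECKPOINTS:
        print(shlex.join(eval_command(backbone)))
```

You can pass the returned list directly to subprocess.run(eval_command("swin_tiny"), check=True), which avoids shell quoting issues.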

Training

Download the pre-trained backbone weights (e.g. swin_tiny_patch4_window7_224.pth for Swin-Tiny) from the Swin-Transformer repository to the params folder.

Training QAHOI with Swin-Tiny from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --pretrained params/swin_tiny_patch4_window7_224.pth \
        --output_dir logs/swin_tiny_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Training QAHOI with Swin-Base*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_base_384 \
        --pretrained params/swin_base_patch4_window7_224_22k.pth \
        --output_dir logs/swin_base_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Training QAHOI with Swin-Large*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_large_384 \
        --pretrained params/swin_large_patch4_window12_384_22k.pth \
        --output_dir logs/swin_large_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms
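
With --epochs 150 and --lr_drop 120, the learning rate stays at its base value for 120 epochs and is reduced once for the final 30 epochs. The sketch below assumes that QAHOI keeps the step schedule of Deformable DETR, which multiplies the rate by gamma = 0.1 every lr_drop epochs. BASE_LR is a placeholder and is not a value taken from this README.

```python
# Minimal sketch of a step learning-rate schedule driven by --epochs/--lr_drop.
# ASSUMPTION: StepLR-style decay (multiply by gamma every `lr_drop` epochs),
# as in Deformable DETR; BASE_LR is a hypothetical placeholder value.
BASE_LR = 2e-4  # hypothetical, for illustration only


def lr_at_epoch(epoch: int, base_lr: float = BASE_LR,
                lr_drop: int = 120, gamma: float = 0.1) -> float:
    """Learning rate used during `epoch` (0-indexed) under a step schedule."""
    return base_lr * gamma ** (epoch // lr_drop)


if __name__ == "__main__":
    epochs = 150
    for epoch in (0, 119, 120, epochs - 1):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.1e}")
```

In PyTorch, torch.optim.lr_scheduler.StepLR(optimizer, step_size=120, gamma=0.1) produces the same schedule when scheduler.step() is called once per epoch.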

Citation

@article{cjw,
  title={QAHOI: Query-Based Anchors for Human-Object Interaction Detection},
  author={Junwen Chen and Keiji Yanai},
  journal={arXiv preprint arXiv:2112.08647},
  year={2021}
}
