
Sequence Feature Alignment (SFA)

By Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, and Dacheng Tao

This repository is an official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers, which has been accepted to ACM Multimedia 2021.

Introduction

TL;DR. We develop SFA, a domain adaptive object detection method specialized for detection transformers. It contains a domain query-based feature alignment module and a token-wise feature alignment module, for global and local feature alignment respectively, plus a bipartite matching consistency loss for improved robustness.

[Figure: overview of the SFA architecture]

Abstract. Detection transformers have recently shown promising object detection results and attracted increasing attention. However, how to develop effective domain adaptation techniques to improve their cross-domain performance remains unexplored and unclear. In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone only brings limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. Technically, SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains. DQFA reduces the domain discrepancies in global feature representations and object relations when deployed in the transformer encoder and decoder, respectively. Meanwhile, TDA aligns token features in the sequence from both domains, which reduces the domain gaps in local and instance-level feature representations in the transformer encoder and decoder, respectively. In addition, a novel bipartite matching consistency loss is proposed to enhance the feature discriminability for robust object detection. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods.
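
To make the token-wise alignment idea concrete, here is a minimal PyTorch sketch. It is illustrative only, not the released implementation: it assumes the alignment is realized with a gradient-reversal layer and a small per-token MLP domain discriminator, and all tensor shapes and names below are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; reverses and scales gradients in backward.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class TokenDomainDiscriminator(nn.Module):
    # Predicts, for every token, whether it comes from source (0) or target (1).
    def __init__(self, dim=256, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, tokens):  # tokens: (batch, num_tokens, dim)
        tokens = GradReverse.apply(tokens, self.lambd)
        return self.head(tokens)  # per-token domain logits

# Hypothetical usage with random stand-ins for transformer token sequences:
disc = TokenDomainDiscriminator(dim=256)
src_tokens = torch.randn(2, 100, 256)  # token features from source-domain images
tgt_tokens = torch.randn(2, 100, 256)  # token features from target-domain images
loss = (F.binary_cross_entropy_with_logits(disc(src_tokens), torch.zeros(2, 100, 1))
        + F.binary_cross_entropy_with_logits(disc(tgt_tokens), torch.ones(2, 100, 1)))
loss.backward()  # reversed gradients push the detector toward domain-invariant tokens

One way to picture the domain query of DQFA under the same assumptions: prepend a single extra learnable token to the sequence and feed only that token's output to a discriminator of the same form, so one vector summarizes the global context of the image for alignment.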

Main Results

The experimental results and model weights for Cityscapes to Foggy Cityscapes are shown below.

Model                  | mAP  | mAP@50 | mAP@75 | mAP@S | mAP@M | mAP@L | Log & Model
SFA-DefDETR            | 21.5 | 41.1   | 20.0   | 3.9   | 20.9  | 43.0  | Google Drive
SFA-DefDETR-BoxRefine  | 23.9 | 42.6   | 22.5   | 3.8   | 21.6  | 46.7  | Google Drive
SFA-DefDETR-TwoStage   | 24.1 | 42.5   | 22.8   | 3.8   | 22.0  | 48.1  | Google Drive

Note:

  1. All SFA models are trained with a total batch size of 4.
  2. "DefDETR" means Deformable DETR (with R50 backbone).
  3. "BoxRefine" means Deformable DETR with iterative box refinement.
  4. "TwoStage" indicates the two-stage Deformable DETR variant.
  5. The original implementation is based on our internal codebase, and there are slight differences in the released code. For example, we only use the intermediate features output by the first encoder and decoder layers for hierarchical feature alignment, to reduce computational costs during training (see the illustrative snippet after this list).
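
A rough illustration of note 5 (all names and shapes here are hypothetical; Deformable DETR stacks its intermediate outputs per layer):

import torch

# Hypothetical per-layer outputs: (num_layers, batch, num_tokens, dim)
enc_inter = torch.randn(6, 2, 100, 256)  # encoder token features, one slice per layer
dec_inter = torch.randn(6, 2, 300, 256)  # decoder query features, one slice per layer

enc_feat_for_alignment = enc_inter[0]  # use only the first encoder layer's output
dec_feat_for_alignment = dec_inter[0]  # use only the first decoder layer's output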

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n sfa python=3.7 pip

    Then, activate the environment:

    conda activate sfa
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following the instructions here)

    For example, if your CUDA version is 9.2, you could install PyTorch and torchvision as follows:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements/requirements.txt
  • Logging using wandb (optional)

    pip install -r requirements/optional.txt
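  • Environment sanity check (optional)

    This step is not part of the official instructions, but you can quickly verify that PyTorch was built with CUDA support:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"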

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (all checks should report True)
python test.py

Usage

Dataset preparation

We use Cityscapes to Foggy Cityscapes adaptation as a demonstration; other domain adaptation benchmarks can be prepared analogously. The Cityscapes and Foggy Cityscapes datasets can be downloaded from here. The annotations in COCO format can be obtained from here. Afterward, please organize the datasets and annotations as follows:

[coco_path]
├─ cityscapes
│  └─ leftImg8bit
│     ├─ train
│     └─ val
├─ foggy_cityscapes
│  └─ leftImg8bit_foggy
│     ├─ train
│     └─ val
└─ CocoFormatAnnos
   ├─ cityscapes_train_cocostyle.json
   ├─ cityscapes_foggy_train_cocostyle.json
   └─ cityscapes_foggy_val_cocostyle.json
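
As an optional sanity check (assuming pycocotools was installed via the requirements above), you can verify that a COCO-style annotation file loads correctly:

from pycocotools.coco import COCO

# Adjust [coco_path] to your dataset root; the path follows the layout above.
coco = COCO('[coco_path]/CocoFormatAnnos/cityscapes_foggy_val_cocostyle.json')
print(len(coco.getImgIds()), 'images;', len(coco.getCatIds()), 'categories')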

Training

As an example, we provide commands for training our SFA on a single node with 4 GPUs for weather adaptation.

Training SFA-DeformableDETR

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr.sh --wandb

Training SFA-DeformableDETR-BoxRefine

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr_plus_iterative_bbox_refinement.sh --wandb

Training SFA-DeformableDETR-TwoStage

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage.sh --wandb

Training Source-only DeformableDETR

Please refer to the source branch.

Evaluation

You can get the config file and pretrained model of SFA (the links are in the "Main Results" section above), then run the following command to evaluate it on the Foggy Cityscapes validation set:

<path to config file> --resume <path to pre-trained model> --eval
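
For example, to evaluate SFA-DefDETR (the checkpoint path below is hypothetical; use wherever you saved the downloaded model):

./configs_da/sfa_r50_deformable_detr.sh --resume ./checkpoints/sfa_defdetr.pth --eval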

You can also run distributed evaluation by using ./tools/run_dist_launch.sh or ./tools/run_dist_slurm.sh.
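
For instance, a 4-GPU distributed evaluation mirroring the training commands above (checkpoint path again hypothetical) might look like:

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr.sh --resume ./checkpoints/sfa_defdetr.pth --eval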

Acknowledgement

This project is based on DETR and Deformable DETR. Thanks for their wonderful work. See LICENSE for more details.

Citing SFA

If you find SFA useful in your research, please consider citing:

@inproceedings{wang2021exploring,
  title={Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers},
  author={Wang, Wen and Cao, Yang and Zhang, Jing and He, Fengxiang and Zha, Zheng-Jun and Wen, Yonggang and Tao, Dacheng},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}