[AAAI22] Reliable Propagation-Correction Modulation for Video Object Segmentation

Overview

Reliable Propagation-Correction Modulation for Video Object Segmentation (AAAI22)

Picture1

Preview version paper of this work is available at: https://arxiv.org/abs/2112.02853

Qualitative results and comparisons with previous SOTAs are available at: https://youtu.be/X6BsS3t3wnc

This repo is a preview version. More details will be added later.

Abstract

Error propagation is a general but crucial problem in online semi-supervised video object segmentation. We aim to suppress error propagation through a correction mechanism with high reliability.

The key insight is to disentangle the correction from the conventional mask propagation process with reliable cues.

We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively. Specifically, we assemble the modulators with a cascaded propagation-correction scheme. This avoids overriding the effects of the reliable correction modulator by the propagation modulator.

Although the reference frame with the ground truth label provides reliable cues, it could be very different from the target frame and introduce uncertain or incomplete correlations. We augment the reference cues by supplementing reliable feature patches to a maintained pool, thus offering more comprehensive and expressive object representations to the modulators. In addition, a reliability filter is designed to retrieve reliable patches and pass them in subsequent frames.

Our model achieves state-of-the-art performance on YouTube-VOS18/19 and DAVIS17-Val/Test benchmarks. Extensive experiments demonstrate that the correction mechanism provides considerable performance gain by fully utilizing reliable guidance.

Requirements

This docker image may contain some redundent packages. A more light-weight one will be generated later.

docker image: xxiaoh/vos:10.1-cudnn7-torch1.4_v3

Citation

If you find this work is useful for your research, please consider citing:

@misc{xu2021reliable,
  title={Reliable Propagation-Correction Modulation for Video Object Segmentation}, 
  author={Xiaohao Xu and Jinglu Wang and Xiao Li and Yan Lu},
  year={2021},
  eprint={2112.02853},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Credit

CFBI: https://github.com/z-x-yang/CFBI

Deeplab: https://github.com/VainF/DeepLabV3Plus-Pytorch

GCT: https://github.com/z-x-yang/GCT

Acknowledgement

Firstly, the author would like to thank Rex for his insightful viewpoints about VOS during e-mail discussion! Also, this work is largely built upon the codebase of CFBI. Thanks for the author of CFBI to release such a wonderful code repo for further work to build upon!

Related impressive works in VOS

AOT [NeurIPS 2021]: https://github.com/z-x-yang/AOT

STCN [NeurIPS 2021]: https://github.com/hkchengrex/STCN

MiVOS [CVPR 2021]: https://github.com/hkchengrex/MiVOS

SSTVOS [CVPR 2021]: https://github.com/dukebw/SSTVOS

GraphMemVOS [ECCV 2020]: https://github.com/carrierlxk/GraphMemVOS

CFBI [ECCV 2020]: https://github.com/z-x-yang/CFBI

STM [ICCV 2019]: https://github.com/seoungwugoh/STM

FEELVOS [CVPR 2019]: https://github.com/kim-younghan/FEELVOS

Useful websites for VOS

The 1st Large-scale Video Object Segmentation Challenge: https://competitions.codalab.org/competitions/19544#learn_the_details

The 2nd Large-scale Video Object Segmentation Challenge - Track 1: Video Object Segmentation: https://competitions.codalab.org/competitions/20127#learn_the_details

The Semi-Supervised DAVIS Challenge on Video Object Segmentation @ CVPR 2020: https://competitions.codalab.org/competitions/20516#participate-submit_results

DAVIS: https://davischallenge.org/

YouTube-VOS: https://youtube-vos.org/

Papers with code for Semi-VOS: https://paperswithcode.com/task/semi-supervised-video-object-segmentation

Welcome to comments and discussions!!

Xiaohao Xu: [email protected]

Owner
Xiaohao Xu
Xiaohao Xu
Facial detection, landmark tracking and expression transfer library for Windows, Linux and Mac

Welcome to the CSIRO Face Analysis SDK. Documentation for the SDK can be found in doc/documentation.html. All code in this SDK is provided according t

Luiz Carlos Vieira 7 Jul 16, 2020
RoMa: A lightweight library to deal with 3D rotations in PyTorch.

RoMa: A lightweight library to deal with 3D rotations in PyTorch. RoMa (which stands for Rotation Manipulation) provides differentiable mappings betwe

NAVER 90 Dec 27, 2022
Synthetic structured data generators

Join us on What is Synthetic Data? Synthetic data is artificially generated data that is not collected from real world events. It replicates the stati

YData 850 Jan 07, 2023
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020

XDVioDet Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020. The proj

peng 64 Dec 12, 2022
Code for "Primitive Representation Learning for Scene Text Recognition" (CVPR 2021)

Primitive Representation Learning Network (PREN) This repository contains the code for our paper accepted by CVPR 2021 Primitive Representation Learni

Ruijie Yan 76 Jan 02, 2023
Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)

Debiasing Item-to-Item Recommendations With Small Annotated Datasets This is the code for our RecSys '20 paper. Other materials can be found here: Ful

Microsoft 34 Aug 10, 2022
TResNet: High Performance GPU-Dedicated Architecture

TResNet: High Performance GPU-Dedicated Architecture paperV2 | pretrained models Official PyTorch Implementation Tal Ridnik, Hussam Lawen, Asaf Noy, I

426 Dec 28, 2022
Image morphing without reference points by applying warp maps and optimizing over them.

Differentiable Morphing Image morphing without reference points by applying warp maps and optimizing over them. Differentiable Morphing is machine lea

Alex K 380 Dec 19, 2022
Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion

CSF Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion Tips: For testing: CUDA_VISIBLE_DEVICES=0 python main.py For trai

Han Xu 14 Oct 31, 2022
Anomaly detection in multi-agent trajectories: Code for training, evaluation and the OpenAI highway simulation.

Anomaly Detection in Multi-Agent Trajectories for Automated Driving This is the official project page including the paper, code, simulation, baseline

12 Dec 02, 2022
Trading environnement for RL agents, backtesting and training.

TradzQAI Trading environnement for RL agents, backtesting and training. Live session with coinbasepro-python is finaly arrived ! Available sessions: L

Tony Denion 164 Oct 30, 2022
A collection of 100 Deep Learning images and visualizations

A collection of Deep Learning images and visualizations. The project has been developed by the AI Summer team and currently contains almost 100 images.

AI Summer 65 Sep 12, 2022
TransReID: Transformer-based Object Re-Identification

TransReID: Transformer-based Object Re-Identification [arxiv] The official repository for TransReID: Transformer-based Object Re-Identification achiev

569 Dec 30, 2022
《Deep Single Portrait Image Relighting》(ICCV 2019)

Ratio Image Based Rendering for Deep Single-Image Portrait Relighting [Project Page] This is part of the Deep Portrait Relighting project. If you find

62 Dec 21, 2022
Totally Versatile Miscellanea for Pytorch

Totally Versatile Miscellania for PyTorch Thomas Viehmann [email protected] Thi

Thomas Viehmann 428 Dec 28, 2022
Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh

Arjun Majumdar 44 Dec 14, 2022
[ICCV 2021 Oral] Mining Latent Classes for Few-shot Segmentation

Mining Latent Classes for Few-shot Segmentation Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan Shi, Yang Gao. This codebase contains baseline of our paper Mini

Lihe Yang 66 Nov 29, 2022
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features | paper | Official PyTorch implementation for Mul

48 Dec 28, 2022
Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB)

Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB) This repository provides evaluation codes of PLNLP for OGB link property prediction t

Zhitao WANG 31 Oct 10, 2022
audioLIME: Listenable Explanations Using Source Separation

audioLIME This repository contains the Python package audioLIME, a tool for creating listenable explanations for machine learning models in music info

Institute of Computational Perception 27 Dec 01, 2022