[CVPR 2021] MiVOS - Scribble to Mask module

Overview

MiVOS (CVPR 2021) - Scribble To Mask

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

[arXiv] [Paper PDF] [Project Page]

A simplistic network that turns scribbles to mask. It supports multi-object segmentation using soft-aggregation. Don't expect SOTA results from this model!

Ex1 Ex2

Overall structure and capabilities

MiVOS Mask-Propagation Scribble-to-Mask
DAVIS/YouTube semi-supervised evaluation ✔️
DAVIS interactive evaluation ✔️
User interaction GUI tool ✔️
Dense Correspondences ✔️
Train propagation module ✔️
Train S2M (interaction) module ✔️
Train fusion module ✔️
Generate more synthetic data ✔️

Requirements

The package versions shown here are the ones that I used. You might not need the exact versions.

Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest can be installed by:

pip install opencv-contrib-python gitpython gdown

Pretrained model

Download and put the model in ./saves/. Alternatively use the provided download_model.py.

[OneDrive Mirror]

Interactive GUI

python interactive.py --image <image>

Controls:

Mouse Left - Draw scribbles
Mouse middle key - Switch positive/negative
Key f - Commit changes, clear scribbles
Key r - Clear everything
Key d - Switch between overlay/mask view
Key s - Save masks into a temporary output folder (./output/)

Known issues

The model almost always needs to focus on at least one object. It is very difficult to erase all existing masks from an image using scribbles.

Training

Datasets

  1. Download and extract LVIS training set.
  2. Download and extract a set of static image segmentation datasets. These are already downloaded for you if you used the download_datasets.py in Mask-Propagation.
├── lvis
│   ├── lvis_v1_train.json
│   └── train2017
├── Scribble-to-Mask
└── static
    ├── BIG_small
    └── ...

Commands

Use the deeplabv3plus_resnet50 pretrained model provided here.

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id s2m --load_deeplab <path_to_deeplab.pth>

Credit

Deeplab implementation and pretrained model: https://github.com/VainF/DeepLabV3Plus-Pytorch.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

Contact: [email protected]

Comments
  • AttributeError: Caught AttributeError in DataLoader worker process 0

    AttributeError: Caught AttributeError in DataLoader worker process 0

    Hello! I followed the instructions of the training command, it has thrown an error about AttributeError. dataloader_error I put the static folder outside this repository as you mentioned. It is confusing that I can use the same datasets for the pretraining propagation module, the train.py in Mask-Propagation works fine.

    opened by xwhkkk 2
  • git.exc.InvalidGitRepositoryError when running train.py

    git.exc.InvalidGitRepositoryError when running train.py

    Hello! I followed the instruction of the training command, but it has thrown an error about GitRepositoryError. gitError I used command : CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 1842 --nproc_per_node=2 train.py --id s2m --load_deeplab ./deeplab_resnet50/best_deeplabv3plus_resnet50_voc_os16.pth, and I have 2 GPUs. Could you give me some suggestions?

    opened by xwhkkk 2
  • About evaluation of the model

    About evaluation of the model

    Hi,

    thank you for the nice work.

    I have a concern about the evaluation of the model. Because there is no validation set to pick the best model. It may has a potential overfitting problem. (Or what should the validation set for interactive segmentation look like? If there is a unified standard, it will be more helpful for everyone to compare their methods.)

    In interactive object segmentation setting, is this setting popular? I am new here for the interactive segmentation. Wish to solve my concern, thank you.

    opened by Limingxing00 2
  • Question about Local Control Strategy

    Question about Local Control Strategy

    A simple but practical segmentation tool! I've read your paper, and it says that local control strategy is used in S2M. However, I don't find the local control step in this code. Why don't you provide it in this tool? Will local control make significant difference to the performance?

    opened by distillation-dcf 1
  • DeepLabv3 pre-trained models

    DeepLabv3 pre-trained models

    Hello,

    I wanted to mention that in order to train S2M from scratch, using the deeplabv3_resnet50 pre-trained model provided in this repo, returns the following error: KeyError: 'classifier.classifier.0.convs.0.0.weight. Meaning that the weights from this layer are not present in deeplabv3_resnet50. But using the deeplabv3plus_resnet50 from the same repo executes without errors.

    Best!

    opened by UndecidedBoy 1
  • saving error

    saving error

    Hello! Thanks for sharing your code. When I run python interactive.py and want to save the masks, appeared following error.

    image

    Could you give me some suggestions?

    opened by xwhkkk 3
  • Fix simple issues and allow for cpu only use

    Fix simple issues and allow for cpu only use

    I had to make some changes to be able to use the code on cpu only system and had troubles saving the mask from the interactive GUI and fixed it. Thanks for the great work.

    opened by rami-alloush 3
Releases(1.0)
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning plugins for distributed training using the Ray distributed compu

167 Jan 02, 2023
FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX.

FedJAX: Federated learning with JAX What is FedJAX? FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX. FedJAX priori

Google 208 Dec 14, 2022
Making a music video with Wav2CLIP and VQGAN-CLIP

music2video Overview A repo for making a music video with Wav2CLIP and VQGAN-CLIP. The base code was derived from VQGAN-CLIP The CLIP embedding for au

Joel Jang | 장요엘 163 Dec 26, 2022
This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in the Wild"

Visual Attributes in the Wild (VAW) This repository provides data for the VAW dataset as described in the CVPR 2021 Paper: Learning to Predict Visual

Adobe Research 36 Dec 30, 2022
Implementation for "Conditional entropy minimization principle for learning domain invariant representation features"

Implementation for "Conditional entropy minimization principle for learning domain invariant representation features". The code is reproduced from thi

1 Nov 02, 2022
Sequence-tagging using deep learning

Classification using Deep Learning Requirements PyTorch version = 1.9.1+cu111 Python version = 3.8.10 PyTorch-Lightning version = 1.4.9 Huggingface

Vineet Kumar 2 Dec 20, 2022
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
[CVPR2021] The source code for our paper 《Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning》.

TBE The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Le

Jinpeng Wang 150 Dec 28, 2022
Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Pano-AVQA Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021) [Paper] [Poster] [Video] Getting Starte

Heeseung Yun 9 Dec 23, 2022
Fuzzer for Linux Kernel Drivers

difuze: Fuzzer for Linux Kernel Drivers This repo contains all the sources (including setup scripts), you need to get difuze up and running. Tested on

seclab 344 Dec 27, 2022
PyTorch implementation of Pointnet2/Pointnet++

Pointnet2/Pointnet++ PyTorch Project Status: Unmaintained. Due to finite time, I have no plans to update this code and I will not be responding to iss

Erik Wijmans 1.2k Dec 29, 2022
This is the dataset and code release of the OpenRooms Dataset.

This is the dataset and code release of the OpenRooms Dataset.

Visual Intelligence Lab of UCSD 95 Jan 08, 2023
Code for the paper 'A High Performance CRF Model for Clothes Parsing'.

Clothes Parsing Overview This code provides an implementation of the research paper: A High Performance CRF Model for Clothes Parsing Edgar Simo-S

Edgar Simo-Serra 119 Nov 21, 2022
TransferNet: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network

TransferNet: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network Created by Seunghoon Hong, Junhyuk Oh,

42 Jun 29, 2022
The versatile ocean simulator, in pure Python, powered by JAX.

Veros is the versatile ocean simulator -- it aims to be a powerful tool that makes high-performance ocean modeling approachable and fun. Because Veros

TeamOcean 245 Dec 20, 2022
Controlling Hill Climb Racing with Hand Tacking

Controlling Hill Climb Racing with Hand Tacking Opened Palm for Gas Closed Palm for Brake

Rohit Ingole 3 Jan 18, 2022
OpenIPDM is a MATLAB open-source platform that stands for infrastructures probabilistic deterioration model

Open-Source Toolbox for Infrastructures Probabilistic Deterioration Modelling OpenIPDM is a MATLAB open-source platform that stands for infrastructure

CIVML 0 Jan 20, 2022
Code for paper: Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

Group-CAM By Zhang, Qinglong and Rao, Lu and Yang, Yubin [State Key Laboratory for Novel Software Technology at Nanjing University] This repo is the o

zhql 98 Nov 16, 2022
OSLO: Open Source framework for Large-scale transformer Optimization

O S L O Open Source framework for Large-scale transformer Optimization What's New: December 21, 2021 Released OSLO 1.0. What is OSLO about? OSLO is a

TUNiB 280 Nov 24, 2022
"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

FGVC8 Exploring Vision Transformers for Fine-grained Classification paper presented at the CVPR 2021, The Eight Workshop on Fine-Grained Visual Catego

Marcos V. Conde 19 Dec 06, 2022