An official PyTorch Implementation of Boundary-aware Self-supervised Learning for Video Scene Segmentation (BaSSL)

Related tags

Deep Learningbassl
Overview

KakaoBrain pytorch pytorch-lightning

BaSSL

This is an official PyTorch Implementation of Boundary-aware Self-supervised Learning for Video Scene Segmentation (BaSSL) [arxiv]

  • The method is a self-supervised learning algorithm that learns a model to capture contextual transition across boundaries during the pre-training stage. To be specific, the method leverages pseudo-boundaries and proposes three novel boundary-aware pretext tasks effective in maximizing intra-scene similarity and minimizing inter-scene similarity, thus leading to higher performance in video scene segmentation task.

1. Environmental Setup

We have tested the implementation on the following environment:

  • Python 3.7.7 / PyTorch 1.7.1 / torchvision 0.8.2 / CUDA 11.0 / Ubuntu 18.04

Also, the code is based on pytorch-lightning (==1.3.8) and all necessary dependencies can be installed by running following command.

$ pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install -r requirements.txt

# (optional) following installation of pillow-simd sometimes brings faster data loading.
$ pip uninstall pillow && CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

2. Prepare Data

We provide data download script for raw key-frames of MovieNet-SSeg dataset, and our re-formatted annotation files applicable for BaSSL. FYI, our script will automatically download and decompress data---1) key-frames (160G), 2) annotations (200M)---into /bassl/data/movienet .

# download movienet data
$ cd <path-to-root>
$ bash script/download_movienet_data.sh
# 
   
    /bassl/data
   
movienet
│─ 240P_frames
│    │─ tt0120885                 # movie id (or video id)
│    │    │─ shot_0000_img_0.jpg
│    │    │─ shot_0000_img_1.jpg
│    │    │─ shot_0000_img_2.jpg  # for each shot, three key-frames are given.
|    |    ::    │─ shot_1256_img_2.jpg
│    |    
│    │─ tt1093906
│         │─ shot_0000_img_0.jpg
│         │─ shot_0000_img_1.jpg
│         │─ shot_0000_img_2.jpg
|         :
│         │─ shot_1270_img_2.jpg
│
│─anno
     │─ anno.pretrain.ndjson
     │─ anno.trainvaltest.ndjson
     │─ anno.train.ndjson
     │─ anno.val.ndjson
     │─ anno.test.ndjson
     │─ vid2idx.json

3. Train (Pre-training and Fine-tuning)

We use Hydra to provide flexible training configurations. Below examples explain how to modify each training parameter for your use cases.
We assume that you are in (i.e., root of this repository).

3.1. Pre-training

(1) Pre-training BaSSL
Our pre-training is based on distributed environment (multi-GPUs training) using ddp environment supported by pytorch-lightning.
The default setting requires 8-GPUs (of V100) with a batch of 256. However, you can set the parameter config.DISTRIBUTED.NUM_PROC_PER_NODE to the number of gpus you can use or change config.TRAIN.BATCH_SIZE.effective_batch_size. You can run a single command cd bassl; bash ../scripts/run_pretrain_bassl.sh or following full command:

cd <path-to-root>/bassl
EXPR_NAME=bassl
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
    config.EXPR_NAME=${EXPR_NAME} \
    config.DISTRIBUTED.NUM_NODES=1 \
    config.DISTRIBUTED.NUM_PROC_PER_NODE=8 \
    config.TRAIN.BATCH_SIZE.effective_batch_size=256

Note that the checkpoints are automatically saved in bassl/pretrain/ckpt/ and log files (e.g., tensorboard) are saved in `bassl/pretrain/logs/ .

(2) Running with various loss combinations
Each objective can be turned on and off independently.

cd <path-to-root>/bassl
EXPR_NAME=bassl_all_pretext_tasks
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
    config.EXPR_NAME=${EXPR_NAME} \
    config.LOSS.shot_scene_matching.enabled=true \
    config.LOSS.contextual_group_matching.enabled=true \
    config.LOSS.pseudo_boundary_prediction.enabled=true \
    config.LOSS.masked_shot_modeling.enabled=true

(3) Pre-training shot-level pre-training baselines
Shot-level pre-training methods can be trained by setting config.LOSS.sampling_method.name as one of followings:

  • instance (Simclr_instance), temporal (Simclr_temporal), shotcol (Simclr_NN).
    And, you can choose two more options: bassl (BaSSL), and bassl+shotcol (BaSSL+ShotCoL).
    Below example is for Simclr_NN, i.e., ShotCoL. Choose your favorite option ;)
cd <path-to-root>/bassl
EXPR_NAME=Simclr_NN
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
    config.EXPR_NAME=${EXPR_NAME} \
    config.LOSS.sampleing_method.name=shotcol \

3.2. Fine-tuning

(1) Simple running a single command to fine-tune pre-trained models
Firstly, download the checkpoints provided in Model Zoo section and move them into bassl/pretrain/ckpt.

cd <path-to-root>/bassl

# for fine-tuning BaSSL (10 epoch)
bash ../scripts/finetune_bassl.sh

# for fine-tuning Simclr_NN (i.e., ShotCoL)
bash ../scripts/finetune_shot-level_baseline.sh

The full process (i.e., extraction of shot-level representation followed by fine-tuning) is described in below.

(2) Extracting shot-level features from shot key-frames
For computational efficiency, we pre-extract shot-level representation and then fine-tune pre-trained models.
Set LOAD_FROM to EXPR_NAME used in the pre-training stage and change config.DISTRIBUTED.NUM_PROC_PER_NODE as the number of GPUs you can use. Then, the extracted shot-level features are saved in /bassl/data/movienet/features/ .

cd <path-to-root>/bassl
LOAD_FROM=bassl
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/extract_shot_repr.py \
	config.DISTRIBUTED.NUM_NODES=1 \
	config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
	+config.LOAD_FROM=${LOAD_FROM}

(3) Fine-tuning and evaluation

cd <path-to-root>/bassl
WORK_DIR=$(pwd)

# Pre-training methods: bassl and bassl+shotcol
# which learn CRN network during the pre-training stage
LOAD_FROM=bassl
EXPR_NAME=transfer_finetune_${LOAD_FROM}
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/finetune/main.py \
	config.TRAIN.BATCH_SIZE.effective_batch_size=1024 \
	config.EXPR_NAME=${EXPR_NAME} \
	config.DISTRIBUTED.NUM_NODES=1 \
	config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
	config.TRAIN.OPTIMIZER.lr.base_lr=0.0000025 \
	+config.PRETRAINED_LOAD_FROM=${LOAD_FROM}

# Pre-training methods: instance, temporal, shotcol
# which DO NOT learn CRN network during the pre-training stage
# thus, we use different base learning rate (determined after hyperparameter search)
LOAD_FROM=shotcol_pretrain
EXPR_NAME=finetune_scratch_${LOAD_FROM}
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/finetune/main.py \
	config.TRAIN.BATCH_SIZE.effective_batch_size=1024 \
	config.EXPR_NAME=${EXPR_NAME} \
	config.DISTRIBUTED.NUM_NODES=1 \
	config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
	config.TRAIN.OPTIMIZER.lr.base_lr=0.000025 \
	+config.PRETRAINED_LOAD_FROM=${LOAD_FROM}

4. Model Zoo

We provide pre-trained checkpoints trained in a self-supervised manner.
After fine-tuning with the checkpoints, the models will give scroes that are almost similar to ones shown below.

Method AP Checkpoint (pre-trained)
SimCLR (instance) 51.51 download
SimCLR (temporal) 50.05 download
SimCLR (NN) 51.17 download
BaSSL (10 epoch) 56.26 download
BaSSL (40 epoch) 57.40 download

5. Citation

If you find this code helpful for your research, please cite our paper.

@article{mun2022boundary,
  title={Boundary-aware Self-supervised Learning for Video Scene Segmentation},
  author={Mun, Jonghwan and Shin, Minchul and Han, Gunsu and
          Lee, Sangho and Ha, Sungsu and Lee, Joonseok and Kim, Eun-sol},
  journal={arXiv preprint arXiv:2201.05277},
  year={2022}
}

6. Contact for Issues

Jonghwan Mun, [email protected]
Minchul Shin, [email protected]

7. License

This project is licensed under the terms of the Apache License 2.0. Copyright 2021 Kakao Brain Corp. All Rights Reserved.

Owner
Kakao Brain
Kakao Brain Corp.
Kakao Brain
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @

Meta Research 4.8k Jan 04, 2023
A PyTorch implementation of unsupervised SimCSE

A PyTorch implementation of unsupervised SimCSE

99 Dec 23, 2022
LLVM-based compiler for LightGBM gradient-boosted trees. Speeds up prediction by ≥10x.

LLVM-based compiler for LightGBM gradient-boosted trees. Speeds up prediction by ≥10x.

Simon Boehm 183 Jan 02, 2023
MiraiML: asynchronous, autonomous and continuous Machine Learning in Python

MiraiML Mirai: future in japanese. MiraiML is an asynchronous engine for continuous & autonomous machine learning, built for real-time usage. Usage In

Arthur Paulino 25 Jul 27, 2022
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

Squirrel Core Share, load, and transform data in a collaborative, flexible, and efficient way What is Squirrel? Squirrel is a Python library that enab

Merantix Momentum 249 Dec 07, 2022
Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Voronoi Multi_Robot Collaborate Exploration Introduction In the unknown environment, the cooperative exploration of multiple robots is completed by Vo

PeaceWord 6 Nov 22, 2022
Run containerized, rootless applications with podman

Why? restrict scope of file system access run any application without root privileges creates usable "Desktop applications" to integrate into your nor

119 Dec 27, 2022
💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 51 Jan 06, 2023
Model-based reinforcement learning in TensorFlow

Bellman Website | Twitter | Documentation (latest) What does Bellman do? Bellman is a package for model-based reinforcement learning (MBRL) in Python,

46 Nov 09, 2022
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks

SSTNet Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks(ICCV2021) by Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui J

83 Nov 29, 2022
Fully Connected DenseNet for Image Segmentation

Fully Connected DenseNets for Semantic Segmentation Fully Connected DenseNet for Image Segmentation implementation of the paper The One Hundred Layers

Somshubra Majumdar 84 Oct 31, 2022
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation (CVPR 2020)

Super-BPD for Fast Image Segmentation (CVPR 2020) Introduction We propose direction-based super-BPD, an alternative to superpixel, for fast generic im

189 Dec 07, 2022
Source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree.

self-driving-car In this repository I will share the source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree. Hope this might

Andrea Palazzi 2.4k Dec 29, 2022
Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

MINDs Lab 170 Jan 04, 2023
As-ViT: Auto-scaling Vision Transformers without Training

As-ViT: Auto-scaling Vision Transformers without Training [PDF] Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou In ICLR 2

VITA 68 Sep 05, 2022
Learning to Segment Instances in Videos with Spatial Propagation Network

Learning to Segment Instances in Videos with Spatial Propagation Network This paper is available at the 2017 DAVIS Challenge website. Check our result

Jingchun Cheng 145 Sep 28, 2022
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022
Fedlearn支持前沿算法研发的Python工具库 | Fedlearn algorithm toolkit for researchers

FedLearn-algo Installation Development Environment Checklist python3 (3.6 or 3.7) is required. To configure and check the development environment is c

89 Nov 14, 2022
Tidy interface to polars

tidypolars tidypolars is a data frame library built on top of the blazingly fast polars library that gives access to methods and functions familiar to

Mark Fairbanks 144 Jan 08, 2023
An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

Luke Tonin 195 Dec 17, 2022