[AAAI2022] Source code for our paper《Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning》

Related tags

Deep LearningSSVC
Overview

SSVC

The source code for paper [Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning]

samples of the generated motion-preserved video with threshold $\alpha=0.5$.

Requirements

  • python3
  • torch1.1+
  • PIL
  • FrEIA==0.2 (Flow-based model)
  • lintel==1.0 (Decode mp4 videos on the fly)

Structure

  • backbone
  • data
    • lists: train/val lists (.txt)
    • augmentation.py: train/val data augmentation during ssl pre-training
    • vDataLoader.py: custom your path to data list
  • model
    • advflow: flow-based model
    • classifier.py: linear classifier for down-stream tasks
    • infonce.py: combine S$^2$VC with MoCo
  • flow
    • pre-trained flow-based model weights
  • utils
  • main_pretrain.py: the main function for self-supervised pretrain
  • main_eval.py: the main function for supervised fine-tune

Self-supervised Pretrain

DDP

python -m torch.distributed.launch --nproc_per_node=1 --master_port 1234 main_pretrain.py --net r3d18 --img_dim 112 --seq_len 16 --aug_type 1 -t 0.5 -bsz 64 --gpu 0,1 --dataset XX

Single GPU

python main_pretrain.py --net r3d18 --img_dim 112 --seq_len 16 --aug_type 1 -t 0.5 -bsz 64 --gpu 0 --dataset XX

Evaluation

NN-Retrieval

python main_eval.py --retrieval --test SSL_Pt_Model_PTH --dataset XX --gpu X

Finetune

# fine-tune overall model
python main_eval.py --train_what ft --pretrain SSL_Pt_Model_PTH --dataset XX --gpu XX \
--net r3d18 --img_dim 224 --seq_len 32

# freeze backbone, finetune last layer
python main_eval.py --train_what last --pretrain SSL_Pt_Model_PTH --dataset XX --gpu XX \
--net r3d18 --img_dim 224 --seq_len 32

Test

python main_eval.py --train_what XX --ten_crop --test Sup_Ft_Model_PTH --gpu X \
--dataset XX --net r3d18 --img_dim 224 --seq_len 32
library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

NLopt is a library for nonlinear local and global optimization, for functions with and without gradient information. It is designed as a simple, unifi

Steven G. Johnson 1.4k Dec 25, 2022
3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

Akash James 39 Nov 21, 2022
Steerable discovery of neural audio effects

Steerable discovery of neural audio effects Christian J. Steinmetz and Joshua D. Reiss Abstract Applications of deep learning for audio effects often

Christian J. Steinmetz 182 Dec 29, 2022
Use Python, OpenCV, and MediaPipe to control a keyboard with facial gestures

CheekyKeys A Face-Computer Interface CheekyKeys lets you control your keyboard using your face. View a fuller demo and more background on the project

69 Nov 09, 2022
Rendering Point Clouds with Compute Shaders

Compute Shader Based Point Cloud Rendering This repository contains the source code to our techreport: Rendering Point Clouds with Compute Shaders and

Markus Schütz 460 Jan 05, 2023
Video Swin Transformer - PyTorch

Video-Swin-Transformer-Pytorch This repo is a simple usage of the official implementation "Video Swin Transformer". Introduction Video Swin Transforme

Haofan Wang 116 Dec 20, 2022
Videocaptioning.pytorch - A simple implementation of video captioning

pytorch implementation of video captioning recommend installing pytorch and pyth

Yiyu Wang 2 Jan 01, 2022
Tool for working with Y-chromosome data from YFull and FTDNA

ycomp ycomp is a tool for working with Y-chromosome data from YFull and FTDNA. Run ycomp -h for information on how to use the program. Installation Th

Alexander Regueiro 2 Jun 18, 2022
Predicting 10 different clothing types using Xception pre-trained model.

Predicting-Clothing-Types Predicting 10 different clothing types using Xception pre-trained model from Keras library. It is reimplemented version from

AbdAssalam Ahmad 3 Dec 29, 2021
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
CLIP + VQGAN / PixelDraw

clipit Yet Another VQGAN-CLIP Codebase This started as a fork of @nerdyrodent's VQGAN-CLIP code which was based on the notebooks of @RiversWithWings a

dribnet 276 Dec 12, 2022
Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline Ankit Goyal, Hei Law, Bowei Liu, Alejandro Newell, Jia Deng Internati

Princeton Vision & Learning Lab 115 Jan 04, 2023
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers Authors: Jaemin Cho, Abhay Zala, and Mohit Bansal (

Jaemin Cho 98 Dec 15, 2022
On the adaptation of recurrent neural networks for system identification

On the adaptation of recurrent neural networks for system identification This repository contains the Python code to reproduce the results of the pape

Marco Forgione 3 Jan 13, 2022
This repo contains the code for paper Inverse Weighted Survival Games

Inverse-Weighted-Survival-Games This repo contains the code for paper Inverse Weighted Survival Games instructions general loss function (--lfn) can b

3 Jan 12, 2022
Official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers

Visual Parser (ViP) This is the official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers. Key Feature

Shuyang Sun 117 Dec 11, 2022
PyTorch implementation of Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction (ICCV 2021).

Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction Introduction This is official PyTorch implementation of Towards Accurate Alignment

TANG Xiao 96 Dec 27, 2022
Sound Source Localization for AI Grand Challenge 2021

Sound-Source-Localization Sound Source Localization study for AI Grand Challenge 2021 (sponsored by NC Soft Vision Lab) Preparation 1. Place the data-

sanghoon 19 Mar 29, 2022
Maximum Spatial Perturbation for Image-to-Image Translation (Official Implementation)

MSPC for I2I This repository is by Yanwu Xu and contains the PyTorch source code to reproduce the experiments in our CVPR2022 paper Maximum Spatial Pe

51 Dec 14, 2022
Implementation of Gans

GAN Generative Adverserial Networks are an approach to generative data modelling using Deep learning methods. I have currently implemented : DCGAN on

Sibam Parida 5 Sep 07, 2021