Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

Overview

Spatio-Temporal Entropy Model

A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

More details can be found in the following paper:

Spatiotemporal Entropy Model is All You Need for Learned Video Compression
Alibaba Group, arxiv 2021.4.13
Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Dongyang Li, Yichen Qian, Hao Li

Note that It Is Not An Official Implementation Code.

The differences with the original paper are not limited to the following:

  • The number of model channels are fewer.
  • The Encoder/Decoder in original paper consists of conditional conv1 to support various rate in one single model. And the architecture is the same as [2]2. However, I only use the single rate Encoder/Decoder with the same architecture as [2]2

ToDo:

  • 1. various rate model training and evaluation.

Environment

  • Python == 3.7.10
  • Pytorch == 1.7.1
  • CompressAI

Dataset

I use the Vimeo90k Septuplet Dataset to train the models. The Dataset contains about 64612 training sequences and 7824 testing sequences. All sequence contains 7 frames.

The train dataset folder structure is as

.dataset/vimeo_septuplet/
│  sep_testlist.txt
│  sep_trainlist.txt
│  vimeo_septuplet.txt
│  
├─sequences
│  ├─00001
│  │  ├─0001
│  │  │      f001.png
│  │  │      f002.png
│  │  │      f003.png
│  │  │      f004.png
│  │  │      f005.png
│  │  │      f006.png
│  │  │      f007.png
│  │  ├─0002
│  │  │      f001.png
│  │  │      f002.png
│  │  │      f003.png
│  │  │      f004.png
│  │  │      f005.png
│  │  │      f006.png
│  │  │      f007.png
...

I evaluate the model on UVG & HEVC TEST SEQUENCE Dataset. The test dataset folder structure is as

.dataset/UVG/
├─PNG
│  ├─Beauty
│  │      f001.png
│  │      f002.png
│  │      f003.png
│  │      ...
│  │      f598.png
│  │      f599.png
│  │      f600.png
│  │      
│  ├─HoneyBee
│  │      f001.png
│  │      f002.png
│  │      f003.png
│  │      ...
│  │      f598.png
│  │      f599.png
│  │      f600.png
│  │     
│  │      ...
.dataset/HEVC/
├─BasketballDrill
│      f001.png
│      f002.png
│      f003.png
│      ...
│      f098.png
│      f099.png
│      f100.png
│      
├─BasketballDrive
│      f001.png
│      f002.png
│      ...

Train Your Own Model

python3 trainSTEM.py -d /path/to/your/image/dataset/vimeo_septuplet --lambda 0.01 -lr 1e-4 --batch-size 16 --model-save /path/to/your/model/save/dir --cuda --checkpoint /path/to/your/iframecompressor/checkpoint.pth.tar

I tried to train with Mean-Scale Hyperprior / Joint Autoregressive Hierarchical Priors / Cheng2020Attn in CompressAI library and find that a powerful I Frame Compressor does have great performance benefits.

Evaluate Your Own Model

python3 evalSTEM.py --checkpoint /path/to/your/iframecompressor/checkpoint.pth.tar --entropy-model-path /path/to/your/stem/checkpoint.pth.tar

Currently only support evaluation on UVG & HEVC TEST SEQUENCE Dataset.

Result

测试数据集UVG PSNR BPP PSNR in paper BPP in paper
SpatioTemporalPriorModel_Res 36.104 0.087 35.95 0.080
SpatioTemporalPriorModel 36.053 0.080 35.95 0.082
SpatioTemporalPriorModelWithoutTPM None None 35.95 0.100
SpatioTemporalPriorModelWithoutSPM 36.066 0.080 35.95 0.087
SpatioTemporalPriorModelWithoutSPMTPM 36.021 0.141 35.95 0.123

PSNR in paper & BPP in paper is estimated from Figure 6 in the original paper.

It seems that the context model SPM has no good effect in my experiments.

I look forward to receiving more feedback on the test results, and feel free to share your test results!

More Informations About Various Rate Model Training

As stated in the original paper, they use a variable-rate auto-encoder to support various rate in one single model. I tried to train STEM with GainedVAE, which is also a various rate model. Some point can achieve comparable r-d performance while others may degrade. What's more, the interpolation result could have more performance degradation cases.

Probably we need Loss Modulator3 for various rate model training. Read Oren Ripple's ICCV 2021 paper3 for more details.

Acknowledgement

The framework is based on CompressAI, I add the model in compressai.models.spatiotemporalpriors. And trainSTEM.py/evalSTEM.py is modified with reference to compressai_examples

Reference

[1] [Variable Rate Deep Image Compression With a Conditional Autoencoder](https://openaccess.thecvf.com/content_ICCV_2019/html/Choi_Variable_Rate_Deep_Image_Compression_With_a_Conditional_Autoencoder_ICCV_2019_paper.html)
[2] [Joint Autoregressive and Hierarchical Priors for Learned Image Compression](https://arxiv.org/abs/1809.02736)
[3] [ELF-VC Efficient Learned Flexible-Rate Video Coding](https://arxiv.org/abs/2104.14335)

Contact

Feel free to contact me if there is any question about the code or to discuss any problems with image and video compression. ([email protected])

This project uses ViT to perform image classification tasks on DATA set CIFAR10.

Vision-Transformer-Multiprocess-DistributedDataParallel-Apex Introduction This project uses ViT to perform image classification tasks on DATA set CIFA

Kaicheng Yang 3 Jun 03, 2022
Misc YOLOL scripts for use in the Starbase space sandbox videogame

starbase-misc Misc YOLOL scripts for use in the Starbase space sandbox videogame. Each directory contains standalone YOLOL scripts. They don't really

4 Oct 17, 2021
Official repository of the paper Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision

Official repository of the paper Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision

Soubhik Sanyal 689 Dec 25, 2022
CS50's Introduction to Artificial Intelligence Test Scripts

CS50's Introduction to Artificial Intelligence Test Scripts 🤷‍♂️ What's this? 🤷‍♀️ This repository contains Python scripts to automate tests for mos

Jet Kan 2 Dec 28, 2022
Tensorflow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)

Transfer Learning for Text Classification with Tensorflow Tensorflow implementation of Semi-supervised Sequence Learning(https://arxiv.org/abs/1511.01

DONGJUN LEE 82 Oct 22, 2022
Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning

Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning Code for the paper Harmonious Textual Layout Generation over Nat

7 Aug 09, 2022
PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network" This repository is for HAN introduced in the

五维空间 140 Nov 23, 2022
(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain Mingchen Zhuge*, Dehong Gao*, Deng-Ping Fan#, Linbo Jin, Ben Chen, Haoming Zhou, Minghui

250 Jan 08, 2023
TransPrompt - Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

TransPrompt This code is implement for our EMNLP 2021's paper 《TransPrompt:Towards an Automatic Transferable Prompting Framework for Few-shot Text Cla

WangJianing 23 Dec 21, 2022
Learnable Motion Coherence for Correspondence Pruning

Learnable Motion Coherence for Correspondence Pruning Yuan Liu, Lingjie Liu, Cheng Lin, Zhen Dong, Wenping Wang Project Page Any questions or discussi

liuyuan 41 Nov 30, 2022
Cancer metastasis detection with neural conditional random field (NCRF)

NCRF Prerequisites Data Whole slide images Annotations Patch images Model Training Testing Tissue mask Probability map Tumor localization FROC evaluat

Baidu Research 731 Jan 01, 2023
Implementation of Pix2Seq in PyTorch

pix2seq-pytorch Implementation of Pix2Seq paper Different from the paper image input size 1280 bin size 1280 LambdaLR scheduler used instead of Linear

Tony Shin 9 Dec 15, 2022
Tool for working with Y-chromosome data from YFull and FTDNA

ycomp ycomp is a tool for working with Y-chromosome data from YFull and FTDNA. Run ycomp -h for information on how to use the program. Installation Th

Alexander Regueiro 2 Jun 18, 2022
The official implementation of CircleNet: Anchor-free Detection with Circle Representation, MICCAI 2030

CircleNet: Anchor-free Detection with Circle Representation The official implementation of CircleNet, MICCAI 2020 [PyTorch] [project page] [MICCAI pap

The Biomedical Data Representation and Learning Lab 45 Nov 18, 2022
Python implementation of "Single Image Haze Removal Using Dark Channel Prior"

##Dependencies pillow(~2.6.0) Numpy(~1.9.0) If the scripts throw AttributeError: __float__, make sure your pillow has jpeg support e.g. try: $ sudo ap

Joyee Cheung 73 Dec 20, 2022
A cool little repl-based simulation written in Python

A cool little repl-based simulation written in Python planned to integrate machine-learning into itself to have AI battle to the death before your eye

Em 6 Sep 17, 2022
[ECCV'20] Convolutional Occupancy Networks

Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page | Blog Post This repository contains the implementation o

622 Dec 30, 2022
Official implementation for "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation" (CVPR 2022)

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (CVPR2022) https://arxiv.org/abs/2203.08483 Unpaired image-to-image (I2I

Xueqi Hu 50 Dec 16, 2022
The code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.

CrossFormer This repository is the code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention. Introduction Existin

cheerss 238 Jan 06, 2023
Weakly- and Semi-Supervised Panoptic Segmentation (ECCV18)

Weakly- and Semi-Supervised Panoptic Segmentation by Qizhu Li*, Anurag Arnab*, Philip H.S. Torr This repository demonstrates the weakly supervised gro

Qizhu Li 159 Dec 20, 2022