Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

Overview

Spatio-Temporal Entropy Model

A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

More details can be found in the following paper:

Spatiotemporal Entropy Model is All You Need for Learned Video Compression
Alibaba Group, arxiv 2021.4.13
Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Dongyang Li, Yichen Qian, Hao Li

Note that It Is Not An Official Implementation Code.

The differences with the original paper are not limited to the following:

  • The number of model channels are fewer.
  • The Encoder/Decoder in original paper consists of conditional conv1 to support various rate in one single model. And the architecture is the same as [2]2. However, I only use the single rate Encoder/Decoder with the same architecture as [2]2

ToDo:

  • 1. various rate model training and evaluation.

Environment

  • Python == 3.7.10
  • Pytorch == 1.7.1
  • CompressAI

Dataset

I use the Vimeo90k Septuplet Dataset to train the models. The Dataset contains about 64612 training sequences and 7824 testing sequences. All sequence contains 7 frames.

The train dataset folder structure is as

.dataset/vimeo_septuplet/
│  sep_testlist.txt
│  sep_trainlist.txt
│  vimeo_septuplet.txt
│  
├─sequences
│  ├─00001
│  │  ├─0001
│  │  │      f001.png
│  │  │      f002.png
│  │  │      f003.png
│  │  │      f004.png
│  │  │      f005.png
│  │  │      f006.png
│  │  │      f007.png
│  │  ├─0002
│  │  │      f001.png
│  │  │      f002.png
│  │  │      f003.png
│  │  │      f004.png
│  │  │      f005.png
│  │  │      f006.png
│  │  │      f007.png
...

I evaluate the model on UVG & HEVC TEST SEQUENCE Dataset. The test dataset folder structure is as

.dataset/UVG/
├─PNG
│  ├─Beauty
│  │      f001.png
│  │      f002.png
│  │      f003.png
│  │      ...
│  │      f598.png
│  │      f599.png
│  │      f600.png
│  │      
│  ├─HoneyBee
│  │      f001.png
│  │      f002.png
│  │      f003.png
│  │      ...
│  │      f598.png
│  │      f599.png
│  │      f600.png
│  │     
│  │      ...
.dataset/HEVC/
├─BasketballDrill
│      f001.png
│      f002.png
│      f003.png
│      ...
│      f098.png
│      f099.png
│      f100.png
│      
├─BasketballDrive
│      f001.png
│      f002.png
│      ...

Train Your Own Model

python3 trainSTEM.py -d /path/to/your/image/dataset/vimeo_septuplet --lambda 0.01 -lr 1e-4 --batch-size 16 --model-save /path/to/your/model/save/dir --cuda --checkpoint /path/to/your/iframecompressor/checkpoint.pth.tar

I tried to train with Mean-Scale Hyperprior / Joint Autoregressive Hierarchical Priors / Cheng2020Attn in CompressAI library and find that a powerful I Frame Compressor does have great performance benefits.

Evaluate Your Own Model

python3 evalSTEM.py --checkpoint /path/to/your/iframecompressor/checkpoint.pth.tar --entropy-model-path /path/to/your/stem/checkpoint.pth.tar

Currently only support evaluation on UVG & HEVC TEST SEQUENCE Dataset.

Result

测试数据集UVG PSNR BPP PSNR in paper BPP in paper
SpatioTemporalPriorModel_Res 36.104 0.087 35.95 0.080
SpatioTemporalPriorModel 36.053 0.080 35.95 0.082
SpatioTemporalPriorModelWithoutTPM None None 35.95 0.100
SpatioTemporalPriorModelWithoutSPM 36.066 0.080 35.95 0.087
SpatioTemporalPriorModelWithoutSPMTPM 36.021 0.141 35.95 0.123

PSNR in paper & BPP in paper is estimated from Figure 6 in the original paper.

It seems that the context model SPM has no good effect in my experiments.

I look forward to receiving more feedback on the test results, and feel free to share your test results!

More Informations About Various Rate Model Training

As stated in the original paper, they use a variable-rate auto-encoder to support various rate in one single model. I tried to train STEM with GainedVAE, which is also a various rate model. Some point can achieve comparable r-d performance while others may degrade. What's more, the interpolation result could have more performance degradation cases.

Probably we need Loss Modulator3 for various rate model training. Read Oren Ripple's ICCV 2021 paper3 for more details.

Acknowledgement

The framework is based on CompressAI, I add the model in compressai.models.spatiotemporalpriors. And trainSTEM.py/evalSTEM.py is modified with reference to compressai_examples

Reference

[1] [Variable Rate Deep Image Compression With a Conditional Autoencoder](https://openaccess.thecvf.com/content_ICCV_2019/html/Choi_Variable_Rate_Deep_Image_Compression_With_a_Conditional_Autoencoder_ICCV_2019_paper.html)
[2] [Joint Autoregressive and Hierarchical Priors for Learned Image Compression](https://arxiv.org/abs/1809.02736)
[3] [ELF-VC Efficient Learned Flexible-Rate Video Coding](https://arxiv.org/abs/2104.14335)

Contact

Feel free to contact me if there is any question about the code or to discuss any problems with image and video compression. ([email protected])

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️ This repo contains a PyTorch implementation of the original GAT paper ( 🔗 Veličković et

Aleksa Gordić 1.9k Jan 09, 2023
[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.

CoCLR: Self-supervised Co-Training for Video Representation Learning This repository contains the implementation of: InfoNCE (MoCo on videos) UberNCE

Tengda Han 271 Jan 02, 2023
Recursive Bayesian Networks

Recursive Bayesian Networks This repository contains the code to reproduce the results from the NeurIPS 2021 paper Lieck R, Rohrmeier M (2021) Recursi

Robert Lieck 11 Oct 18, 2022
The source code and dataset for the RecGURU paper (WSDM 2022)

RecGURU About The Project Source code and baselines for the RecGURU paper "RecGURU: Adversarial Learning of Generalized User Representations for Cross

Chenglin Li 17 Jan 07, 2023
Official Implementation for Fast Training of Neural Lumigraph Representations using Meta Learning.

Fast Training of Neural Lumigraph Representations using Meta Learning Project Page | Paper | Data Alexander W. Bergman, Petr Kellnhofer, Gordon Wetzst

Alex 39 Oct 08, 2022
Tensorflow 2.x implementation of Vision-Transformer model

Vision Transformer Unofficial Tensorflow 2.x implementation of the Transformer based Image Classification model proposed by the paper AN IMAGE IS WORT

Soumik Rakshit 16 Jul 20, 2022
Fine-tuning StyleGAN2 for Cartoon Face Generation

Cartoon-StyleGAN 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation Abstract Recent studies have shown remarkable success in the unsupervised imag

Jihye Back 520 Jan 04, 2023
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration. Introduction spinor-gpe is high-level,

2 Sep 20, 2022
BMVC 2021 Oral: code for BI-GCN: Boundary-Aware Input-Dependent Graph Convolution for Biomedical Image Segmentation

BMVC 2021 BI-GConv: Boundary-Aware Input-Dependent Graph Convolution for Biomedical Image Segmentation Necassary Dependencies: PyTorch 1.2.0 Python 3.

Yanda Meng 15 Nov 08, 2022
AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5

AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap

5 Nov 10, 2022
A PyTorch implementation of "Graph Wavelet Neural Network" (ICLR 2019)

Graph Wavelet Neural Network ⠀⠀ A PyTorch implementation of Graph Wavelet Neural Network (ICLR 2019). Abstract We present graph wavelet neural network

Benedek Rozemberczki 490 Dec 16, 2022
Publication describing 3 ML examples at NSLS-II and interfacing into Bluesky

Machine learning enabling high-throughput and remote operations at large-scale user facilities. Overview This repository contains the source code and

BNL 4 Sep 24, 2022
:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

huybery 60 Dec 31, 2022
Generalized and Efficient Blackbox Optimization System.

OpenBox Doc | OpenBox中文文档 OpenBox: Generalized and Efficient Blackbox Optimization System OpenBox is an efficient and generalized blackbox optimizatio

DAIR Lab 238 Dec 29, 2022
Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition"

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021. For details of the model and experiments, please see our paper.

tricktreat 87 Dec 16, 2022
ICNet and PSPNet-50 in Tensorflow for real-time semantic segmentation

Real-Time Semantic Segmentation in TensorFlow Perform pixel-wise semantic segmentation on high-resolution images in real-time with Image Cascade Netwo

Oles Andrienko 219 Nov 21, 2022
A Deep Learning based project for creating line art portraits.

ArtLine The main aim of the project is to create amazing line art portraits. Sounds Intresting,let's get to the pictures!! Model-(Smooth) Model-(Quali

Vijish Madhavan 3.3k Jan 07, 2023
Implementation of Shape and Electrostatic similarity metric in deepFMPO.

DeepFMPO v3D Code accompanying the paper "On the value of using 3D-shape and electrostatic similarities in deep generative methods". The paper can be

34 Nov 28, 2022
使用yolov5训练自己数据集(详细过程)并通过flask部署

使用yolov5训练自己的数据集(详细过程)并通过flask部署 依赖库 torch torchvision numpy opencv-python lxml tqdm flask pillow tensorboard matplotlib pycocotools Windows,请使用 pycoc

HB.com 19 Dec 28, 2022
Official code for NeurIPS 2021 paper "Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN"

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN Official code for NeurIPS 2021 paper "Towards Scalable Unpaired Virtu

68 Dec 21, 2022