Split Variational AutoEncoder

Overview

Split-VAE

Split Variational AutoEncoder

Introduction

This repository contains and implemementation of a Split Variational AutoEncoder (SVAE). In a SVAE the output y is computed as a weighted sum

sigma * y1 + (1-sigma) * y2

where y1 and y2 are two distinct generated images, and sigma is a learned compositional map.

A Split VAE is trained as a normal VAE: no additional loss is added over the splitted images y1 and y2.

Splitting is meant to offer to the network a more flexible way to learn fruitful and independent features: as a result the variable collapse phenomenon is greatly reduced and the possibility of exploiting a larger number of latent variables improves the quality and diversity of generated samples.

Types of Splitting

The decomposition is nondeterministic, but follows two main schemes, that we may roughly categorize as either syntactical or semantical.

Syntactic decomposition

In this case, the compositional map tends to exploit the strong correlation between adjacent pixels, splitting the image in two complementary high frequency sub-images.

Below are some examples of syntactic splitting. In all the following pictures, the first row is the compositional map, then in order y1, y2 and y.

Semantic decomposition

In this case, the map typically focuses on the contours of objects, splitting the image in interesting variations of its content, with more marked and distinctive features.

Here are some examples of semantic splitting:

In case of sematic splitting, the Frèchet Inception Distance (FID) of y1 and y2 is frequently lower (hence better) than that of y, that clearly suffers from being the average of the formers.

In a sense, a SVAE forces the Variational Autoencoder to make choices, in contrast with its intrinsic tendency to average between alternatives with the aim to minimize the reconstruction loss towards a specific sample.

More examples of GENERATED images

Examples of Mnist-like gnerated digits (FID=7.47)

Here are some additional examples of semantic compositonal maps generated for CelebA, quite similar to drawings. The quality and precision of contours is both unexpected and remarkable.

And some generated faces (FID=35.1). Observe in particular the wide differentiation in pose, illumination, colors, age and expressions.

Owner
Andrea Asperti
Full Professor in Computer Science at the University of Bologna.
Andrea Asperti
Voice Conversion by CycleGAN (语音克隆/语音转换):CycleGAN-VC3

CycleGAN-VC3-PyTorch 中文说明 | English This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

Kun Ma 110 Dec 24, 2022
source code for 'Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge' by A. Shah, K. Shanmugam, K. Ahuja

Source code for "Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge" Reference: Abhin Shah, Karthikeyan Shanmugam, Kartik Ahu

Abhin Shah 1 Jun 03, 2022
Code for "Searching for Efficient Multi-Stage Vision Transformers"

Searching for Efficient Multi-Stage Vision Transformers This repository contains the official Pytorch implementation of "Searching for Efficient Multi

Yi-Lun Liao 62 Oct 25, 2022
AI Summer's complete catalog of articles

Learn Deep Learning with AI Summer A collection of all articles (almost 100) written for the AI Summer blog organized by topic. Deep Learning Theory M

AI Summer 95 Dec 29, 2022
Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021)

RSCD (BS-RSCD & JCD) Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021) by Zhihang Zhong, Yinqiang Zheng, Imari Sato We co

81 Dec 15, 2022
Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

One-Shot Voice Conversion with Weight Adaptive Instance Normalization By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain. This rep

31 Dec 07, 2022
This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation This repo is the official implementation of "DeciWatch: A Simple Baseline for

117 Dec 24, 2022
Code for Ditto: Building Digital Twins of Articulated Objects from Interaction

Ditto: Building Digital Twins of Articulated Objects from Interaction Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu CVPR 2022, Oral Project | arxiv News 2022

UT Robot Perception and Learning Lab 78 Dec 22, 2022
Breast cancer is been classified into benign tumour and malignant tumour.

Breast cancer is been classified into benign tumour and malignant tumour. Logistic regression is applied in this model.

1 Feb 04, 2022
Unadversarial Examples: Designing Objects for Robust Vision

Unadversarial Examples: Designing Objects for Robust Vision This repository contains the code necessary to replicate the major results of our paper: U

Microsoft 93 Nov 28, 2022
这是一个yolo3-tf2的源码,可以用于训练自己的模型。

YOLOV3:You Only Look Once目标检测模型在Tensorflow2当中的实现 目录 性能情况 Performance 所需环境 Environment 文件下载 Download 训练步骤 How2train 预测步骤 How2predict 评估步骤 How2eval 参考资料

Bubbliiiing 68 Dec 21, 2022
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
codes for Self-paced Deep Regression Forests with Consideration on Ranking Fairness

Self-paced Deep Regression Forests with Consideration on Ranking Fairness This is official codes for paper Self-paced Deep Regression Forests with Con

Learning in Vision 4 Sep 11, 2022
Code for "Intra-hour Photovoltaic Generation Forecasting based on Multi-source Data and Deep Learning Methods."

pv_predict_unet-lstm Code for "Intra-hour Photovoltaic Generation Forecasting based on Multi-source Data and Deep Learning Methods." IEEE Transactions

FolkScientistInDL 8 Oct 08, 2022
Team Enigma at ArgMining 2021 Shared Task: Leveraging Pretrained Language Models for Key Point Matching

Team Enigma at ArgMining 2021 Shared Task: Leveraging Pretrained Language Models for Key Point Matching This is our attempt of the shared task on Quan

Manav Nitin Kapadnis 12 Jul 08, 2022
EMNLP'2021: SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE: Simple Contrastive Learning of Sentence Embeddings This repository contains the code and pre-trained models for our paper SimCSE: Simple Contr

Princeton Natural Language Processing 2.5k Dec 29, 2022
PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Foley Music: Learning to Generate Music from Videos This repo holds the code for the framework presented on ECCV 2020. Foley Music: Learning to Genera

Chuang Gan 30 Nov 03, 2022
Cycle Consistent Adversarial Domain Adaptation (CyCADA)

Cycle Consistent Adversarial Domain Adaptation (CyCADA) A pytorch implementation of CyCADA. If you use this code in your research please consider citi

Hyunwoo Ko 2 Jan 10, 2022
Add gui for YoloV5 using PyQt5

HEAD 更新2021.08.16 **添加图片和视频保存功能: 1.图片和视频按照当前系统时间进行命名 2.各自检测结果存放入output文件夹 3.摄像头检测的默认设备序号更改为0,减少调试报错 温馨提示: 1.项目放置在全英文路径下,防止项目报错 2.默认使用cpu进行检测,自

Ruihao Wang 65 Dec 27, 2022
Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

Mathis Petrovich 248 Dec 23, 2022