Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

Last update: Dec 28, 2022

Overview

NÜWA - Pytorch (wip)

Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in Pytorch. This repository will be populated if Microsoft does not open source the code by the end of December. It may also contain an extension into video and audio, using a dual decoder approach.


Citations

@misc{wu2021nuwa,
    title   = {N\"UWA: Visual Synthesis Pre-training for Neural visUal World creAtion}, 
    author  = {Chenfei Wu and Jian Liang and Lei Ji and Fan Yang and Yuejian Fang and Daxin Jiang and Nan Duan},
    year    = {2021},
    eprint  = {2111.12417},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Comments

Question about generated videos?

The generated videos contain a lot of negative numbers and very small decimals (like 5e-1), yet the loss decreases normally during training. Is that normal? How can I make the result visible?
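One possible way to make such output viewable, sketched under assumptions: if the generated values are floating-point pixel intensities in some arbitrary range (the thread does not say which range the model emits), a min-max rescale to 8-bit integers produces frames that common image and video writers accept. The helper name `to_uint8` is hypothetical, and it works on a flat list of floats so that the sketch has no dependencies.

```python
def to_uint8(values):
    """Min-max rescale a flat list of floats to integers in [0, 255]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid division by zero and return mid-gray.
        return [128 for _ in values]
    scale = 255.0 / (hi - lo)
    return [int(round((v - lo) * scale)) for v in values]


# Example: values that include negatives and small decimals.
pixels = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(to_uint8(pixels))
```

With PyTorch tensors the same idea would be a normalization followed by a cast to `torch.uint8`. Whether min-max scaling or a fixed range such as [-1, 1] is appropriate depends on how the VAE was trained, which the thread leaves open.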

opened by Fitzwong 0
Why the video does not pass through the encoder?

Hi lucidrains! Thanks for providing a great repo that makes the NUWA paper easier to understand.
I have a question. In the NUWA paper, the inputs of the Encoder are the caption tokens (caption condition) and the video tokens (3DNA condition). So, as I understand it, the video token sequence should fully self-attend in the Encoder, and the outputs should then condition the Decoder. The Decoder in this repo has causal self-attention and text conditioning, as expected. But by the paper's definition, the condition contains both the text condition and the 3DNA condition, and both of them condition the Decoder. Is my understanding right? I am just curious about the conditioning in the NUWA paper. The Encoder in your repo is only a text encoder; the video does not pass through the encoder to condition the Decoder.

Looking forward to your reply! Thanks!

opened by Wang-Xiaodong1899 0
Questions about function forward() in NUWA please.
I'm confused that, in the forward() function of class NUWA, the ground-truth video is fed to the transformer to compute the output video, which differs from what generate() does.

frame_embeddings = self.video_transformer(
    frame_embeddings,  # calculated from ground-truth video
    context = text_embeds,
    context_mask = text_mask
)

So when training NUWA, the loss comes from the logits. But the logits depend not only on the text but also on the ground-truth video (through a single transformer call, unlike the autoregressive loop in generate()). Is that some kind of cheating during training? Or should I generate the logits in the same way as in generate(), and then calculate the loss to train?
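The pattern described here is commonly called teacher forcing. A minimal sketch, assuming the video transformer uses causal self-attention (as another thread on this page states for the decoder) and that its inputs are shifted right by one token: position i then sees only ground-truth tokens before i, so predicting token i from them does not leak the answer. The names `BOS`, `shift_right`, and `causal_mask` are illustrative and not taken from the repository.

```python
BOS = 0  # hypothetical begin-of-sequence token id


def shift_right(tokens):
    """Decoder input: BOS followed by all ground-truth tokens except the last."""
    return [BOS] + tokens[:-1]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


target = [7, 3, 9, 4]          # ground-truth video token ids
inputs = shift_right(target)   # what the decoder is fed during training
mask = causal_mask(len(target))

for i, tgt in enumerate(target):
    visible = [tok for tok, ok in zip(inputs, mask[i]) if ok]
    # Visible inputs are BOS plus target[:i]; target[i] itself is not among them.
    print(f"predict target[{i}]={tgt} from inputs {visible}")
```

All positions are trained in one parallel pass this way, whereas generation has no ground truth available and must feed its own samples back in step by step.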
opened by Fitzwong 1
Type of dataset for training VQ-GAN

Hi,

First, thanks a lot for the amazing work! I have one question about training the VQ-GAN: do you recommend training it on a dataset similar to the one the NUWA model will be trained on? For example, if I want to train NUWA to generate sports videos from text, do I also need to train the VQ-GAN on a sports dataset?

Thanks a lot

opened by antonibigata 0
Pseudocode for 3DNA?

me no comprendai le complex einops 😢

Can someone give the 3DNA pseudocode to illustrate what's going on 🤗

(Also how did lucidrains bang out thousands of lines of code in a few weeks - is he confirmed to be human? 🤔)
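A rough sketch of the idea in plain Python rather than einops, based on the paper's description of 3D nearby attention (3DNA): each token at grid position (t, h, w) attends only to tokens inside a small cuboid around it, instead of to the whole sequence. The function name, the clipping at the grid border, and the raster-order causal rule are assumptions made for illustration; the repository's implementation may handle padding and causality differently.

```python
from itertools import product


def nearby_positions(query, shape, extent, causal=False):
    """List the (t, h, w) positions a query attends to under 3D nearby attention.

    query  -- (t, h, w) coordinates of the query token, as a tuple
    shape  -- (T, H, W) size of the token grid
    extent -- (et, eh, ew) odd window sizes along each axis
    causal -- if True, keep only positions at or before the query in raster order
    """
    ranges = []
    for q, size, e in zip(query, shape, extent):
        half = e // 2
        # Clip the window to the grid instead of padding (an assumption).
        ranges.append(range(max(0, q - half), min(size, q + half + 1)))
    neighbors = list(product(*ranges))
    if causal:
        # Tuple comparison is lexicographic, which matches (t, h, w) raster order.
        neighbors = [p for p in neighbors if p <= query]
    return neighbors


grid = (4, 8, 8)      # 4 frames of 8 x 8 tokens
window = (3, 3, 3)    # 3 x 3 x 3 neighborhood
full = nearby_positions((1, 4, 4), grid, window)
past = nearby_positions((1, 4, 4), grid, window, causal=True)
print(len(full), len(past))
```

Attention scores would then be computed only between the query and the keys at these positions, which keeps the cost proportional to the window size rather than to T × H × W.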

opened by neel04 4

Releases (0.7.7a)

0.7.7a (Aug 14, 2022)
0.7.7 (Aug 14, 2022)
0.7.6 (Apr 28, 2022)
0.7.5 (Apr 28, 2022)
0.7.4 (Apr 27, 2022)
0.7.3 (Apr 22, 2022)
0.7.2 (Apr 7, 2022)
0.7.1 (Mar 24, 2022)
0.7.0 (Mar 24, 2022)
0.6.4 (Mar 15, 2022)
0.6.3 (Mar 15, 2022)
0.6.2 (Mar 15, 2022)
0.6.1 (Mar 15, 2022)
0.6.0 (Mar 15, 2022)
0.5.15 (Mar 12, 2022)
0.5.14 (Mar 12, 2022)
0.5.12 (Mar 12, 2022)
0.5.11 (Mar 12, 2022)
0.5.10 (Mar 11, 2022)
0.5.9 (Mar 11, 2022)
0.5.8 (Mar 11, 2022)
0.5.7 (Mar 11, 2022)
0.5.6 (Mar 11, 2022)
0.5.5 (Mar 11, 2022)
0.5.4 (Mar 11, 2022)
0.5.3 (Mar 10, 2022)
0.5.2 (Mar 10, 2022)
0.5.1 (Mar 10, 2022)
0.5.0 (Mar 10, 2022)
0.4.33 (Mar 10, 2022)

Owner

Phil Wang

Working with Attention. It's all we need

