Official implementation for: Blended Diffusion for Text-driven Editing of Natural Images.

Last update: Dec 30, 2022

Overview

Blended Diffusion for Text-driven Editing of Natural Images

Omri Avrahami, Dani Lischinski, Ohad Fried

Abstract: Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, ability to preserve the background and matching the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation.
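The spatial blending described in the abstract can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names (`alpha_bar`, `noise_to_level`, `blend_step`), the linear noise schedule, and its parameters are assumptions made for illustration, and the CLIP-guided denoising step that would produce the foreground latent `x_fg` is left out. The sketch only shows how, at a given noise level, the text-guided latent inside the mask is combined with a noised copy of the input image outside it.

```python
import numpy as np


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) up to step t.

    Assumes a linear beta schedule; the schedule is an illustrative choice.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return float(np.prod(1.0 - betas[: t + 1]))


def noise_to_level(x0, t, rng):
    """Sample x_t from the DDPM forward process q(x_t | x_0)."""
    a = alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def blend_step(x_fg, x0_input, mask, t, rng):
    """Blend the guided latent with a noised version of the input image.

    x_fg     : latent from the text-guided diffusion step (hypothetical input)
    x0_input : the original, clean input image
    mask     : 1 inside the region to edit, 0 elsewhere (broadcastable)
    """
    x_bg = noise_to_level(x0_input, t, rng)
    return mask * x_fg + (1.0 - mask) * x_bg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((3, 8, 8))      # stand-in for the input image
    guided = rng.random((3, 8, 8))     # stand-in for the guided latent
    roi = np.zeros((1, 8, 8))
    roi[:, 2:6, 2:6] = 1.0             # edit only the central square
    blended = blend_step(guided, image, roi, t=500, rng=rng)
    print(blended.shape)
```

Repeating this blend at each noise level keeps the region outside the mask tied to the input image while the masked region follows the text guidance, which is the mechanism the abstract credits for the seamless fusion.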

Applications

Multiple synthesis results for the same prompt

Synthesis results for different prompts

Altering part of an existing object

Background replacement

Scribble-guided editing

Text-guided extrapolation

Composing several applications

Code availability

Full code will be released soon.
