🇰🇷 Text to Image in Korean

Overview

KoDALLE



Utilizes a pretrained language model's token embedding layer and position embedding layer as DALLE's text encoder.

Background

  • Training a DALLE model from scratch demands a large paired dataset of images and captions. For example, OpenAI's DALLE was trained on more than 250 million text-image pairs.
  • If the dataset isn't large enough or is limited to a specific domain, the trained DALLE model's vocabulary is insufficient. For instance, the 1 million text captions of the K-Fashion dataset consist of only about 300 tokens.
  • Therefore, inference with such DALLE models can be problematic if the query sentence is unrelated to the captions the model was trained on.
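The vocabulary problem can be illustrated with a small sketch. It assumes plain whitespace tokenization (real DALLE models use subword tokenizers such as BPE) and made-up English captions; `build_vocab` and `oov_rate` are hypothetical helpers, not part of the repository.

```python
from collections import Counter
from typing import Iterable


def build_vocab(captions: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens across all training captions.

    Whitespace splitting is a simplification; DALLE-style models use subword tokenizers.
    """
    return Counter(token for caption in captions for token in caption.split())


def oov_rate(query: str, vocab: Counter) -> float:
    """Fraction of query tokens that never appeared in the training captions."""
    tokens = query.split()
    if not tokens:
        return 0.0
    unseen = [token for token in tokens if token not in vocab]
    return len(unseen) / len(tokens)


if __name__ == "__main__":
    # Toy captions standing in for a narrow, domain-specific dataset.
    train_captions = [
        "white short-sleeve blouse",
        "navy skinny denim jeans",
    ]
    vocab = build_vocab(train_captions)
    print(len(vocab))                            # 7 distinct tokens
    print(oov_rate("white denim jeans", vocab))  # 0.0: every token was seen
    print(oov_rate("white silk gown", vocab))    # 2 of 3 tokens are unseen
```

A query like "white silk gown" is mostly made of tokens the model never saw during training, which is the gap that reusing a pretrained language model's embeddings is meant to narrow.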

KoDALLE's Results on a Small Fashion Dataset

| | OpenAI's DALLE | KoDALLE of HappyFace |
|---|---|---|
| Train Dataset Size | 250 Million Pairs | 0.8 Million Pairs |
| #Params | 12 Billion | 428 Million |
| #Layers | 64 Layers | 16 Layers |
| Computing Resource | 1024 x V100 16GB | 1 x V100 32GB |
| Text Encoder | 16384 Vocab x 512 Dim BPE | 32000 Vocab x 1024 Dim klue/roberta-large |
| Image Encoder | VQVAE | VQGAN |
| Optimizer | AdamW | AdamW |
| Learning Rate | 4.5e-5 | 3.0e-5 |
| Weight Decay | 4.5e-3 | 3.0e-3 |
| LR Scheduler | ReduceLROnPlateau | - |
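The optimizer rows of the table translate into a short PyTorch setup. This is a sketch assuming a standard `torch.optim.AdamW` configuration; `make_optimizer` is a hypothetical helper whose defaults mirror the KoDALLE column, and the optional `ReduceLROnPlateau` scheduler mirrors the OpenAI column with its arguments left at PyTorch defaults, since the table does not list them.

```python
import torch
from torch import nn


def make_optimizer(model: nn.Module, lr: float = 3.0e-5, weight_decay: float = 3.0e-3,
                   use_plateau_scheduler: bool = False):
    """Build AdamW with the table's learning rate and weight decay.

    Defaults follow the KoDALLE column; pass lr=4.5e-5, weight_decay=4.5e-3 and
    use_plateau_scheduler=True to mirror the OpenAI column instead.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = None
    if use_plateau_scheduler:
        # Lowers the LR when the monitored metric (e.g. validation loss) stops improving.
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
    return optimizer, scheduler


if __name__ == "__main__":
    model = nn.Linear(8, 8)  # stand-in for the DALLE model
    optimizer, scheduler = make_optimizer(model)
    print(optimizer.param_groups[0]["lr"], optimizer.param_groups[0]["weight_decay"])
```

With the scheduler enabled, call `scheduler.step(val_loss)` once per validation pass so it can react to a plateau.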

The team built a Korean text-to-fashion-design DALLE model with fewer than 100k sampled text-image pairs.

Caption: 하의에서 색상은 스카이블루이다. 상의에서 기장은 롱이다. 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 실크이다. 프린트에는 무지이다. 넥라인은 브이넥이다. 핏은 노멀
(English: For the bottoms, the color is sky blue. For the top, the length is long. The color is white. The category is blouse. The detail is shirring. The sleeve length is short. The material is silk. The print is plain. The neckline is V-neck. The fit is normal)
Generated Image: [image]

Caption: 아우터는 색상이 카키 소재가 우븐 핏이 루즈인 코트이다. 하의는 색상이 네이비 소재가 데님 핏이 스키니인 청바지이다.
(English: The outerwear is a coat in khaki, made of woven fabric, with a loose fit. The bottoms are jeans in navy, made of denim, with a skinny fit.)
Generated Image: [image]

Caption: 하의에서 기장은 발목이다. 색상은 블루이다. 카테고리는 스커트이다. 소재에는 데님이다. 핏은 와이드이다. 상의에서 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 우븐이다.
(English: For the bottoms, the length is ankle-length. The color is blue. The category is skirt. The material is denim. The fit is wide. For the top, the color is white. The category is blouse. The detail is shirring. The sleeve length is short. The material is woven.)
Generated Image: [image]

Caption: 상의에서 기장은 노멀이다. 상의에서 색상은 화이트이다. 상의에서 서브색상은 블랙이다. 상의에서 카테고리는 티셔츠이다. 상의에서 소매기장은 반팔이다. 상의에서 소재에는 저지이다. 상의에서 프린트에는 레터링이다. 상의에서 넥라인은 라운드넥이다. 상의에서 핏은 루즈이다.
(English: For the top, the length is normal. For the top, the color is white. For the top, the secondary color is black. For the top, the category is T-shirt. For the top, the sleeve length is short. For the top, the material is jersey. For the top, the print is lettering. For the top, the neckline is round. For the top, the fit is loose.)
Generated Image: [image]

Methodology

Experiments were conducted with the embedding layers of several Korean Transformer models. The team selected klue/roberta-large as the baseline in this repository, considering the model's size.
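A minimal PyTorch sketch of reusing a pretrained model's word (wte) and position (wpe) embedding tables as DALLE's text embedding. The class name `PretrainedTextEmbedding` is hypothetical, and random tensors stand in for the pretrained weights so the example runs offline; the commented lines show one assumed way to pull the real tables from klue/roberta-large with Hugging Face `transformers`, not the repository's actual code.

```python
import torch
from torch import nn


class PretrainedTextEmbedding(nn.Module):
    """Hypothetical DALLE text embedding initialised from a pretrained LM's wte/wpe."""

    def __init__(self, word_weight: torch.Tensor, pos_weight: torch.Tensor,
                 text_seq_len: int, trainable: bool = True):
        super().__init__()
        if word_weight.shape[1] != pos_weight.shape[1]:
            raise ValueError("word and position tables must share the hidden size")
        if pos_weight.shape[0] < text_seq_len:
            raise ValueError("position table is shorter than text_seq_len")
        # wte: the full vocabulary table (32000 x 1024 for klue/roberta-large).
        self.wte = nn.Embedding.from_pretrained(word_weight.clone(), freeze=not trainable)
        # wpe: the first text_seq_len rows. RoBERTa-style models offset position ids
        # by the padding index, so which rows to copy is a simplifying assumption here.
        self.wpe = nn.Embedding.from_pretrained(pos_weight[:text_seq_len].clone(),
                                                freeze=not trainable)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) with seq_len <= text_seq_len
        # returns: (batch, seq_len, hidden)
        positions = torch.arange(token_ids.shape[1], device=token_ids.device)
        return self.wte(token_ids) + self.wpe(positions)


if __name__ == "__main__":
    # Assumed real usage (needs `transformers` and a model download):
    # from transformers import AutoModel
    # lm = AutoModel.from_pretrained("klue/roberta-large")
    # word = lm.embeddings.word_embeddings.weight.detach()
    # pos = lm.embeddings.position_embeddings.weight.detach()
    word = torch.randn(100, 16)  # stand-in: 100-token vocab, hidden size 16
    pos = torch.randn(40, 16)    # stand-in position table
    embed = PretrainedTextEmbedding(word, pos, text_seq_len=8)
    out = embed(torch.randint(0, 100, (2, 8)))
    print(out.shape)  # torch.Size([2, 8, 16])
```

Setting `trainable=False` freezes both tables, which is one way to keep the pretrained vocabulary intact while training the rest of DALLE.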

KoDALLE with klue/roberta-large's wpe and wte is trainable in a 16GB GPU Google Colab environment. The hyperparameters related to DALLE's model size are as follows:

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8
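The configuration above can be sanity-checked with a few lines of arithmetic. This sketch assumes that reusing klue/roberta-large's embeddings requires `MODEL_DIM` and `VOCAB_SIZE` to match that model's hidden size and vocabulary (1024 and 32000, per the comparison table); `embedding_budget` is a hypothetical helper.

```python
KODALLE_CONFIG = {
    "BATCH_SIZE": 32,
    "DEPTH": 2,
    "TEXT_SEQ_LEN": 128,
    "VOCAB_SIZE": 32000,
    "MODEL_DIM": 1024,
    "ATTN_TYPES": "full",
    "DIM_HEAD": 64,
    "HEADS": 8,
}

# klue/roberta-large text encoder size, as listed in the comparison table.
ROBERTA_LARGE_VOCAB = 32000
ROBERTA_LARGE_HIDDEN = 1024


def embedding_budget(cfg: dict) -> dict:
    """Check compatibility with the pretrained embeddings and count their parameters."""
    if cfg["MODEL_DIM"] != ROBERTA_LARGE_HIDDEN:
        raise ValueError("MODEL_DIM must equal the pretrained hidden size to reuse wte/wpe")
    if cfg["VOCAB_SIZE"] != ROBERTA_LARGE_VOCAB:
        raise ValueError("VOCAB_SIZE must equal the pretrained tokenizer's vocabulary")
    return {
        "wte_params": cfg["VOCAB_SIZE"] * cfg["MODEL_DIM"],    # token embedding table
        "wpe_params": cfg["TEXT_SEQ_LEN"] * cfg["MODEL_DIM"],  # position rows actually used
        "attn_inner_dim": cfg["HEADS"] * cfg["DIM_HEAD"],      # width of q/k/v projections
    }


if __name__ == "__main__":
    print(embedding_budget(KODALLE_CONFIG))
    # {'wte_params': 32768000, 'wpe_params': 131072, 'attn_inner_dim': 512}
```

The attention inner width (8 x 64 = 512) differs from `MODEL_DIM` (1024). That is fine in multi-head attention layouts that project from the model dimension to HEADS x DIM_HEAD and back, but it is worth confirming against the DALLE implementation in use.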

Significance

  • Offers promising results for training from scratch on specific domains with small datasets.
  • Introduces a way to make domain-specific DALLE & CLIP models robust to the input sentence.
  • Recommends an adequate text-to-image model size for a given computing budget.
  • Suggests an effortless way to create DALLE & CLIP models for one's own language when a pretrained language model is available.

WIP

  • Add an image-caption reranker (EfficientNet + klue/roberta-large).
  • Train the model with 500k text-image pairs.
  • Modularize the code into Python modules.
  • Update the inference code.
  • Update FID and IS metrics on the test and validation datasets.
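The planned reranker would score each generated image against its caption and keep the best matches. A minimal sketch, assuming the image encoder (e.g. EfficientNet) and text encoder (klue/roberta-large) have already produced embeddings in a shared space, as in CLIP; `rerank` and the toy vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rerank(text_emb: Sequence[float], image_embs: Sequence[Sequence[float]],
           top_k: Optional[int] = None) -> List[int]:
    """Return image indices sorted from most to least similar to the caption."""
    scores = [cosine_similarity(text_emb, emb) for emb in image_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order if top_k is None else order[:top_k]


if __name__ == "__main__":
    caption = [1.0, 0.0]
    candidates = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]  # three generated images
    print(rerank(caption, candidates))           # [2, 1, 0]
    print(rerank(caption, candidates, top_k=1))  # [2]
```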

Comments
  • Koclip apply in KoDALLE

    Changes

    add) model.py

    A version of Hyeonsoo's KoCLIP code modified so that it works with DALLE Roberta.

    It still needs to be revised by comparing it against the model.py on the dev branch.

    add) generate.ipynb

    Code for checking that KoCLIP works.

    opened by JoonHong-Kim
  • add: KoCLIP codes

    Changes:

    refactor) clipmodel.py

    • Updated CLIPModel to its final version
    • Moved it to the clip folder

    add) clip/train_clip.py

    • Code used to train the CLIP model

    add) clip/dataloader.py

    • Dataloader function used to train the CLIP model
    opened by shawnhyeonsoo
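For reference, the standard CLIP training objective is a symmetric cross-entropy over image-text similarity logits. The sketch below assumes that objective with a fixed temperature (CLIP itself learns the temperature); the repository's train_clip.py may differ, and `clip_loss` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def clip_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss; row i of each batch is a matching image-text pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) scaled cosine similarities
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # each image should pick its caption
    loss_t2i = F.cross_entropy(logits.t(), targets)  # each caption should pick its image
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    aligned = torch.eye(4)
    shuffled = aligned.roll(1, dims=0)  # captions no longer match their images
    print(clip_loss(aligned, aligned).item() < clip_loss(aligned, shuffled).item())  # True
```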
  • add skip_sample in TextImageDataset

    Changes

    modify) loader.py

    • Handles the error raised in TextImageDataset when the texts or image for a sample are missing at load time
    • When an error occurs, uses the skip_sample function to skip to a random or the next index
    • Modified from the existing train_dalle_gpt_roberta.py
    opened by jjonhwa
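The skip logic described in this PR can be sketched without PyTorch. `SkippingDataset` is a hypothetical stand-in for TextImageDataset: missing data is marked with `None`, and on a load error the index moves to a random one (shuffle mode) or the next one, as the PR describes. The bounded retry loop is an addition of this sketch so that a dataset with no valid samples fails with an error instead of retrying forever.

```python
import random
from typing import List, Optional, Tuple

Sample = Tuple[Optional[str], Optional[str]]  # (caption text, image path); None = missing


class SkippingDataset:
    """Hypothetical stand-in for TextImageDataset illustrating skip_sample."""

    def __init__(self, samples: List[Sample], shuffle: bool = False, seed: Optional[int] = None):
        self.samples = samples
        self.shuffle = shuffle
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.samples)

    def _load(self, ind: int) -> Tuple[str, str]:
        text, image = self.samples[ind]
        if text is None or image is None:
            raise FileNotFoundError(f"sample {ind} is missing its text or image")
        return text, image

    def skip_sample(self, ind: int) -> int:
        """Pick the replacement index: random when shuffling, otherwise the next one."""
        if self.shuffle:
            return self._rng.randrange(len(self))
        return (ind + 1) % len(self)

    def __getitem__(self, ind: int) -> Tuple[str, str]:
        for _ in range(len(self)):
            try:
                return self._load(ind)
            except FileNotFoundError:
                ind = self.skip_sample(ind)
        raise RuntimeError("no loadable sample found")


if __name__ == "__main__":
    ds = SkippingDataset([("a white blouse", "0.jpg"), (None, "1.jpg"), ("navy jeans", "2.jpg")])
    print(ds[1])  # ('navy jeans', '2.jpg'): index 1 has no caption, so index 2 is used
```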
Releases: v0.1.0-beta