πŸ‡°πŸ‡· Text to Image in Korean

Overview

KoDALLE

Open In Colab Wandb Log

image-20211227151557604

Utilizing pretrained language model’s token embedding layer and position embedding layer as DALLE’s text encoder.

Background

  • Training DALLE model from scratch demands large size paired dataset of images and captions. For example, OpenAI DALLE is trained with more than 250 million text-image pairs for the training.
  • If the dataset isn’t large enough or is limited to specific domains, number of vocabularies in the trained DALLE model are insufficient. For instance, 1 million text captions of K-Fashion dataset only consists of more or less than 300 tokens.
  • Therefore, inferencing from such DALLE models could be problematic if the given sentence query is unconnected to the originally trained captions’ text dataset.

KoDALLE's Result on Small Size Fashion Dataset

OpenAI’s DALLE KoDALLE of HappyFace
Train Dataset Size 250 Million Pairs 0.8 Million Pairs
#Params 12 Billion 428 Million
#Layers 64 Layers 16 Layers
Computing Resource 1024 x V100 16GB 1 x V100 32GB
Text Encoder 16384 Vocab x 512 Dim BPE 32000 Vocab x 1024 Dim klue/roberta-large
Image Encoder VQVAE VQGAN
Optimizer AdamW AdamW
Learning Rate 4.5e-5 3.0e-5
Weight Decay 4.5e-3 3.0e-3
LR Scheduler ReduceLROnPlateau -

The team constructed Text to Fashion Design DALLE model in Korean language with less than 100k text-image sampled pairs.

Caption ν•˜μ˜μ—μ„œ 색상은 μŠ€μΉ΄μ΄λΈ”λ£¨μ΄λ‹€. μƒμ˜μ—μ„œ κΈ°μž₯은 둱이닀. 색상은 ν™”μ΄νŠΈμ΄λ‹€. μΉ΄ν…Œκ³ λ¦¬λŠ” λΈ”λΌμš°μŠ€μ΄λ‹€. λ””ν…ŒμΌμ—λŠ” 셔링이닀. μ†Œλ§€κΈ°μž₯은 λ°˜νŒ”μ΄λ‹€. μ†Œμž¬μ—λŠ” 싀크이닀. ν”„λ¦°νŠΈμ—λŠ” 무지이닀. λ„₯라인은 브이λ„₯이닀. 핏은 λ…Έλ©€
Generated Image image
Caption μ•„μš°ν„°λŠ” 색상이 μΉ΄ν‚€ μ†Œμž¬κ°€ 우븐 핏이 루즈인 μ½”νŠΈμ΄λ‹€. ν•˜μ˜λŠ” 색상이 넀이비 μ†Œμž¬κ°€ λ°λ‹˜ 핏이 μŠ€ν‚€λ‹ˆμΈ 청바지이닀.
Generated Image image
Caption ν•˜μ˜μ—μ„œ κΈ°μž₯은 발λͺ©μ΄λ‹€. 색상은 블루이닀. μΉ΄ν…Œκ³ λ¦¬λŠ” μŠ€μ»€νŠΈμ΄λ‹€. μ†Œμž¬μ—λŠ” λ°λ‹˜μ΄λ‹€. 핏은 μ™€μ΄λ“œμ΄λ‹€. μƒμ˜μ—μ„œ 색상은 ν™”μ΄νŠΈμ΄λ‹€. μΉ΄ν…Œκ³ λ¦¬λŠ” λΈ”λΌμš°μŠ€μ΄λ‹€. λ””ν…ŒμΌμ—λŠ” 셔링이닀. μ†Œλ§€κΈ°μž₯은 λ°˜νŒ”μ΄λ‹€. μ†Œμž¬μ—λŠ” μš°λΈμ΄λ‹€.
Generated Image image
Caption μƒμ˜μ—μ„œ κΈ°μž₯은 노멀이닀. μƒμ˜μ—μ„œ 색상은 ν™”μ΄νŠΈμ΄λ‹€. μƒμ˜μ—μ„œ μ„œλΈŒμƒ‰μƒμ€ λΈ”λž™μ΄λ‹€. μƒμ˜μ—μ„œ μΉ΄ν…Œκ³ λ¦¬λŠ” 티셔츠이닀. μƒμ˜μ—μ„œ μ†Œλ§€κΈ°μž₯은 λ°˜νŒ”μ΄λ‹€. μƒμ˜μ—μ„œ μ†Œμž¬μ—λŠ” 저지이닀. μƒμ˜μ—μ„œ ν”„λ¦°νŠΈμ—λŠ” λ ˆν„°λ§μ΄λ‹€. μƒμ˜μ—μ„œ λ„₯라인은 λΌμš΄λ“œλ„₯이닀. μƒμ˜μ—μ„œ 핏은 λ£¨μ¦ˆμ΄λ‹€.
Generated Image image

Methodology

Experimentations were conducted with the following Korean Transformers Models’ embedding layers. The team selected klue/roberta-large as baseline in the repository considering the size of the model.

KoDALLE with klue/roberta-large's wpe and wte which is trainable on 16GB GPU Google Colab environment. Hyperparams related to the DALLE's model size are following.

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8

Significance

  • Offers promising result for training from scratch on specific domains with small size dataset.
  • Introduces solution for domain specific DALLE & CLIP models to be robust on input sentence.
  • Recommends adequate text-to-image model size for given computation resource.
  • Suggests effortless method of creating DALLE & CLIP model for own languages if pretrained language model is available.

WIP

  • Add image-caption reranker(EfficientNet + Klue/roberta-large)
  • Model trained with 500k text-image pairs.
  • Modulize in python code.
  • Update Inference code.
  • Update FID and IS metrics on test and validation dataset.
You might also like...
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

BARTScore: Evaluating Generated Text as Text Generation
BARTScore: Evaluating Generated Text as Text Generation

This is the Repo for the paper: BARTScore: Evaluating Generated Text as Text Generation Updates 2021.06.28 Release online evaluation Demo 2021.06.25 R

Code for EMNLP 2021 main conference paper
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Siamese-nn-semantic-text-similarity - A repository containing comprehensive Neural Networks based PyTorch implementations for the semantic text similarity task Automatic number plate recognition using tech:  Yolo, OCR, Scene text detection, scene text recognation, flask, torch
Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Automatic Number Plate Recognition Automatic Number Plate Recognition (ANPR) is the process of reading the characters on the plate with various optica

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

Comments
  • Koclip apply in KoDALLE

    Koclip apply in KoDALLE

    변경사항

    add) model.py

    ν˜„μˆ˜λ‹˜μ˜ KoCLIP이 DALLE Roberta μ—μ„œ μž‘λ™ν•˜κ²Œλ” μ½”λ“œλ₯Ό μˆ˜μ •ν•œ νŒŒμΌμž…λ‹ˆλ‹€.

    dev branch에 μ‘΄μž¬ν•˜λŠ” model.py λΉ„κ΅ν•˜λ©΄μ„œ μˆ˜μ •μ΄ ν•„μš”ν•©λ‹ˆλ‹€.

    add) generate.ipynb

    KoCLIP이 μž‘λ™ν•˜λŠ”κ²ƒμ„ λ³Ό 수 μžˆλ„λ‘ λ§Œλ“  μ½”λ“œμž…λ‹ˆλ‹€.

    opened by JoonHong-Kim 1
  • add: KoCLIP codes

    add: KoCLIP codes

    변경사항:

    refactor) clipmodel.py

    • CLIPModel μ΅œμ’… λ²„μ „μœΌλ‘œ μˆ˜μ •
    • clip folder둜 이동

    add) clip/train_clip.py

    • CLIP λͺ¨λΈ ν•™μŠ΅μ— μ‚¬μš©ν•œ μ½”λ“œμž…λ‹ˆλ‹€

    add) clip/dataloader.py

    • CLIP λͺ¨λΈ ν•™μŠ΅μ— μ‚¬μš©ν•œ dataloader ν•¨μˆ˜μž…λ‹ˆλ‹€.
    opened by shawnhyeonsoo 0
  • add skip_sample in TextImageDataset

    add skip_sample in TextImageDataset

    변경사항

    modify) loader.py

    • TextImageDatasetμ—μ„œ texts, imageλ₯Ό 뢈러올 λ•Œ, dataκ°€ 없을 경우 λ°œμƒν•˜λŠ” μ—λŸ¬ 처리
    • skip_sample ν•¨μˆ˜λ₯Ό ν™œμš©ν•˜μ—¬ errorκ°€ λ°œμƒν•  경우, random ν˜Ήμ€ λ‹€μŒ index둜 λ³€ν™˜ν•˜μ—¬ skip
    • κΈ°μ‘΄ train_dalle_gpt_roberta.pyλ₯Ό λ°”νƒ•μœΌλ‘œ μˆ˜μ •
    opened by jjonhwa 0
Releases(v0.1.0-beta)
Res2Net for Instance segmentation and Object detection using MaskRCNN

Res2Net for Instance segmentation and Object detection using MaskRCNN Since the MaskRCNN-benchmark of facebook is deprecated, we suggest to use our mm

Res2Net Applications 55 Oct 30, 2022
Using Streamlit to host a multi-page tool with model specs and classification metrics, while also accepting user input values for prediction.

Predicitng_viability Using Streamlit to host a multi-page tool with model specs and classification metrics, while also accepting user input values for

Gopalika Sharma 1 Nov 08, 2021
(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry Official implementation of the paper Multi-View Depth Est

Bae, Gwangbin 138 Dec 28, 2022
A package related to building quasi-fibration symmetries

qf A package related to building quasi-fibration symmetries. If you'd like to learn more about how it works, see the brief explanation and References

Paolo Boldi 1 Dec 01, 2021
SAN for Product Attributes Prediction

SAN Heterogeneous Star Graph Attention Network for Product Attributes Prediction This repository contains the official PyTorch implementation for ADVI

Xuejiao Zhao 9 Dec 12, 2022
Semi-Supervised Learning with Ladder Networks in Keras. Get 98% test accuracy on MNIST with just 100 labeled examples !

Semi-Supervised Learning with Ladder Networks in Keras This is an implementation of Ladder Network in Keras. Ladder network is a model for semi-superv

Divam Gupta 101 Sep 07, 2022
A PyTorch implementation of EfficientDet.

A PyTorch impl of EfficientDet faithful to the original Google impl w/ ported weights

Ross Wightman 1.4k Jan 07, 2023
TC-GNN with Pytorch integration

TC-GNN (Running Sparse GNN on Dense Tensor Core on Ampere GPU) Cite this project and paper. @inproceedings{TC-GNN, title={TC-GNN: Accelerating Spars

YUKE WANG 19 Dec 01, 2022
Deep Learning and Logical Reasoning from Data and Knowledge

Logic Tensor Networks (LTN) Logic Tensor Network (LTN) is a neurosymbolic framework that supports querying, learning and reasoning with both rich data

171 Dec 29, 2022
An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

Luke Tonin 195 Dec 17, 2022
Reproducing code of hair style replacement method from Barbershorp.

Barbershorp Reproducing code of hair style replacement method from Barbershorp. Also reproduces II2S, an improved version of Image2StyleGAN. Requireme

1 Dec 24, 2021
Dilated Convolution with Learnable Spacings PyTorch

Dilated-Convolution-with-Learnable-Spacings-PyTorch Ismail Khalfaoui Hassani Dilated Convolution with Learnable Spacings (abbreviated to DCLS) is a no

15 Dec 09, 2022
Piotr - IoT firmware emulation instrumentation for training and research

Piotr: Pythonic IoT exploitation and Research Introduction to Piotr Piotr is an emulation helper for Qemu that provides a convenient way to create, sh

Damien Cauquil 51 Nov 09, 2022
Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric

PyEMD: Fast EMD for Python PyEMD is a Python wrapper for Ofir Pele and Michael Werman's implementation of the Earth Mover's Distance that allows it to

William Mayner 433 Dec 31, 2022
Flower - A Friendly Federated Learning Framework

Flower - A Friendly Federated Learning Framework Flower (flwr) is a framework for building federated learning systems. The design of Flower is based o

Adap 1.8k Jan 01, 2023
NumQMBasic - A mini-course offered to Undergrad physics students

The best way to use this material is by forking it by click the Fork button at the top, right corner. Then you will get your own copy to play with! Th

Raghu 35 Dec 05, 2022
Official PyTorch implementation of the Fishr regularization for out-of-distribution generalization

Fishr: Invariant Gradient Variances for Out-of-distribution Generalization Official PyTorch implementation of the Fishr regularization for out-of-dist

62 Dec 22, 2022
A PyTorch implementation of the paper Mixup: Beyond Empirical Risk Minimization in PyTorch

Mixup: Beyond Empirical Risk Minimization in PyTorch This is an unofficial PyTorch implementation of mixup: Beyond Empirical Risk Minimization. The co

Harry Yang 121 Dec 17, 2022
Landmarks Recogntion Web application using Streamlit.

Landmark Recognition Web-App using Streamlit Watch Tutorial for this project Source Trained model landmarks_classifier_asia_V1/1 is taken from the Ten

Kushal Bhavsar 5 Dec 12, 2022
This is the repository for the paper "Have I done enough planning or should I plan more?"

Metacognitive Learning Tool box https://re.is.mpg.de What Is This? This repository contains two modules used to analyse metacognitive learning in huma

0 Dec 01, 2021