MUGE Text To Image Generation Baseline

Requirements and Installation

More details see fairseq. Briefly,

python == 3.6.4
pytorch == 1.7.1

Installing fairseq and other requirements

git clone https://github.com/MUGE-2021/image-caption-baseline
cd muge_baseline/
pip install -r requirements.txt
cd fairseq/
pip install --editable .

Downloading data and place to dataset/ directory, file structure is

text2image-baseline
    - dataset
        - ECommerce-T2I
            - T2I_train.img.tsv
            - T2I_train.text.tsv
            - ...

Getting Started

The model is a BART-like model with vqgan as a image tokenizer, please see models/t2i_baseline.py for detailed model structure.

Training

cd run_scripts/; bash train_t2i_vqgan.sh

Model training takes about 5 hours.

Inference

cd run_scripts/; bash generate_t2i_vqgan.sh

See results in results/ directory.

Reference

@inproceedings{M6,
  author    = {Junyang Lin and
               Rui Men and
               An Yang and
               Chang Zhou and
               Ming Ding and
               Yichang Zhang and
               Peng Wang and
               Ang Wang and
               Le Jiang and
               Xianyan Jia and
               Jie Zhang and
               Jianwei Zhang and
               Xu Zou and
               Zhikang Li and
               Xiaodong Deng and
               Jie Liu and
               Jinbao Xue and
               Huiling Zhou and
               Jianxin Ma and
               Jin Yu and
               Yong Li and
               Wei Lin and
               Jingren Zhou and
               Jie Tang and
               Hongxia Yang},
  title     = {{M6:} {A} Chinese Multimodal Pretrainer},
  year      = {2021},
  booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining},
  pages     = {3251–3261},
  numpages  = {11},
  location  = {Virtual Event, Singapore},
}

@article{M6-T,
  author    = {An Yang and
               Junyang Lin and
               Rui Men and
               Chang Zhou and
               Le Jiang and
               Xianyan Jia and
               Ang Wang and
               Jie Zhang and
               Jiamang Wang and
               Yong Li and
               Di Zhang and
               Wei Lin and
               Lin Qu and
               Jingren Zhou and
               Hongxia Yang},
  title     = {{M6-T:} Exploring Sparse Expert Models and Beyond},
  journal   = {CoRR},
  volume    = {abs/2105.15082},
  year      = {2021}
}

Image-generation-baseline - MUGE Text To Image Generation Baseline

Related tags

Overview

MUGE Text To Image Generation Baseline

Requirements and Installation

Getting Started

Training

Inference

Reference

Owner

Official repository of the paper Privacy-friendly Synthetic Data for the Development of Face Morphing Attack Detectors

Implementation of "With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition, BMVC, 2021" in PyTorch

Facebook AI Image Similarity Challenge: Descriptor Track

FaRL for Facial Representation Learning

Physics-Informed Neural Networks (PINN) and Deep BSDE Solvers of Differential Equations for Scientific Machine Learning (SciML) accelerated simulation

This repo is for segmentation of T2 hyp regions in gliomas.

PyTorch implementation of paper "StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement" (ICCV 2021 Oral)

Deep Q Learning with OpenAI Gym and Pokemon Showdown

COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models

Source code for our paper "Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures"

Machine learning notebooks in different subjects optimized to run in google collaboratory

Artificial intelligence technology inferring issues and logically supporting facts from raw text

PyTorch Implementation of CycleGAN and SSGAN for Domain Transfer (Minimal)

dualFace: Two-Stage Drawing Guidance for Freehand Portrait Sketching (CVMJ)

Baseline for the Spoofing-aware Speaker Verification Challenge 2022

The easiest tool for extracting radiomics features and training ML models on them.

Hybrid Neural Fusion for Full-frame Video Stabilization

Production First and Production Ready End-to-End Speech Recognition Toolkit

OHLC Average Prediction of Apple Inc. Using LSTM Recurrent Neural Network

Pytorch implementation of MalConv