Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Overview

glide-finetune

Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset.

Installation

git clone https://github.com/afiaka87/glide-finetune.git
cd glide-finetune/
python3 -m venv .venv # create a virtual environment to keep global install clean.
source .venv/bin/activate
(.venv) # optionally install pytorch manually for your own specific env first...
(.venv) python -m pip install -r requirements.txt

Usage

(.venv) python glide-finetune.py 
    --data_dir=./data \
    --batch_size=1 \
    --grad_acc=1 \
    --guidance_scale=4.0 \
    --learning_rate=2e-5 \
    --dropout=0.1 \
    --timestep_respacing=1000 \
    --side_x=64 \
    --side_y=64 \
    --resume_ckpt='' \
    --checkpoints_dir='./glide_checkpoints/' \
    --use_fp16 \
    --device='' \
    --freeze_transformer \
    --freeze_diffusion \
    --weight_decay=0.0 \
    --project_name='glide-finetune'

Known issues:

  • batching isn't handled in the dataloader
  • NaN/Inf errors
  • Resizing doesn't handle non-square aspect ratios properly
  • some of the code is messy, needs refactoring.
Comments
  • Fixed a couple of minor issues

    Fixed a couple of minor issues

    • Pinned webdataset version to work with python 3.7 which is the version being used in Colab, Kaggle. A new version of this module is releaed few days back which only works with 3.8/9
    • Fixed an issue with data_dir arg not getting picked up.
    opened by vanga 1
  • Fix NameError when using --data_dir

    Fix NameError when using --data_dir

    Hello and thank you for your great work.

    Right now using a local data folder with --data_dir results in

    Traceback (most recent call last):
      File "/content/glide-finetune/train_glide.py", line 292, in <module>
        data_dir=data_dir,
    NameError: name 'data_dir' is not defined
    

    This PR fixes that.

    opened by tillfalko 0
  • mention mpi4py dependency

    mention mpi4py dependency

    mpi4py installation will fail unless the user has this package installed. Since MPI is not a ubiquitous dependency it should probably be mentioned. Edit: Since torch==1.10.1 is a requirement, and torch versions come with their own cuda versions (torch 1.10.1 uses cuda 10.2), I don't see a reason not to just include bitsandbytes-cuda102 in requirements.txt.

    $ py -m venv .venv
    $ source .venv/bin/activate
    $ pip install torch==1.10.1
    Collecting torch==1.10.1
      Downloading torch-1.10.1-cp39-cp39-manylinux1_x86_64.whl (881.9 MB)
         |████████████████████████████████| 881.9 MB 15 kB/s
    Collecting typing-extensions
      Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
    Installing collected packages: typing-extensions, torch
    Successfully installed torch-1.10.1 typing-extensions-4.0.1
    $ py -c "import torch; print(torch.__version__)"
    1.10.1+cu102
    
    opened by tillfalko 0
  • Fixed half precision optimizer bug

    Fixed half precision optimizer bug

    Problem

    In half precision, after the first iteration nan values start appearing regardless of input data or gradients since the adam optimizer breaks in float16. The discussion for that can be viewed here.

    Solution

    This can be fixed by setting the eps variable to 1e-4 instead of the default 1e-8. This is the only thing this pr does

    opened by isamu-isozaki 0
  • Training on half precision leads to nan values

    Training on half precision leads to nan values

    I was training my model and I noticed that after just the first iteration I was running into nan values. As it turns out my gradients and input values/images were all normal but the adam optimizer by pytorch does has some weird behavior on float16 precision where it produces nans probably because of a divide by 0 error. A discussion can be found below

    https://discuss.pytorch.org/t/adam-half-precision-nans/1765/4

    I hear changing the epison parameter for the adam weights parameter when on half precisions works but I haven't tested it yet. Will make one once I tested.

    And also let me say thanks for this repo. I wanted to fine tune the glide model and this made it so much easier.

    opened by isamu-isozaki 1
  • Where is the resume_ckpt

    Where is the resume_ckpt

    Hi, thanks for your job.

    I noticed to finetune the glide, we should have a base_model, namely "resume_ckpt". --resume_ckpt 'ckpt_to_resume_from.pt'
    Where can we get this model? Because I find Glide also didn't provide any checkpoint. Thanks for your help.

    opened by zhaobingbingbing 0
Releases(v0.0.1)
  • v0.0.1(Feb 20, 2022)

    Having some experience with finetuning GLIDE on laion/alamy, etc. I think this code works great now and hope as many people can use it as possible. Please file bugs - I know there may be a few.

    New additions:

    • dataloader for LAION400M
    • dataloader for alamy
    • train the upsample model instead of just the base model
    • (early) code for training the released noisy CLIP. still a WIP.
    Source code(tar.gz)
    Source code(zip)
Owner
Clay Mullis
Software engineer working with multi-modal deep learning.
Clay Mullis
[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

Efficient Graph Similarity Computation - (EGSC) This repo contains the source code and dataset for our paper: Slow Learning and Fast Inference: Effici

24 Dec 31, 2022
Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information by Masato Tamura, Hiroki Ohashi, and Tomoaki Yosh

105 Dec 23, 2022
Experiments for Operating Systems Lab (ETCS-352)

Operating Systems Lab (ETCS-352) Experiments for Operating Systems Lab (ETCS-352) performed by me in 2021 at uni. All codes are written by me except t

Deekshant Wadhwa 0 Sep 06, 2022
Out of Distribution Detection on Natural Adversarial Examples

OOD-on-NAE Research project on out of distribution detection for the Computer Vision course by Prof. Rob Fergus (CSCI-GA 2271) Paper out on arXiv - ht

Anugya 1 Jun 08, 2022
Deep Learning Slide Captcha

滑动验证码深度学习识别 本项目使用深度学习 YOLOV3 模型来识别滑动验证码缺口,基于 https://github.com/eriklindernoren/PyTorch-YOLOv3 修改。 只需要几百张缺口标注图片即可训练出精度高的识别模型,识别效果样例: 克隆项目 运行命令: git cl

Python3WebSpider 55 Jan 02, 2023
My 1st place solution at Kaggle Hotel-ID 2021

1st place solution at Kaggle Hotel-ID My 1st place solution at Kaggle Hotel-ID to Combat Human Trafficking 2021. https://www.kaggle.com/c/hotel-id-202

Kohei Ozaki 18 Aug 19, 2022
The official codes for the ICCV2021 presentation "Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting"

UEPNet (ICCV2021 Poster Presentation) This repository contains codes for the official implementation in PyTorch of UEPNet as described in Uniformity i

Tencent YouTu Research 15 Dec 14, 2022
The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of

1 Jan 27, 2022
[CoRL 2021] A robotics benchmark for cross-embodiment imitation.

x-magical x-magical is a benchmark extension of MAGICAL specifically geared towards cross-embodiment imitation. The tasks still provide the Demo/Test

Kevin Zakka 36 Nov 26, 2022
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation

Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation Our paper is accepted by ICCV2021. Picture: Overview of the proposed Plug-an

Yunfei Liu 32 Dec 10, 2022
bio_inspired_min_nets_improve_the_performance_and_robustness_of_deep_networks

Code Submission for: Bio-inspired Min-Nets Improve the Performance and Robustness of Deep Networks Run with docker To build a docker environment, chan

0 Dec 09, 2021
Toolchain to build Yoshi's Island from source code

Project-Y Toolchain to build Yoshi's Island (J) V1.0 from source code, by MrL314 Last updated: September 17, 2021 Setup To begin, download this toolch

MrL314 19 Apr 18, 2022
mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms.

mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms. It provides easily interchangeable modeling and planning components, and a set of utility function

Facebook Research 724 Jan 04, 2023
banditml is a lightweight contextual bandit & reinforcement learning library designed to be used in production Python services.

banditml is a lightweight contextual bandit & reinforcement learning library designed to be used in production Python services. This library is developed by Bandit ML and ex-authors of Facebook's app

Bandit ML 51 Dec 22, 2022
Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images

Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images In this paper, we present an effective Dynamic Enhancement Anchor

13 Dec 09, 2022
Evaluation and Benchmarking of Speech Super-resolution Methods

Speech Super-resolution Evaluation and Benchmarking What this repo do: A toolbox for the evaluation of speech super-resolution algorithms. Unify the e

Haohe Liu (刘濠赫) 84 Dec 20, 2022
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
Tensorflow implementation of the paper "HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences", CVPR 2021.

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences Tensorflow implementation of the paper "HumanGPS: Geodesic PreServing Feature fo

Google Interns 50 Dec 21, 2022
Easy-to-use micro-wrappers for Gym and PettingZoo based RL Environments

SuperSuit introduces a collection of small functions which can wrap reinforcement learning environments to do preprocessing ('microwrappers'). We supp

Farama Foundation 357 Jan 06, 2023
SFD implement with pytorch

S³FD: Single Shot Scale-invariant Face Detector A PyTorch Implementation of Single Shot Scale-invariant Face Detector Description Meanwhile train hand

Jun Li 251 Dec 22, 2022