INTRODUCTION

This is a modification of moein-shariatnia's OpenAI-CLIP repository (https://github.com/moein-shariatnia/OpenAI-CLIP).

The training code currently supports the Flickr8k and Flickr30k datasets, and the image encoder can be either ResNet-50 or ViT (vit_base_patch16_384).

As in moein-shariatnia's original repository, the text encoder supports only DistilBERT.

ENVIRONMENT SETTING

$ virtualenv .venv --python=python3.6
$ source .venv/bin/activate
$ pip install -r requirements.txt

EXECUTION

Pretrain

$ python3 pretrain.py
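
pretrain.py trains the image and text encoders contrastively, in the CLIP style. As a rough illustration of that kind of objective (not this repository's exact code, which may differ; the original moein-shariatnia tutorial, for example, builds soft targets from within-modality similarities instead of the plain one-to-one targets used here), a symmetric contrastive loss over a batch of matching image/text embeddings can be sketched in NumPy as follows. The function name clip_loss and the temperature default of 0.07 are assumptions for illustration.

```python
import numpy as np


def _log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is a matching image/caption pair."""
    # L2-normalise so the dot products below are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # [batch, batch]; diagonal = true pairs
    idx = np.arange(len(logits))
    # Image -> text: softmax over each row; text -> image: softmax over each column.
    loss_i2t = -_log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Toy embeddings: perfectly aligned pairs vs. shuffled (mismatched) pairs.
    emb = np.eye(3)
    print("aligned :", clip_loss(emb, emb))
    print("shuffled:", clip_loss(emb, emb[[1, 2, 0]]))
```

Matching pairs should give a loss near zero, while shuffled pairs give a large loss; minimizing this pulls each image toward its own caption and away from the other captions in the batch.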

Inference

$ python3 inference.py --query={YOUR QUERY}
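
This README does not describe what inference.py does internally. Assuming it follows the usual CLIP retrieval recipe (embed the query with the text encoder, then rank precomputed image embeddings by cosine similarity), the ranking step can be sketched as below. The function top_k_matches, its arguments, and the toy 2-D embeddings are hypothetical stand-ins for the real encoder outputs.

```python
import numpy as np


def top_k_matches(query_embedding, image_embeddings, image_names, k=5):
    """Return the names of the k images most similar to the query (cosine similarity)."""
    # L2-normalise so the dot product equals cosine similarity.
    q = query_embedding / np.linalg.norm(query_embedding)
    imgs = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    scores = imgs @ q                      # one similarity score per image
    order = np.argsort(-scores)[:k]        # indices of the highest scores first
    return [image_names[i] for i in order]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real text/image encoder outputs.
    query = np.array([1.0, 0.0])
    images = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
    print(top_k_matches(query, images, ["a.jpg", "b.jpg", "c.jpg"], k=2))
    # -> ['b.jpg', 'a.jpg']
```

Normalizing both sides first means the ranking depends only on the direction of the embeddings, which is what a contrastively trained CLIP-style model is optimized for.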

CAUTION

Before running pretraining or inference, you must set (or check) the following options in config.py:

ex1) dataset ("8k" or "30k"): training dataset (Flickr8k or Flickr30k)

ex2) model_name ("resnet50" or "vit_base_patch16_384"): type of image encoder

ex3) pretrained (True or False): whether to start training from the pretrained weights of the text encoder (DistilBERT) and the image encoder (ResNet-50 or ViT)

ex4) batch_size: set according to your machine's capacity (e.g., available GPU memory)

Owner

Sangwon Beak