CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

Last update: Mar 10, 2022

Related tags

Overview

CLIP-Indonesian

CLIP (Radford et al., 2021) is a multimodal model that can connect images and text by training a vision encoder and a text encoder jointly to project the representation of images and the corresponding text into the same embedding space. The expected outcome is the text embeddings and image embeddings are located near each other.

This repository hosts the code for CLIP-Indonesian, which is a CLIP multimodal model trained on Indonesian data.

For the image encoder, we use VIT, more specifically openai/clip-vit-base-patch32. Meanwhile, for the text encoder, we experimented with two models: IndoBERT Large (indobenchmark/indobert-base-p2) and Indonesian RoBERTa Base (flax-community/indonesian-roberta-base).

Most of the CLIP script is based on HybridCLIP and clip-italian.

Still a work in progress so may not give the best result (yet) :)

clip-indonesian was presented in PyCon ID 2021. You can view the slide deck here.

Dataset

More details about the dataset used can be found here.

Results

The results of the training can be accessed here.

Demo

References

Bianchi, F., Attanasio, G., Pisoni, R., Terragni, S., Sarti, G., Lakshmi, S. (2021). Contrastive Language-Image Pre-training for the Italian Language arXiv preprint arXiv:2108.08688.

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.

Wilie, B., Vincentio, K., Winata, G. I., Cahyawijaya, S., Li, X., Lim, Z. Y., ... & Purwarianti, A. (2020). IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding. arXiv preprint arXiv:2009.05387.

Hybrid CLIP by the HuggingFace team

Indonesian Roberta Base by Wilson Wongso, Steven Limcorn, Samsul Rahmadani, and Chew Kok Wah

Indonesian Translated Datasets by Samsul Rahmadani

Acknowledgment

All training was done on a TPUv3-8 VM sponsored by TPU Research Cloud.

CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

Related tags

Overview

CLIP-Indonesian

Dataset

Results

Demo

Links

References

Acknowledgment

Owner

Galuh

A Python 3 package for state-of-the-art statistical dimension reduction methods

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

UT-Sarulab MOS prediction system using SSL models

A Simplied Framework of GAN Inversion

SSD-based Object Detection in PyTorch

PyTorch implementation of the cross-modality generative model that synthesizes dance from music.

This is the code for our KILT leaderboard submission to the T-REx and zsRE tasks. It includes code for training a DPR model then continuing training with RAG.

Measures input lag without dedicated hardware, performing motion detection on recorded or live video

Official repository for CVPR21 paper "Deep Stable Learning for Out-Of-Distribution Generalization".

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

3D mesh stylization driven by a text input in PyTorch

GPU Programming with Julia - course at the Swiss National Supercomputing Centre (CSCS), ETH Zurich

A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising (CVPR 2020 Oral & TPAMI 2021)

Progressive Growing of GANs for Improved Quality, Stability, and Variation

Official repository for "On Improving Adversarial Transferability of Vision Transformers" (2021)

Finding Donors for CharityML

MRQy is a quality assurance and checking tool for quantitative assessment of magnetic resonance imaging (MRI) data.

A python-image-classification web application project, written in Python and served through the Flask Microframework

Benchmarks for Model-Based Optimization

Codes for 'Dual Parameterization of Sparse Variational Gaussian Processes'