Script and models for clustering LAION-400m CLIP embeddings.

Last update: Oct 04, 2022

Overview

clustering-laion400m

The models were fit on roughly the first million image embeddings (that is, the first embedding file). cluster-labels.txt gives a subjective description of what each cluster label appears to represent, along with per-cluster counts for that same first file.
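Per-cluster counts like the ones mentioned above can be derived from an array of cluster labels. The following is only a minimal sketch: the function name cluster_counts and the toy labels are hypothetical, and it does not reproduce the actual format of cluster-labels.txt.

```python
import numpy as np

def cluster_counts(labels, n_clusters):
    """Return how many embeddings were assigned to each cluster id."""
    labels = np.asarray(labels, dtype=np.int64)
    # minlength keeps empty clusters in the output with a count of 0.
    return np.bincount(labels, minlength=n_clusters)

# Toy labels standing in for real per-embedding cluster assignments (assumption).
toy_labels = [0, 2, 2, 1, 2, 0]
counts = cluster_counts(toy_labels, n_clusters=4)
for cluster_id, count in enumerate(counts):
    print(cluster_id, int(count))
```

With real data, the labels array would have one entry per embedding and n_clusters would match the number of clusters in the fitted model.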

Precomputed labels are here: https://archive.org/details/laion400m-64-clustering-labels.tar

Run Fit Clusters.ipynb to reproduce the labels or to create your own clusters and models. This requires the CLIP embeddings from the LAION-400M open dataset, which can be found here: https://laion.ai/laion-400-open-dataset/
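For readers who want a feel for what fitting clusters on CLIP embeddings involves, here is a rough sketch. It is not the contents of Fit Clusters.ipynb: the source does not say which algorithm the notebook uses, so plain k-means is an assumption, as are the 64 clusters (suggested only by the name of the label archive), the 512-dimensional vectors, and the file name in the comment. Random data stands in for the real embeddings so the snippet runs on its own.

```python
import numpy as np

def assign(x, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of x."""
    dists = ((x ** 2).sum(axis=1)[:, None]
             - 2.0 * x @ centroids.T
             + (centroids ** 2).sum(axis=1)[None, :])
    return dists.argmin(axis=1)

def fit_kmeans(embeddings, n_clusters, n_iters=10, seed=0):
    """Plain Lloyd's k-means on L2-normalized vectors; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(embeddings, dtype=np.float32)
    # Normalize rows so Euclidean distance tracks cosine similarity
    # (assumes no all-zero rows).
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    # Start from distinct, randomly chosen data points.
    centroids = x[rng.choice(len(x), size=n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        labels = assign(x, centroids)
        for k in range(n_clusters):
            members = x[labels == k]
            if len(members):  # leave a centroid unchanged if its cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids, assign(x, centroids)

# Synthetic stand-in data. With the real dataset one would load an embedding
# file instead, e.g. np.load("img_emb_0.npy") (hypothetical local path).
demo = np.random.default_rng(1).normal(size=(1000, 512))
centroids, labels = fit_kmeans(demo, n_clusters=64)
print(centroids.shape, labels.shape)
```

On the full first file (around a million vectors), a library implementation with mini-batching would likely be a better fit than this loop; the sketch only illustrates the fit-then-assign flow.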

Owner

Peter Baylies

Build Text Rerankers with Deep Language Models

Reranker is a lightweight, effective and efficient package for training and deploying deep language model rerankers in information retrieval (IR), question answering (QA) and many other natural languag

140 Dec 06, 2022

Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning, which only supports English. Features 🌍 Chinese: supports Mandarin and tested with

25.6k Jan 06, 2023

I label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive

I label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others

1 Jan 13, 2022

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

1.2k Jan 08, 2023

Official code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning"

CvarAdversarialRL Official code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning". Initial setup Create a virtual

1 Nov 19, 2021

Chinese version of GPT2 training code, using BERT tokenizer.

GPT2-Chinese Description Chinese version of GPT2 training code, using BERT tokenizer or BPE tokenizer. It is based on the extremely awesome repository

5.6k Jan 04, 2023

Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Code for Discovering Topics in Long-tailed Corpora with Causal Intervention ACL2021 Findings Usage 0. Prepare environment Requirements: python==3.6 te

8 Dec 16, 2022

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipB

612 Jan 04, 2023

The Classical Language Toolkit

Notice: This Git branch (dev) contains the CLTK's upcoming major release (v. 1.0.0). See https://github.com/cltk/cltk/tree/master and https://docs.clt

754 Jan 09, 2023

PyTorch implementations of BERT-based Spelling Error Correction Models.

PyTorch implementations of BERT-based Spelling Error Correction Models

209 Dec 30, 2022

Linear programming solver for paper-reviewer matching and mind-matching

Paper-Reviewer Matcher A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is impleme

66 Jul 05, 2022

Code to use Augmented Shapiro Wilks Stopping, as well as code for the paper "Statistically Significant Stopping of Neural Network Training"

This codebase is being actively maintained; please create an issue if you have problems using it. Basics All data files are included under losses and ea

32 Nov 09, 2021

Final Project Bootcamp Zero

FactSumm: Factual Consistency Scorer for Abstractive Summarization

Journey is an NLP-powered developer assistant

An open-source NLP library: fast text cleaning and preprocessing.

SciBERT is a BERT model trained on scientific text.

An implementation of WaveNet with fast generation

REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset