A simple library that speeds up CLIP inference by up to 3x (K80 GPU)

Last update: Dec 20, 2022

Overview

CLIP-ONNX

A simple library that speeds up CLIP inference by up to 3x (K80 GPU)

Usage

Install the clip-onnx module and its requirements first. In a notebook, run:

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

Download CLIP image from repo

!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true

Load standard CLIP model, image, text on cpu

import clip
from PIL import Image

# load the model on CPU: the ONNX conversion cannot work with CUDA
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]

Create CLIP-ONNX object to convert model to onnx

from clip_onnx import clip_onnx, attention
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"], # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
onnx_model.start_sessions()
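
The commented provider list above names the backends that onnxruntime may offer. As a hedged sketch, the snippet below filters a preferred order against what the installed build reports through onnxruntime.get_available_providers(). The helper pick_providers is hypothetical and not part of clip_onnx, and the CPU fallback when onnxruntime is missing exists only so the snippet runs anywhere.

```python
# Hypothetical helper: keep only the providers the local onnxruntime build offers.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    # Fall back to CPU if nothing from the preferred list is reported.
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # Assumption for illustration: without onnxruntime, pretend only CPU exists.
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print("Available:", available)
print("Using:", providers)
```

The resulting list could then be passed as the providers argument shown above. CUDAExecutionProvider ships in the onnxruntime-gpu package, so it will typically not be listed when only the CPU package is installed.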

Use it through the standard CLIP API (batch inference)

image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]
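
The last step above turns the image-to-text logits into probabilities with a softmax over the text dimension. A minimal pure-Python sketch of that operation follows. The logit values are hypothetical and are not produced by the model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image against three text prompts.
logits = [25.0, 20.0, 19.5]
probs = softmax(logits)
print("Label probs:", probs)
```

The probabilities sum to 1, and the prompt with the largest logit receives the largest share.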

Enjoy the speed

Examples

See the examples folder for more details.
Some parts of the code were taken from the post. Thank you neverix for this notebook.

Comments

Can't use CUDAExecutionProvider
Hey, I'm trying to use the code on GPU and I encountered 2 problems:

1. When running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines): ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

I fixed it by installing that version of torch myself with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

2. After I installed the package, I ran the example in the readme with CPUExecutionProvider and it worked fine, but when I run it on GPU with CUDAExecutionProvider I get the following error message (again on different machines):

2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I can't figure out what the problem is. Any help?
opened by YoadTew 13
Performance is inconsistent with the original model
Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
image_features = model.encode_image(image)
text_features = model.encode_text(text)
logits_per_image, logits_per_text = model(image, text)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
print("Label probs:", probs)

The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

Could you help me with this? Thanks!
opened by Cestlaviez 5
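
One way to quantify a mismatch like the one reported above is to compare the two probability vectors element-wise. The sketch below uses only the two results quoted in this issue. The helper max_abs_diff and the 1e-3 tolerance are illustrative assumptions, not part of clip_onnx.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Values copied from the issue text above.
reference_probs = [0.9927937, 0.00421069, 0.00299573]  # original CLIP model
onnx_probs = [0.41456965, 0.29270944, 0.29272085]      # ONNX model

TOLERANCE = 1e-3  # assumed threshold for "consistent enough"

diff = max_abs_diff(reference_probs, onnx_probs)
consistent = diff < TOLERANCE
print(f"max abs diff = {diff:.6f}, consistent = {consistent}")
```

With these two vectors the check fails by a wide margin, which matches the report that the outputs disagree rather than differing by rounding noise.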

Error on installing the torch version in requirements.txt

pip install git+https://github.com/Lednik7/CLIP-ONNX.git

ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
ERROR: No matching distribution found for torch==1.11.0+cu113

python version is 3.7.13

opened by dingusagar 2

ERROR: No matching distribution found for onnxruntime==1.11

Hi, Thanks for the great work!

I am having this error when I try to install the package.

ERROR: No matching distribution found for onnxruntime==1.11

Maybe we can update the requirements.txt?

opened by wanliAlex 1
updated and added information

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages

opened by Lednik7 0
Replace the operator of "torch.einsum"

q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk( 3, dim=-1)

@Lednik7 Thanks for your great work on CLIP-ONNX. If we don't want to use the torch.einsum operator, do you have other code to replace it? This operator is not friendly to some inference engines, such as NVIDIA TensorRT, so an alternative to einsum would be better.

opened by zhangnju 2
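
As a hedged sketch of one possible replacement, the einsum pattern "tbh, oh -> tbo" contracts the last axis of x with the last axis of the weight, which is the same computation as a linear layer. The snippet below checks that torch.nn.functional.linear and torch.matmul give the same result as the einsum expression from the issue. The tensor sizes and random values are hypothetical stand-ins for self.attn.in_proj_weight and self.attn.in_proj_bias, and this is not confirmed to be the approach used by the library.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small sizes: sequence length, batch size, hidden width.
seq_len, batch, hidden = 5, 2, 8

x = torch.randn(seq_len, batch, hidden)            # [t, b, h]
in_proj_weight = torch.randn(3 * hidden, hidden)   # [o, h], o = 3 * h
in_proj_bias = torch.randn(3 * hidden)             # [o]

# Original formulation from the issue.
qkv_einsum = torch.einsum("tbh, oh -> tbo", x, in_proj_weight) + in_proj_bias

# Equivalent formulations without einsum.
qkv_linear = F.linear(x, in_proj_weight, in_proj_bias)            # x @ W^T + b
qkv_matmul = torch.matmul(x, in_proj_weight.t()) + in_proj_bias

same_linear = torch.allclose(qkv_einsum, qkv_linear, atol=1e-4)
same_matmul = torch.allclose(qkv_einsum, qkv_matmul, atol=1e-4)

# Split into query, key and value exactly as in the original line.
q, k, v = qkv_linear.contiguous().chunk(3, dim=-1)
print(same_linear, same_matmul, q.shape)
```

Both alternatives export as plain matrix multiplication and addition, which inference engines generally support more widely than einsum.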

Releases(1.2)

1.2(May 3, 2022)

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages
1.0(May 3, 2022)

Works but with crutches

Owner

Gerasimov Maxim

16 y.o. Data Scientist. Graduated from Yandex Lyceum and Tinkoff Education

