It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Overview

CLIP-ONNX

It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Usage

Install clip-onnx module and requirements first. Use this trick

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

  1. Download CLIP image from repo
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
  1. Load standard CLIP model, image, text on cpu
import clip
from PIL import Image

# onnx cannot work with cuda
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]
  1. Create CLIP-ONNX object to convert model to onnx
from clip_onnx import clip_onnx, attention
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"], # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
onnx_model.start_sessions()
  1. Use for standard CLIP API. Batch inference
image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]

Enjoy the speed

Examples

See examples folder for more details
Some parts of the code were taken from the post. Thank you neverix for this notebook.

Comments
  • Can't use CUDAExecutionProvider

    Can't use CUDAExecutionProvider

    Hey, I'm trying to use the code on GPU and I encountered 2 problems:

    1. when running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines): ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

    I fixed it by installing that version of torch by myself. with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

    1. After I installed the package, I tried to run the example in the readme with CPUExecutionProvider and it worked fine, but when I'm trying to run it on GPU with CUDAExecutionProvider I get the following error message (again on different machines):

    2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met. 2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

    I can't figure out what is the problem. Any help?

    opened by YoadTew 13
  • Performance is inconsistent with the original model

    Performance is inconsistent with the original model

    Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

    model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
    
    image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
    text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
    
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
    
    print("Label probs:", probs) 
    

    The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

    However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

    Could you help me with this? Thanks!

    opened by Cestlaviez 5
  • Error on installing the torch version in requirements.txt

    Error on installing the torch version in requirements.txt

    pip install git+https://github.com/Lednik7/CLIP-ONNX.git

    ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
    ERROR: No matching distribution found for torch==1.11.0+cu113
    

    python version is 3.7.13

    opened by dingusagar 2
  • ERROR: No matching distribution found for onnxruntime==1.11

    ERROR: No matching distribution found for onnxruntime==1.11

    Hi, Thanks for the great work!

    I am having this error when I try to install the package.

    ERROR: No matching distribution found for onnxruntime==1.11

    Maybe we can update the requirements.txt?

    opened by wanliAlex 1
  • Replace the operator of

    Replace the operator of "torch.einsum"

    q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk( 3, dim=-1)

    @Lednik7 Thanks for your great work on Clip-ONNX. for the pytorch operator of "torch.einsum" , if we don't want to use this operator , do you have other codes to replace this operator? this operator is not friendly to some Inference engine, like NV TensorRT, so if you have other codes to replace einsum, that will be better

    opened by zhangnju 2
Owner
Gerasimov Maxim
16 y.o. Data Scientist. Graduated by Yandex Lyceum and Tinkoff Education
Gerasimov Maxim
IDA file loader for UF2, created for the DEFCON 29 hardware badge

UF2 Loader for IDA The DEFCON 29 badge uses the UF2 bootloader, which conveniently allows you to dump and flash the firmware over USB as a mass storag

Kevin Colley 6 Feb 08, 2022
Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021)

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021) PyTorch implementation of Learning RAW-to-sRGB Mappings with Inaccurat

Zhilu Zhang 53 Dec 20, 2022
Nest - A flexible tool for building and sharing deep learning modules

Nest - A flexible tool for building and sharing deep learning modules Nest is a flexible deep learning module manager, which aims at encouraging code

ZhouYanzhao 41 Oct 10, 2022
AI Face Mesh: This is a simple face mesh detection program based on Artificial intelligence.

AI Face Mesh: This is a simple face mesh detection program based on Artificial Intelligence which made with Python. It's able to detect 468 different

Md. Rakibul Islam 1 Jan 13, 2022
It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

CLIP-ONNX It is a simple library to speed up CLIP inference up to 3x (K80 GPU) Usage Install clip-onnx module and requirements first. Use this trick !

Gerasimov Maxim 93 Dec 20, 2022
Pytorch implementation of ICASSP 2022 paper Attention Probe: Vision Transformer Distillation in the Wild

Attention Probe: Vision Transformer Distillation in the Wild Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang In ICASSP 2022 This code is

IIGROUP 6 Sep 21, 2022
Bag of Tricks for Natural Policy Gradient Reinforcement Learning

Bag of Tricks for Natural Policy Gradient Reinforcement Learning [ArXiv] Setup Python 3.8.0 pip install -r req.txt Mujoco 200 license Main Files main.

Brennan Gebotys 1 Oct 10, 2022
The official code of Anisotropic Stroke Control for Multiple Artists Style Transfer

ASMA-GAN Anisotropic Stroke Control for Multiple Artists Style Transfer Proceedings of the 28th ACM International Conference on Multimedia The officia

Six_God 146 Nov 21, 2022
LBBA-boosted WSOD

LBBA-boosted WSOD Summary Our code is based on ruotianluo/pytorch-faster-rcnn and WSCDN Sincerely thanks for your resources. Newer version of our code

Martin Dong 20 Sep 19, 2022
Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

Code Transformer This is an official PyTorch implementation of the CodeTransformer model proposed in: D. Zügner, T. Kirschstein, M. Catasta, J. Leskov

Daniel Zügner 131 Dec 13, 2022
The implementation of FOLD-R++ algorithm

FOLD-R-PP The implementation of FOLD-R++ algorithm. The target of FOLD-R++ algorithm is to learn an answer set program for a classification task. Inst

13 Dec 23, 2022
Public Models considered for emotion estimation from EEG

Emotion-EEG Set of models for emotion estimation from EEG. Composed by the combination of two deep-learing models learning together (RNN and CNN) with

Victor Delvigne 21 Dec 23, 2022
Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

A Simple Neural Network from scratch A Simple Neural Network from scratch in Pyt

Youssef Chafiqui 2 Jan 07, 2022
Exact Pareto Optimal solutions for preference based Multi-Objective Optimization

Exact Pareto Optimal solutions for preference based Multi-Objective Optimization

Debabrata Mahapatra 40 Dec 24, 2022
Generate text captions for images from their CLIP embeddings. Includes PyTorch model code and example training script.

clip-text-decoder Generate text captions for images from their CLIP embeddings. Includes PyTorch model code and example training script. Example Predi

Frank Odom 36 Dec 21, 2022
My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot

Deep Q&A Table of Contents Presentation Installation Running Chatbot Web interface Results Pretrained model Improvements Upgrade Presentation This wor

Conchylicultor 2.9k Dec 28, 2022
PyTorch implementation of the supervised learning experiments from the paper Model-Agnostic Meta-Learning (MAML)

pytorch-maml This is a PyTorch implementation of the supervised learning experiments from the paper Model-Agnostic Meta-Learning (MAML): https://arxiv

Kate Rakelly 516 Jan 05, 2023
BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

weijiawu 47 Dec 26, 2022
Bianace Prediction Pytorch Model

Bianace Prediction Pytorch Model Main Results ETHUSDT from 2021-01-01 00:00:00 t

RoyYang 4 Jul 20, 2022
Official implementation of "Learning Not to Reconstruct" (BMVC 2021)

Official PyTorch implementation of "Learning Not to Reconstruct Anomalies" This is the implementation of the paper "Learning Not to Reconstruct Anomal

Marcella Astrid 13 Dec 04, 2022