A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

Last update: Dec 26, 2022

Overview

CLIP4CMR

The original data and pre-computed CLIP features are available here. The train.pkl and test.pkl files contain image pixel features and text ID features, while clip_train.pkl and clip_test.pkl contain 1024-dimensional image and text features.
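The internal layout of these pickle files is not documented on this page, so a reasonable first step is to load one and inspect its structure. The following is a minimal sketch under that assumption: the helper names `load_features` and `summarize` are hypothetical and not part of the CLIP4CMR code, and the file is treated as a generic pickled object rather than a known schema.

```python
import pickle
from pathlib import Path


def load_features(path):
    """Load a pickled object from disk.

    Note: pickle can execute arbitrary code on load, so only open
    files obtained from a source you trust.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(obj, depth=0, max_depth=2):
    """Return a list of text lines describing the nested structure of obj."""
    indent = "  " * depth
    # Array-like objects (for example NumPy arrays) expose a .shape attribute.
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return [f"{indent}{type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, dict):
        lines = [f"{indent}dict with {len(obj)} keys"]
        if depth < max_depth:
            for key, value in obj.items():
                lines.append(f"{indent}  key {key!r}:")
                lines.extend(summarize(value, depth + 2, max_depth))
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{indent}{type(obj).__name__} of length {len(obj)}"]
        # Describe only the first element as a representative sample.
        if obj and depth < max_depth:
            lines.extend(summarize(obj[0], depth + 1, max_depth))
        return lines
    return [f"{indent}{type(obj).__name__}"]


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the downloaded file is stored.
    path = Path("clip_train.pkl")
    if path.exists():
        print("\n".join(summarize(load_features(path))))
```

If the CLIP feature files hold arrays, the summary should report a trailing dimension of 1024 for both the image and the text entries, which is a quick check that the download matches the description above.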

BookMyShowPC - Movie Ticket Reservation App made with Tkinter

Book My Show PC What is this? Movie Ticket Reservation App made with Tkinter. Tk

3 Dec 09, 2022

Repo for flood prediction using LSTMs and HAND

Abstract Every year, floods cause billions of dollars’ worth of damages to life, crops, and property. With a proper early flood warning system in plac

1 Oct 27, 2021

[ICML 2020] "When Does Self-Supervision Help Graph Convolutional Networks?" by Yuning You, Tianlong Chen, Zhangyang Wang, Yang Shen

When Does Self-Supervision Help Graph Convolutional Networks? PyTorch implementation for When Does Self-Supervision Help Graph Convolutional Networks?

106 Nov 11, 2022

Semi-supervised Stance Detection of Tweets Via Distant Network Supervision

SANDS This is an anonymous repository containing code and data necessary to reproduce the results published in "Semi-supervised Stance Detection of T

2 Sep 22, 2022

Import Python modules from dicts and JSON formatted documents.

Paker Paker is a module for importing Python packages/modules from dictionaries and JSON formatted documents. It was inspired by httpimporter. Important

1 Sep 07, 2022

Improving Non-autoregressive Generation with Mixup Training

MIST Training MIST TRAIN_FILE=/your/path/to/train.json VALID_FILE=/your/path/to/valid.json OUTPUT_DIR=/your/path/to/save_checkpoints CACHE_DIR=/your/p

7 Nov 22, 2022

BC3407-Group-5-Project - BC3407 Group Project With Python

BC3407-Group-5-Project As the world struggles to contain the ever-changing varia

1 Jan 26, 2022

Fast, Attemptable Route Planner for Navigation in Known and Unknown Environments

FAR Planner uses a dynamically updated visibility graph for fast replanning. The planner models the environment with polygons and builds a global visi

346 Dec 30, 2022

Info and sample codes for "NTU RGB+D Action Recognition Dataset"

"NTU RGB+D" Action Recognition Dataset "NTU RGB+D 120" Action Recognition Dataset "NTU RGB+D" is a large-scale dataset for human action recognition. I

578 Dec 30, 2022

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

296 Dec 29, 2022

This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection', CVPR 2019.

Code-and-Dataset-for-CapSal This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detec

48 Aug 19, 2022

EfficientDet (Scalable and Efficient Object Detection) implementation in Keras and Tensorflow

End-to-End Referring Video Object Segmentation with Multimodal Transformers

Source Code For Template-Based Named Entity Recognition Using BART

[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

This is an implementation of PIFuhd based on Pytorch

A Multi-modal Perception Tracker (MPT) for speaker tracking using both audio and visual modalities

Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

Python package for downloading ECMWF reanalysis data and converting it into a time series format.

Official PaddlePaddle implementation of Paint Transformer
