Official Implementation of CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback

Last update: Dec 08, 2022

Related tags

Overview

CoSMo.pytorch

Official Implementation of CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback, Seungmin Lee*, Dongwan Kim*, Bohyung Han. *(denotes equal contribution)

Presented at CVPR2021

Paper | Poster | 5 min Video

⚙️ Setup

Python: python3.7

📦 Install required packages

Install torch and torchvision via following command (CUDA10)

pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html

Install other packages

pip install -r requirements.txt

📂 Dataset

Download the FashionIQ dataset by following the instructions on this link.

We have set the default path for FashionIQ datasets in data/fashionIQ.py as _DEFAULT_FASHION_IQ_DATASET_ROOT = '/data/image_retrieval/fashionIQ'. You can change this path to wherever you plan on storing the dataset.

📚 Vocabulary file

Open up a python console and run the following lines to download NLTK punkt:

import nltk
nltk.download('punkt')

Then, open up a Jupyter notebook and run jupyter_files/how_to_create_fashion_iq_vocab.ipynb. As with the dataset, the default path is set in data/fashionIQ.py.

We have provided a vocab file in jupyter_files/fashion_iq_vocab.pkl.

📈 Weights & Biases

We use Weights and Biases to log our experiments.

If you already have a Weights & Biases account, head over to configs/FashionIQ_trans_g2_res50_config.json and fill out your wandb_account_name. You can also change the default at options/command_line.py.

If you do not have a Weights & Biases account, you can either create one or change the code and logging functions to your liking.

🏃‍♂️ Run

You can run the code by the following command:

python main.py --config_path=configs/FashionIQ_trans_g2_res50_config.json --experiment_description=test_cosmo_fashionIQDress --device_idx=0,1,2,3

Note that you do not need to assign --device_idx if you have already specified CUDA_VISIBLE_DEVICES=0,1,2,3 in your terminal.

We run on 4 12GB GPUs, and the main gpu gpu:0 uses around 4GB of VRAM.

⚠️ Notes on Evaluation

In our paper, we mentioned that we use a slightly different evaluation method than the original FashionIQ dataset. This was done to match the evaluation method used by VAL.

By default, this code uses the proper evaluation method (as intended by the creators of the dataset). The results for this is shown in our supplementary materials. If you'd like to use the same evaluation method as our main paper (and VAL), head over to data/fashionIQ.py and uncomment the commented section.

📜 Citation

If you use our code, please cite our work:

@InProceedings{CoSMo2021_CVPR,
    author    = {Lee, Seungmin and Kim, Dongwan and Han, Bohyung},
    title     = {CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {802-812}
}

Official Implementation of CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback

Related tags

Overview

CoSMo.pytorch

⚙️ Setup

📦 Install required packages

📂 Dataset

📚 Vocabulary file

📈 Weights & Biases

🏃‍♂️ Run

⚠️ Notes on Evaluation

📜 Citation

Owner

Seung Min Lee

PyTorch implementations of the NeRF model described in "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis"

Magisk module to enable hidden features on Android 12 Developer Preview 1.

Use tensorflow to implement a Deep Neural Network for real time lane detection

(AAAI2020)Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

Convert BART models to ONNX with quantization. 3X reduction in size, and upto 3X boost in inference speed

Pytorch implementation of MaskGIT: Masked Generative Image Transformer

FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes

This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

Source code for the Paper: CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints}

This is the code related to "Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation" (ICCV 2021).

Measure WWjj polarization fraction

Perfect implement. Model shared. x0.5 (Top1:60.646) and 1.0x (Top1:69.402).

A tool to prepare websites grabbed with wget for local viewing.

Official code release for "Learned Spatial Representations for Few-shot Talking-Head Synthesis" ICCV 2021

Nodule Generation Algorithm Baseline and template code for node21 generation track

Towards Understanding Quality Challenges of the Federated Learning: A First Look from the Lens of Robustness

Official implementation of "Watermarking Images in Self-Supervised Latent-Spaces"

This repository contains demos I made with the Transformers library by HuggingFace.