GoodNews Everyone! Context driven entity aware captioning for news images

Related tags

Deep LearningGoodNews
Overview

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy!

Model preview:

GoodNews Model!

Huge Thanks goes to New York Times API for providing such a service for FREE!

Another Thanks to @ruotianluo for providing the captioning code.

Dependencies/Requirements:

pytorch==1.0.0
spacy==2.0.11
h5py==2.7.0
bs4==4.5.3
joblib==0.12.2
nltk==3.2.3
tqdm==4.19.5
urllib2==2.7
goose==1.0.25
urlparse
unidecode

Introduction

We took the first steps to move the captioning systems to interpretation (see the paper for more detail). To this end, we have used New York Times API to retrieve the articles, images and captions.

The structure of this repo is as follows:

  1. Getting the data
  2. Cleaning and formating the data
  3. How to train models

Get the data

You have 3 options to get the data.

Images only

If you want to download the images only and directly start working on the same dataset as ours, then download the cleaned version of the dataset without images: article+caption.json and put it to data/ folder and download the img_urls.json and put it in the get_data/get_images_only/ folder.

Then run

python get_images.py --num_thread 16

Then, you will get the images. After that move to Clean and Format Data section.

PS: I have recieved numerous emails regarding some of the images not present/broken in the img_urls.json. Which is why I decided to put the images on the drive to download in the name of open science. Download all images

Images + articles

If you would like the get the raw version of the article and captions to do your own cleaning and processing, no worries! First download the article_urls and go to folder get_data/with_article_urls/ and run

python get_data_with_urls.py --num_thread 16
python combine_dataset.py 

This will get you the raw version of the caption, articles and also the images. After that move to Clean and Format Data section.

I want more!

As you know, New York Times is huge. Their articles starts from 1881 (It is crazy!) until well today. So in case you want to get ALL the data or expand the data to more years, then first step is go to New York Times API and get an API key. All you have to do is just sign up for the API key.

Once you have the key go to folder get_data/with_api/ and run

python retrieve_all_urls.py --api-key XXXX --start_year XXX --end_year XXX 

This is for getting the article urls and then saving in the format of month-year. Once you have the all urls from the API, then you run

python get_data_api.py
python combine_dataset.py

get_data_api.py retrieves the articles, captions and images. combine_dataset.py combines yearly data into one file after removing data points if they have corrupt image, empty articles or empty captions. After that move to Clean and Format Data section.

Small Note

I also provide the links to images and their data splits (train, val, test). Even though I always use random seed to decide the split, just in case If the GODS meddles with the random seed, here is the link to a json where you can find each image and its split: img_splits.json

Clean and Format the Data

Now that we have the data, it is time to clean, preprocess and format the data.

Preprocess

When you reach this part, you must have captioning_dataset.json in your data/ folder.

Captions

This part is for cleaning the captions (tokenizing, removing non-ascii characters, etc.), splitting train, val, and test and creating anonymize captions.

In other words, we change the caption "Alber Einstein taught in Princeton in 1926" to "PERSON_ taught in ORGANIZATION_ in DATE_." Move to preprocess/ folder and run

python clean_captions.py

Resize Images

To resize the images to 256x256:

python resize.py --root XXXX --img_size 256

Articles

Get the article format that is needed for the encoding methods by running: create_article_set.py

python create_article_set.py

Format

Now to create H5 file for captions, images and articles, just need to go to scripts/ folder and run in order

python prepro_labels.py --max_length 31 --word_count_threshold 4
python prepro_images.py

We proposed 3 different article encoding method. You can download each of encoded article methods, articles_full_avg_, articles_full_wavg, articles_full_TBB.

Or you can use the code to obtain them:

python prepro_articles_avg.py
python prepro_articles_wavg.py
python prepro_articles_tbb.py

Train

Finally we are ready to train. Magical words are:

python train.py --cnn_weight [YOUR HOME DIRECTORY]/.torch/resnet152-b121ed2d.pth 

You can check the opt.py for changing a lot of the options such dimension size, different models, hyperparameters, etc.

Evaluate

After you train your models, you can get the score according commonly used metrics: Bleu, Cider, Spice, Rouge, Meteor. Be sure to specify model_path, cnn_model_path, infos_path and sen_embed_path when runing eval.py. eval.py is usually used in training but it is necessary to run it to get the insertion.

Insertion

Last but not least insert.py. After you run eval.py, it will produce you a json file with the ids and their template captions. To fill the correct named entity, you have to run insert.py:

python insert.py --output [XXX] --dump [True/False] --insertion_method ['ctx', 'att', 'rand']

PS: I have been requested to provide model's output, so I thought it would be best to share it with everyone. Model Output In this folder, you have:

test.json: Test set with raw and template version of the caption.

article.json: Article sentences which is needed in the insert.py.

w/o article folder: All the models output on template captions, without articles.

with article folder: Our models output in the paper with sentence attention(sen_att) and image attention(vis_att), provided in the json. Hope this is helpful to more of you.

Conclusion

Thank you and sorry for the bugs!

Implementation of ML models like Decision tree, Naive Bayes, Logistic Regression and many other

ML_Model_implementaion Implementation of ML models like Decision tree, Naive Bayes, Logistic Regression and many other dectree_model: Implementation o

Anshuman Dalai 3 Jan 24, 2022
Audio Domain Adaptation for Acoustic Scene Classification using Disentanglement Learning

Audio Domain Adaptation for Acoustic Scene Classification using Disentanglement Learning Reference Abeßer, J. & Müller, M. Towards Audio Domain Adapt

Jakob Abeßer 2 Jul 06, 2022
Investigating Attention Mechanism in 3D Point Cloud Object Detection (arXiv 2021)

Investigating Attention Mechanism in 3D Point Cloud Object Detection (arXiv 2021) This repository is for the following paper: "Investigating Attention

52 Nov 19, 2022
AirLoop: Lifelong Loop Closure Detection

AirLoop This repo contains the source code for paper: Dasong Gao, Chen Wang, Sebastian Scherer. "AirLoop: Lifelong Loop Closure Detection." arXiv prep

Chen Wang 53 Jan 03, 2023
implementation for paper "ShelfNet for fast semantic segmentation"

ShelfNet-lightweight for paper (ShelfNet for fast semantic segmentation) This repo contains implementation of ShelfNet-lightweight models for real-tim

Juntang Zhuang 252 Sep 16, 2022
A Unified Generative Framework for Various NER Subtasks.

This is the code for ACL-ICJNLP2021 paper A Unified Generative Framework for Various NER Subtasks. Install the package in the requirements.txt, then u

177 Jan 05, 2023
Multi Task Vision and Language

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-

Facebook Research 712 Dec 19, 2022
This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML)

package tests docs license stats support This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML

National Center for Cognitive Research of ITMO University 482 Dec 26, 2022
Benchmark spaces - Benchmarks of how well different two dimensional spaces work for clustering algorithms

benchmark_spaces Benchmarks of how well different two dimensional spaces work fo

Bram Cohen 6 May 07, 2022
Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Davis Rempe 367 Dec 24, 2022
Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Ceph.

Project Aquarium Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Cep

Aquarist Labs 73 Jul 21, 2022
GradAttack is a Python library for easy evaluation of privacy risks in public gradients in Federated Learning

GradAttack is a Python library for easy evaluation of privacy risks in public gradients in Federated Learning, as well as corresponding mitigation strategies.

129 Dec 30, 2022
Activating More Pixels in Image Super-Resolution Transformer

HAT [Paper Link] Activating More Pixels in Image Super-Resolution Transformer Xiangyu Chen, Xintao Wang, Jiantao Zhou and Chao Dong BibTeX @article{ch

XyChen 270 Dec 27, 2022
PyTorch Implementation of Backbone of PicoDet

PicoDet-Backbone PyTorch Implementation of Backbone of PicoDet Original Implementation is implemented on PaddlePaddle. Example picodet_l_backbone = ES

Yonghye Kwon 7 Jul 12, 2022
I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 1.3k Dec 31, 2022
Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Elias Kassapis 31 Nov 22, 2022
Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

Junjie Hu 13 Dec 10, 2022
[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

BNN - BN = ? Training Binary Neural Networks without Batch Normalization Codes for this paper BNN - BN = ? Training Binary Neural Networks without Bat

VITA 40 Dec 30, 2022
Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX.

ONNX Object Localization Network Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX. Ori

Ibai Gorordo 15 Oct 14, 2022
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

involution Official implementation of a neural operator as described in Involution: Inverting the Inherence of Convolution for Visual Recognition (CVP

Duo Li 1.3k Dec 28, 2022