GoodNews Everyone! Context driven entity aware captioning for news images

Related tags

Deep LearningGoodNews
Overview

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy!

Model preview:

GoodNews Model!

Huge Thanks goes to New York Times API for providing such a service for FREE!

Another Thanks to @ruotianluo for providing the captioning code.

Dependencies/Requirements:

pytorch==1.0.0
spacy==2.0.11
h5py==2.7.0
bs4==4.5.3
joblib==0.12.2
nltk==3.2.3
tqdm==4.19.5
urllib2==2.7
goose==1.0.25
urlparse
unidecode

Introduction

We took the first steps to move the captioning systems to interpretation (see the paper for more detail). To this end, we have used New York Times API to retrieve the articles, images and captions.

The structure of this repo is as follows:

  1. Getting the data
  2. Cleaning and formating the data
  3. How to train models

Get the data

You have 3 options to get the data.

Images only

If you want to download the images only and directly start working on the same dataset as ours, then download the cleaned version of the dataset without images: article+caption.json and put it to data/ folder and download the img_urls.json and put it in the get_data/get_images_only/ folder.

Then run

python get_images.py --num_thread 16

Then, you will get the images. After that move to Clean and Format Data section.

PS: I have recieved numerous emails regarding some of the images not present/broken in the img_urls.json. Which is why I decided to put the images on the drive to download in the name of open science. Download all images

Images + articles

If you would like the get the raw version of the article and captions to do your own cleaning and processing, no worries! First download the article_urls and go to folder get_data/with_article_urls/ and run

python get_data_with_urls.py --num_thread 16
python combine_dataset.py 

This will get you the raw version of the caption, articles and also the images. After that move to Clean and Format Data section.

I want more!

As you know, New York Times is huge. Their articles starts from 1881 (It is crazy!) until well today. So in case you want to get ALL the data or expand the data to more years, then first step is go to New York Times API and get an API key. All you have to do is just sign up for the API key.

Once you have the key go to folder get_data/with_api/ and run

python retrieve_all_urls.py --api-key XXXX --start_year XXX --end_year XXX 

This is for getting the article urls and then saving in the format of month-year. Once you have the all urls from the API, then you run

python get_data_api.py
python combine_dataset.py

get_data_api.py retrieves the articles, captions and images. combine_dataset.py combines yearly data into one file after removing data points if they have corrupt image, empty articles or empty captions. After that move to Clean and Format Data section.

Small Note

I also provide the links to images and their data splits (train, val, test). Even though I always use random seed to decide the split, just in case If the GODS meddles with the random seed, here is the link to a json where you can find each image and its split: img_splits.json

Clean and Format the Data

Now that we have the data, it is time to clean, preprocess and format the data.

Preprocess

When you reach this part, you must have captioning_dataset.json in your data/ folder.

Captions

This part is for cleaning the captions (tokenizing, removing non-ascii characters, etc.), splitting train, val, and test and creating anonymize captions.

In other words, we change the caption "Alber Einstein taught in Princeton in 1926" to "PERSON_ taught in ORGANIZATION_ in DATE_." Move to preprocess/ folder and run

python clean_captions.py

Resize Images

To resize the images to 256x256:

python resize.py --root XXXX --img_size 256

Articles

Get the article format that is needed for the encoding methods by running: create_article_set.py

python create_article_set.py

Format

Now to create H5 file for captions, images and articles, just need to go to scripts/ folder and run in order

python prepro_labels.py --max_length 31 --word_count_threshold 4
python prepro_images.py

We proposed 3 different article encoding method. You can download each of encoded article methods, articles_full_avg_, articles_full_wavg, articles_full_TBB.

Or you can use the code to obtain them:

python prepro_articles_avg.py
python prepro_articles_wavg.py
python prepro_articles_tbb.py

Train

Finally we are ready to train. Magical words are:

python train.py --cnn_weight [YOUR HOME DIRECTORY]/.torch/resnet152-b121ed2d.pth 

You can check the opt.py for changing a lot of the options such dimension size, different models, hyperparameters, etc.

Evaluate

After you train your models, you can get the score according commonly used metrics: Bleu, Cider, Spice, Rouge, Meteor. Be sure to specify model_path, cnn_model_path, infos_path and sen_embed_path when runing eval.py. eval.py is usually used in training but it is necessary to run it to get the insertion.

Insertion

Last but not least insert.py. After you run eval.py, it will produce you a json file with the ids and their template captions. To fill the correct named entity, you have to run insert.py:

python insert.py --output [XXX] --dump [True/False] --insertion_method ['ctx', 'att', 'rand']

PS: I have been requested to provide model's output, so I thought it would be best to share it with everyone. Model Output In this folder, you have:

test.json: Test set with raw and template version of the caption.

article.json: Article sentences which is needed in the insert.py.

w/o article folder: All the models output on template captions, without articles.

with article folder: Our models output in the paper with sentence attention(sen_att) and image attention(vis_att), provided in the json. Hope this is helpful to more of you.

Conclusion

Thank you and sorry for the bugs!

PyTorch and Tensorflow functional model definitions

functional-zoo Model definitions and pretrained weights for PyTorch and Tensorflow PyTorch, unlike lua torch, has autograd in it's core, so using modu

Sergey Zagoruyko 590 Dec 22, 2022
Deploy pytorch classification model using Flask and Streamlit

Deploy pytorch classification model using Flask and Streamlit

Ben Seo 1 Nov 17, 2021
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
Disentangled Lifespan Face Synthesis

Disentangled Lifespan Face Synthesis Project Page | Paper Demo on Colab Preparation Please follow this github to prepare the environments and dataset.

何森 50 Sep 20, 2022
A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.

faceswap-GAN Adding Adversarial loss and perceptual loss (VGGface) to deepfakes'(reddit user) auto-encoder architecture. Updates Date Update 2018-08-2

3.2k Dec 30, 2022
Spherical Confidence Learning for Face Recognition, accepted to CVPR2021.

Sphere Confidence Face (SCF) This repository contains the PyTorch implementation of Sphere Confidence Face (SCF) proposed in the CVPR2021 paper: Shen

Maths 70 Dec 09, 2022
Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

OFA Sys 1.4k Jan 08, 2023
unet-family: Ultimate version

unet-family: Ultimate version 基于之前my-unet代码,我整理出来了这一份终极版本unet-family,方便其他人阅读。 相比于之前的my-unet代码,代码分类更加规范,有条理 对于clone下来的代码不需要修改各种复杂繁琐的路径问题,直接就可以运行。 并且代码有

2 Sep 19, 2022
Improving Calibration for Long-Tailed Recognition (CVPR2021)

MiSLAS Improving Calibration for Long-Tailed Recognition Authors: Zhisheng Zhong, Jiequan Cui, Shu Liu, Jiaya Jia [arXiv] [slide] [BibTeX] Introductio

DV Lab 116 Dec 20, 2022
MIMIC Code Repository: Code shared by the research community for the MIMIC-III database

MIMIC Code Repository The MIMIC Code Repository is intended to be a central hub for sharing, refining, and reusing code used for analysis of the MIMIC

MIT Laboratory for Computational Physiology 1.8k Dec 26, 2022
This is the official repository of XVFI (eXtreme Video Frame Interpolation)

XVFI This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206 Last Update: 20210607 We provide th

Jihyong Oh 195 Dec 29, 2022
This repository contains the DendroMap implementation for scalable and interactive exploration of image datasets in machine learning.

DendroMap DendroMap is an interactive tool to explore large-scale image datasets used for machine learning. A deep understanding of your data can be v

DIV Lab 33 Dec 30, 2022
Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size.

Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size. The hub data layout enables rapid transformations and streaming of data while training m

Activeloop 5.1k Jan 08, 2023
PyTorch implementation of InstaGAN: Instance-aware Image-to-Image Translation

InstaGAN: Instance-aware Image-to-Image Translation Warning: This repo contains a model which has potential ethical concerns. Remark that the task of

Sangwoo Mo 827 Dec 29, 2022
A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.

P-tuning A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''. How to use our code We have released the code

THUDM 562 Dec 27, 2022
3ds-Ghidra-Scripts - Ghidra scripts to help with 3ds reverse engineering

3ds Ghidra Scripts These are ghidra scripts to help with 3ds reverse engineering

Zak 7 May 23, 2022
AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation

AirPose AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation Check the teaser video This repository contains the code of A

Robot Perception Group 41 Dec 05, 2022
PyTorch Live is an easy to use library of tools for creating on-device ML demos on Android and iOS.

PyTorch Live is an easy to use library of tools for creating on-device ML demos on Android and iOS. With Live, you can build a working mobile app ML demo in minutes.

559 Jan 01, 2023
Artifacts for paper "MMO: Meta Multi-Objectivization for Software Configuration Tuning"

MMO: Meta Multi-Objectivization for Software Configuration Tuning This repository contains the data and code for the following paper that is currently

0 Nov 17, 2021
Human Detection - Pedestrian Detection using OpenCV Python

Pedestrian Detection using OpenCV Python Follow us on Instagram for Machine Lear

Hrishikesh Dutta 1 Jan 23, 2022