Explainable Zero-Shot Topic Extraction

Related tags

Deep LearningZeSTE
Overview

Zero-Shot Topic Extraction with Common-Sense Knowledge Graph

This repository contains the code for reproducing the results reported in the paper "Explainable Zero-Shot Topic Extraction with Common-Sense Knowledge Graph" (pdf) at the LDK 2021 Conference.

A user-friendly demo is available at: http://zeste.tools.eurecom.fr/

ZeSTE

Based on ConceptNet's common sense knowledge graph and embeddings, ZeSTE generates explainable predictions for a document topical category (e.g. politics, sports, video_games ..) without reliance on training data. The following is a high-level illustration of the approach:

API

ZeSTE can also be accessed via a RESTful API for easy deployment and use. For further information, please refer to the documentation: https://zeste.tools.eurecom.fr/doc

Dependencies

Before running any code in this repo, please install the following dependencies:

  • numpy
  • pandas
  • matplotlib
  • nltk
  • sklearn
  • tqdm
  • gensim

Code Overview

This repo is organized as follows:

  • generate_cache.py: this script processes the raw ConceptNet dump to produce cached files for each node in ConceptNet to accelerate the label neighborhood generation. It also transforms ConceptNet Numberbatch text file into a Gensim word embedding that we pickle for quick loading.
  • zeste.py: this is the main script for evaluation. It takes as argument the dataset to process as well as model configuration parameters such as neighborhood depth (see below). The results (classification report, confusion matrix, and classification metrics) are persisted into text files.
  • util.py: contains the functions that are used in zeste.py
  • label_mappings: contains the tab-separated mappings for the studied datasets.

Reproducing Results

1. Downloads

The two following files need to be downloaded to bypass the use of ConceptNet's web API: the dump of ConceptNet triplets, and the ConceptNet Numberbatch pre-computed word embeddings. You can download them from ConceptNet's and Numberbatch's repos, respectively.

# wget https://s3.amazonaws.com/conceptnet/downloads/2019/edges/conceptnet-assertions-5.7.0.csv.gz
# wget https://conceptnet.s3.amazonaws.com/downloads/2019/numberbatch/numberbatch-19.08.txt.gz
# gzip -d conceptnet-assertions-5.7.0.csv.gz
# gzip -d numberbatch-19.08.txt.gz

2. generate_cache.py

This script takes as input the two just-downloaded files and the cache path to where precomputed 1-hop label neighborhoods will be saved. This can take up to 7.2G of storage space.

usage: generate_cache.py [-h] [-cnp CONCEPTNET_ASSERTIONS_PATH] [-nbp CONCEPTNET_NUMBERBATCH_PATH] [-zcp ZESTE_CACHE_PATH]

Zero-Shot Topic Extraction

optional arguments:
  -h, --help            show this help message and exit
  -cnp CONCEPTNET_ASSERTIONS_PATH, --conceptnet_assertions_path CONCEPTNET_ASSERTIONS_PATH
                        Path to CSV file containing ConceptNet assertions dump
  -nbp CONCEPTNET_NUMBERBATCH_PATH, --conceptnet_numberbatch_path CONCEPTNET_NUMBERBATCH_PATH
                        Path to W2V file for ConceptNet Numberbatch
  -zcp ZESTE_CACHE_PATH, --zeste_cache_path ZESTE_CACHE_PATH
                        Path to the repository where the generated files will be saved

3. zeste.py

This script uses the precomputed 1-hop label neighborhoods to recursively generate label neighborhoods of any given depth (-d). It takes also as parameters the path to the dataset CSV file (which should have two columns: text and label). The rest of the arguments are for model experimentation.

usage: zeste.py [-h] [-cp CACHE_PATH] [-pp PREFETCH_PATH] [-nb NUMBERBATCH_PATH] [-dp DATASET_PATH] [-lm LABELS_MAPPING] [-rp RESULTS_PATH]
                [-d DEPTH] [-f FILTER] [-s {simple,compound,depth,harmonized}] [-ar ALLOWED_RELS]

Zero-Shot Topic Extraction

optional arguments:
  -h, --help            show this help message and exit
  -cp CACHE_PATH, --cache_path CACHE_PATH
                        Path to where the 1-hop word neighborhoods are cached
  -pp PREFETCH_PATH, --prefetch_path PREFETCH_PATH
                        Path to where the precomputed n-hop neighborhoods are cached
  -nb NUMBERBATCH_PATH, --numberbatch_path NUMBERBATCH_PATH
                        Path to the pickled Numberbatch
  -dp DATASET_PATH, --dataset_path DATASET_PATH
                        Path to the dataset to process
  -lm LABELS_MAPPING, --labels_mapping LABELS_MAPPING
                        Path to the mapping between the dataset labels and ZeSTE labels (multiword labels are comma-separated)
  -rp RESULTS_PATH, --results_path RESULTS_PATH
                        Path to the directory where to store the results
  -d DEPTH, --depth DEPTH
                        How many hops to generate the neighborhoods
  -f FILTER, --filter FILTER
                        Filtering method: top[N], top[P]%, thresh[T], all
  -s {simple,compound,depth,harmonized}, --similarity {simple,compound,depth,harmonized}
  -ar ALLOWED_RELS, --allowed_rels ALLOWED_RELS
                        Which relationships to use (comma-separated or all)

Cite this work

@InProceedings{harrando_et_al_zeste_2021,
  author ={Harrando, Ismail and Troncy, Rapha\"{e}l},
  title ={{Explainable Zero-Shot Topic Extraction Using a Common-Sense Knowledge Graph}},
  booktitle ={3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages ={17:1--17:15},
  year ={2021},
  volume ={93},
  publisher ={Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  URL ={https://drops.dagstuhl.de/opus/volltexte/2021/14553},
  URN ={urn:nbn:de:0030-drops-145532},
  doi ={10.4230/OASIcs.LDK.2021.17},
}
Owner
D2K Lab
Data to Knowledge Virtual Lab (LINKS Foundation - EURECOM)
D2K Lab
In this project, we create and implement a deep learning library from scratch.

ARA In this project, we create and implement a deep learning library from scratch. Table of Contents Deep Leaning Library Table of Contents About The

22 Aug 23, 2022
Largest list of models for Core ML (for iOS 11+)

Since iOS 11, Apple released Core ML framework to help developers integrate machine learning models into applications. The official documentation We'v

Kedan Li 5.6k Jan 08, 2023
VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets

VGGVox models for speaker identification and verification This directory contains code to import and evaluate the speaker identification and verificat

338 Dec 27, 2022
A simple pytorch pipeline for semantic segmentation.

SegmentationPipeline -- Pytorch A simple pytorch pipeline for semantic segmentation. Requirements : torch=1.9.0 tqdm albumentations=1.0.3 opencv-pyt

petite7 4 Feb 22, 2022
Dynamic vae - Dynamic VAE algorithm is used for anomaly detection of battery data

Dynamic VAE frame Automatic feature extraction can be achieved by probability di

10 Oct 07, 2022
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Tom-R.T.Kvalvaag 2 Dec 17, 2021
CT-Net: Channel Tensorization Network for Video Classification

[ICLR2021] CT-Net: Channel Tensorization Network for Video Classification @inproceedings{ li2021ctnet, title={{\{}CT{\}}-Net: Channel Tensorization Ne

33 Nov 15, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrai

Hugging Face 77.4k Jan 05, 2023
GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification

GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification This is the official pytorch implementation of t

Alibaba Cloud 5 Nov 14, 2022
Code for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"

Triple-cooperative Video Shadow Detection Code and dataset for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"[arXiv link] [official l

Zhihao Chen 24 Oct 04, 2022
Linear image-to-image translation

Linear (Un)supervised Image-to-Image Translation Examples for linear orthogonal transformations in PCA domain, learned without pairing supervision. Tr

Eitan Richardson 40 Aug 31, 2022
Official implementation of "A Shared Representation for Photorealistic Driving Simulators" in PyTorch.

A Shared Representation for Photorealistic Driving Simulators The official code for the paper: "A Shared Representation for Photorealistic Driving Sim

VITA lab at EPFL 7 Oct 13, 2022
Learning Tracking Representations via Dual-Branch Fully Transformer Networks

Learning Tracking Representations via Dual-Branch Fully Transformer Networks DualTFR ⭐ We achieves the runner-ups for both VOT2021ST (short-term) and

phiphi 19 May 04, 2022
GraphLily: A Graph Linear Algebra Overlay on HBM-Equipped FPGAs

GraphLily: A Graph Linear Algebra Overlay on HBM-Equipped FPGAs GraphLily is the first FPGA overlay for graph processing. GraphLily supports a rich se

Cornell Zhang Research Group 39 Dec 13, 2022
Data and Code for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning"

Introduction Code and data for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning". We cons

Pan Lu 81 Dec 27, 2022
Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

This repo contains the implementations of Object DGCNN (https://arxiv.org/abs/2110.06923) and DETR3D (https://arxiv.org/abs/2110.06922). Our implementations are built on top of MMdetection3D.

Wang, Yue 539 Jan 07, 2023
The code for the CVPR 2021 paper Neural Deformation Graphs, a novel approach for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects.

Neural Deformation Graphs Project Page | Paper | Video Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction Aljaž Božič, Pablo P

Aljaz Bozic 134 Dec 16, 2022
Light-Head R-CNN

Light-head R-CNN Introduction We release code for Light-Head R-CNN. This is my best practice for my research. This repo is organized as follows: light

jemmy li 835 Dec 06, 2022
use machine learning to recognize gesture on raspberrypi

Raspberrypi_Gesture-Recognition use machine learning to recognize gesture on raspberrypi 說明 利用 tensorflow lite 訓練手部辨識模型 分辨 "剪刀"、"石頭"、"布" 之手勢 再將訓練模型匯入

1 Dec 10, 2021