A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

Related tags

Text Data & NLPKGEval
Overview

KGEval

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

The framework and experimental results are described in Ben Rim et al. 2021 (Outstanding Paper Award, AKBC 2021).

Instructions

Create a virtual environment

virtualenv -p python3.6 eval_env
source eval_env/bin/activate
pip install -r requirements.txt

Download data

In the main folder, run:

source data/download.sh

Download model

If you want to test the framework immediately, you can download pre-trained Pykeen models by running:

source download_models.sh

Generate behavioral tests

Symmetry Tests

Can choose --dataset FB15K237, WN18RR, YAGO310

python tests/run.py --dataset FB15K237 --mode generate --capability symmetry

This should result into the following output, and the files for each test set will be added under behavioral_tests\dataset\symmetry:

2021-10-03 23:37:35,060 - [INFO] - Preparing test sets for the dataset FB15K237
2021-10-03 23:37:37,621 - [INFO] - ########################## <----TRAIN---> ############################
2021-10-03 23:37:37,621 - [INFO] - 0 repetitions removed
2021-10-03 23:37:37,621 - [INFO] - 272115 triples remaining in train set
2021-10-03 23:37:37,621 - [INFO] - 6778 symmetric triples found in train set
2021-10-03 23:37:37,786 - [INFO] - ########################## <----TEST---> ############################
2021-10-03 23:37:37,786 - [INFO] - 0 repetitions removed
2021-10-03 23:37:37,786 - [INFO] - 20466 triples remaining in test set
2021-10-03 23:37:37,786 - [INFO] - 113 symmetric triples found in test set
2021-10-03 23:37:37,806 - [INFO] - ########################## <----VALID---> ############################
2021-10-03 23:37:37,806 - [INFO] - 0 repetitions removed
2021-10-03 23:37:37,806 - [INFO] - 17535 triples remaining in valid set
2021-10-03 23:37:37,806 - [INFO] - 113 symmetric triples found in valid set
2021-10-03 23:37:39,106 - [INFO] - #################### <---TEST SET 1: MEMORIZATION ---> ##########################
2021-10-03 23:37:39,106 - [INFO] - There are 5470 entries in the memorization set (occur in both directions)
2021-10-03 23:37:39,106 - [INFO] - #################### <---TEST SET 2: ONE DIRECTION SEEN ---> ##########################
2021-10-03 23:37:39,106 - [INFO] - There are 1308 entries not shown in both directions (to be reversed for testing)
2021-10-03 23:37:39,836 - [INFO] - #################### <--- SYMMETRIC RELATIONS ---> ##########################
2021-10-03 23:37:39,836 - [INFO] - TRAIN SET contains 6778 symmetric entries
2021-10-03 23:37:39,836 - [INFO] - TEST SET contains  113 symmetric entries with 113 not in training
2021-10-03 23:37:39,836 - [INFO] - VALID SET contains 113 symmetric entries with 113 not in training
2021-10-03 23:37:39,839 - [INFO] - #################### <---TEST SET 3: UNSEEN INSTANCES ---> ##########################
2021-10-03 23:37:39,840 - [INFO] - There are 226 entries that are not seen in any direction in training
2021-10-03 23:37:40,267 - [INFO] - #################### <---TEST SET 4: ASYMMETRY ---> ##########################
2021-10-03 23:37:40,267 - [INFO] - There are 3000 asymmetric entries in test set added to test 4

Hierarchy Tests

Only available for FB15K237 dataset

python tests/run.py --dataset FB15K237 --mode generate --capability hierarchy

The output should be and will be available under behavioral_tests/dataset/hierarchy/, the naming of the files corresponds to triples where the tail belongs to a specified level. For example, 1.txt contains triples where the tail has a type of level 1 in the entity type hierarchy :

2021-10-04 01:38:13,517 - [INFO] - Results of Hierarchy Behavioral Tests for FB15K237
2021-10-04 01:38:20,367 - [INFO] - <--------------- Entity Hiararchy statistics ----------------->
2021-10-04 01:38:20,568 - [INFO] - Level 0 contains 1 types and 3415 triples
2021-10-04 01:38:20,887 - [INFO] - Level 1 contains 66 types and 2006 triples
2021-10-04 01:38:20,900 - [INFO] - Level 2 contains 136 types and 4273 triples
2021-10-04 01:38:20,913 - [INFO] - Level 3 contains 213 types and 3560 triples
2021-10-04 01:38:20,923 - [INFO] - Level 4 contains 262 types and 3369 triples

Run Tests (pykeen models)

Symmetry behavioral tests on distmult or rotate:

python tests/run.py --dataset FB15K237 --mode test --model_name rotate

The output will be printed as shown below, and will also be available in the results folder under dataset/symmetry:

2021-10-04 14:00:57,100 - [INFO] - Starting test1 with rotate model
2021-10-04 14:03:23,249 - [INFO] - On test1, MR: 1.2407678244972578, MRR: 0.9400152688974949, [email protected]: 0.9014624953269958, [email protected]: 0.988482654094696, [email protected]: 0.9965264797210693
2021-10-04 14:03:23,249 - [INFO] - Starting test2 with rotate model
2021-10-04 14:04:15,614 - [INFO] - On test2, MR: 23.446483180428135, MRR: 0.4409348919640765, [email protected]: 0.30351680517196655, [email protected]: 0.5894495248794556, [email protected]: 0.7025994062423706
2021-10-04 14:04:15,614 - [INFO] - Starting test3 with rotate model
2021-10-04 14:04:25,364 - [INFO] - On test3, MR: 1018.9469026548672, MRR: 0.04786047740344238, [email protected]: 0.008849557489156723, [email protected]: 0.06194690242409706, [email protected]: 0.12389380484819412
2021-10-04 14:04:25,365 - [INFO] - Starting test4 with rotate model
2021-10-04 14:05:38,900 - [INFO] - On test4, MR: 4901.459, MRR: 0.07606098649786266, [email protected]: 0.9496666789054871, [email protected]: 0.893666684627533, [email protected]: 0.8823333382606506

Hierarchy behavioral tests on distmult or rotate:

   python tests/run.py --dataset FB15K237 --mode test --capability hierarchy --model_name rotate

Run Tests on other models and other frameworks

(To be added)

Owner
NEC Laboratories Europe
Research software developed at NEC Laboratories Europe
NEC Laboratories Europe
मराठी भाषा वाचविण्याचा एक प्रयास. इंग्रजी ते मराठीचा शब्दकोश. An attempt to preserve the Marathi language. A lightweight and ad free English to Marathi thesaurus.

For English, scroll down मराठी शब्द मराठी भाषा वाचवण्यासाठी मी हा ओपन सोर्स प्रोजेक्ट सुरू केला आहे. माझ्या मते, आपली भाषा हळूहळू आणि कोणाचाही लक्षात

मुक्त स्त्रोत 20 Oct 11, 2022
A python package to fine-tune transformer-based models for named entity recognition (NER).

nerblackbox A python package to fine-tune transformer-based language models for named entity recognition (NER). Resources Source Code: https://github.

Felix Stollenwerk 13 Jul 30, 2022
Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

Twitch Revenues Bu script'i kullanarak istediğiniz yayıncıların, Twitch'den sızdırılan 125 GB'lik veriye dayanarak, 2019-2021 arası aylık gelirlerini

4 Nov 11, 2021
Official PyTorch implementation of Time-aware Large Kernel (TaLK) Convolutions (ICML 2020)

Time-aware Large Kernel (TaLK) Convolutions (Lioutas et al., 2020) This repository contains the source code, pre-trained models, as well as instructio

Vasileios Lioutas 28 Dec 07, 2022
A collection of GNN-based fake news detection models.

This repo includes the Pytorch-Geometric implementation of a series of Graph Neural Network (GNN) based fake news detection models. All GNN models are implemented and evaluated under the User Prefere

SafeGraph 251 Jan 01, 2023
Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer

Sentiment Analysis Project This project contains two sentiment analysis programs for Hotel Reviews using a Hotel Reviews dataset from Datafiniti. The

Simran Farrukh 0 Mar 28, 2022
This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 - treatments and vaccinations.

Project: Text Analysis - This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 -

1 Mar 14, 2022
A simple visual front end to the Maya UE4 RBF plugin delivered with MetaHumans

poseWrangler Overview PoseWrangler is a simple UI to create and edit pose-driven relationships in Maya using the MayaUE4RBF plugin. This plugin is dis

Christopher Evans 105 Dec 18, 2022
Two-stage text summarization with BERT and BART

Two-Stage Text Summarization Description We experiment with a 2-stage summarization model on CNN/DailyMail dataset that combines the ability to filter

Yukai Yang (Alexis) 6 Oct 22, 2022
Rootski - Full codebase for rootski.io (without the data)

📣 Welcome to the Rootski codebase! This is the codebase for the application run

Eric 20 Nov 18, 2022
Persian-lexicon - A lexicon of 70K unique Persian (Farsi) words

Persian Lexicon This repo uses Uppsala Persian Corpus (UPC) to construct a lexic

Saman Vaisipour 7 Apr 01, 2022
Unsupervised Language Modeling at scale for robust sentiment classification

** DEPRECATED ** This repo has been deprecated. Please visit Megatron-LM for our up to date Large-scale unsupervised pretraining and finetuning code.

NVIDIA Corporation 1k Nov 17, 2022
spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines

spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines spaCy-wrap is minimal library intended for wrapping fine-tuned transformers from t

Kenneth Enevoldsen 32 Dec 29, 2022
Lattice methods in TensorFlow

TensorFlow Lattice TensorFlow Lattice is a library that implements constrained and interpretable lattice based models. It is an implementation of Mono

504 Dec 20, 2022
Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

Fork from https://github.com/huggingface/transformers/tree/86d5fb0b360e68de46d40265e7c707fe68c8015b/examples/pytorch/language-modeling at 2021.05.17.

Junbum Lee 12 Oct 26, 2022
ConvBERT: Improving BERT with Span-based Dynamic Convolution

ConvBERT Introduction In this repo, we introduce a new architecture ConvBERT for pre-training based language model. The code is tested on a V100 GPU.

YITUTech 237 Dec 10, 2022
Milaan Parmar / Милан пармар / _米兰 帕尔马 170 Dec 13, 2022
Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据,将清华新闻数据、搜狗新闻数据等新闻数据集,以及开源的一些摘要数据进行整理清洗,构建一个较完善的中文摘要数据集。 数据集清洗时,仅进行了简单地规则清洗。

logCong 785 Dec 29, 2022
Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 A repository part of the MarIA project. Corpora 📃 Corpora Number of documents Number of tokens Size (GB) BNE 201,080,084

Plan de Tecnologías del Lenguaje - Gobierno de España 203 Dec 20, 2022
Index different CKAN entities in Solr, not just datasets

ckanext-sitesearch Index different CKAN entities in Solr, not just datasets Requirements This extension requires CKAN 2.9 or higher and Python 3 Featu

Open Knowledge Foundation 3 Dec 02, 2022