Semantic similarity computation with different state-of-the-art metrics

Last update: Jun 22, 2022

Description

TaxoSS is a Python library for semantic similarity that implements state-of-the-art metrics such as Resnik, Jiang-Conrath (JCN) and HSS.

Requirements

Python 3.6 or later
NLTK
NumPy
Pandas

Installation

TaxoSS can be installed with pip, the Python package manager:

pip install taxoss

Usage

Semantic similarity functions

You can compute the semantic similarity of two words as follows:

from TaxoSS.functions import semantic_similarity
semantic_similarity('brother', 'sister', 'hss')

3.353513521371089

The function semantic_similarity(word1, word2, kind, ic) accepts the following values for the argument kind:

hss -> HSS (default)
wup -> WUP
lcs -> LC
path_sim -> Shortest Path
resnik -> Resnik
jcn -> Jiang-Conrath
lin -> Lin
seco -> Seco
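To give a feel for the structural (path-based) metrics in this list, here is a minimal sketch of WUP, Shortest Path and LC on a small hand-made is-a taxonomy. It assumes that LC stands for Leacock-Chodorow and uses the common textbook definitions (the same ones behind NLTK's wup_similarity, path_similarity and lch_similarity). The taxonomy, the helper names and the depth convention (root has depth 1) are hypothetical; TaxoSS works on WordNet and may compute these metrics differently.

```python
import math

# Hypothetical toy is-a taxonomy: child -> parent ("entity" is the root).
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "feline": "mammal",
    "canine": "mammal",
    "cat": "feline",
    "dog": "canine",
    "fish": "animal",
}
ALL_CONCEPTS = set(PARENT) | set(PARENT.values())

def ancestors(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(concept):
    # Depth counted in nodes, so the root has depth 1.
    return len(ancestors(concept))

def lowest_common_subsumer(c1, c2):
    seen = set(ancestors(c2))
    for node in ancestors(c1):  # walk upward from c1 until we reach an ancestor of c2
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")

def path_length(c1, c2):
    # Number of edges on the path that goes through the lowest common subsumer.
    lcs = lowest_common_subsumer(c1, c2)
    return (depth(c1) - depth(lcs)) + (depth(c2) - depth(lcs))

def wup(c1, c2):
    # Wu-Palmer: depth of the common subsumer relative to the depths of both concepts.
    lcs = lowest_common_subsumer(c1, c2)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))

def shortest_path(c1, c2):
    # Inverse of the path length; identical concepts score 1.
    return 1 / (1 + path_length(c1, c2))

def leacock_chodorow(c1, c2):
    # Path length (edges + 1) scaled by twice the maximum taxonomy depth.
    max_depth = max(depth(c) for c in ALL_CONCEPTS)
    return -math.log((path_length(c1, c2) + 1) / (2 * max_depth))
```

On this taxonomy wup('cat', 'dog') is 0.6 while wup('cat', 'fish') is 0.5, because the common subsumer of cat and fish (animal) sits higher in the tree than that of cat and dog (mammal).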

For the argument ic see the following section.

Information Content

By default (when the argument ic is not given), the Information Content is computed from a Wikipedia corpus:

from TaxoSS.functions import semantic_similarity
semantic_similarity('cat', 'dog', 'resnik')

6.169410755220327
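Resnik, Jiang-Conrath and Lin all depend on the Information Content IC(c) = -log p(c) of the two concepts and of their lowest common subsumer (LCS), where p(c) is the probability of encountering c (or any concept below it) in the corpus. The sketch below shows the textbook formulas once those IC values are known. The function names and the handling of a zero Jiang-Conrath distance are assumptions, not necessarily what TaxoSS does internally.

```python
import math

def information_content(p):
    """IC(c) = -log p(c), where p is the corpus probability of concept c."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p)

def resnik(ic_lcs):
    # Resnik: similarity is the IC of the lowest common subsumer.
    return ic_lcs

def lin(ic1, ic2, ic_lcs):
    # Lin: shared information relative to the information of both concepts.
    denom = ic1 + ic2
    return 2 * ic_lcs / denom if denom > 0 else 0.0

def jiang_conrath(ic1, ic2, ic_lcs):
    # JCN distance = IC(c1) + IC(c2) - 2 * IC(lcs); the similarity is its inverse.
    distance = ic1 + ic2 - 2 * ic_lcs
    if distance <= 0:
        # Identical concepts have distance 0; returning +inf here is an assumption,
        # since libraries handle this case differently.
        return math.inf
    return 1 / distance
```

For instance, with IC values 4, 6 and 2 for the two concepts and their LCS, Resnik gives 2, Lin gives 0.4 and Jiang-Conrath gives 1/6.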

To calculate the Information Content from your own corpus:

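from TaxoSS.calculate_IC import calculate_IC
from TaxoSS.functions import semantic_similarity

# Placeholder paths (hypothetical): replace with your own corpus and output file
path_to_corpus = 'my_corpus.txt'
path_to_save_IC_file = 'venv/lib/python3.6/site-packages/TaxoSS/data/prova_IC.csv'

# Build the IC file from the corpus, then use it for an IC-based metric
calculate_IC(path_to_corpus, path_to_save_IC_file)
semantic_similarity('cat', 'dog', 'resnik', path_to_save_IC_file)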

Here path_to_save_IC_file must be a path inside the TaxoSS package directory of your virtual environment, e.g. venv/lib/python3.6/site-packages/TaxoSS/data/prova_IC.csv.
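The exact procedure used by calculate_IC is not documented here. As a hedged illustration, this is the classic Resnik-style way of deriving IC from a corpus on a toy taxonomy: count each concept's occurrences, add every count to all of the concept's ancestors, and set IC(c) = -log(count(c) / count(root)). The corpus (a list of tokens taken directly as concept names), the taxonomy and the function name toy_information_content are hypothetical; the real calculate_IC works on WordNet and may tokenize, smooth or store its CSV differently.

```python
import math
from collections import Counter

# Hypothetical toy is-a taxonomy: child -> parent.
ROOT = "entity"
PARENT = {
    "animal": ROOT,
    "mammal": "animal",
    "cat": "mammal",
    "dog": "mammal",
    "fish": "animal",
}

def ancestors(concept):
    """Yield the concept itself and every ancestor up to the root."""
    while True:
        yield concept
        if concept not in PARENT:
            return
        concept = PARENT[concept]

def toy_information_content(tokens):
    """Map each observed concept to IC(c) = -log(count(c) / count(root))."""
    counts = Counter()
    for token in tokens:
        if token in PARENT or token == ROOT:  # ignore words outside the taxonomy
            for node in ancestors(token):
                counts[node] += 1  # propagate the occurrence to every subsumer
    total = counts[ROOT]
    if total == 0:
        raise ValueError("no corpus token matched the taxonomy")
    # Concepts never observed are left out (their IC would be infinite).
    return {c: -math.log(n / total) for c, n in counts.items()}
```

With the tokens ['cat', 'cat', 'dog', 'fish', 'the'], the root gets IC 0, cat gets log 2 and mammal gets log(4/3), so the Resnik similarity of cat and dog on this toy data would be log(4/3); the out-of-taxonomy word 'the' is ignored.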

Benchmark

	HSS (ours) Pearson	HSS (ours) Spearman	WUP Pearson	WUP Spearman	LC Pearson	LC Spearman	Shortest Path Pearson	Shortest Path Spearman	Resnik Pearson	Resnik Spearman	Jiang-Conrath Pearson	Jiang-Conrath Spearman	Lin Pearson	Lin Spearman	Seco Pearson	Seco Spearman
MEN	0.41	0.33	0.36	0.33	0.14	0.05	0.07	0.03	0.05	0.03	-0.05	-0.04	0.05	0.04	-0.01	0.03
MC30	0.74	0.69	0.74	0.73	0.33	0.21	0.22	0.3	0.13	0.03	-0.06	-0.01	0.05	0.01	0.13	-0.09
WSS	0.68	0.65	0.58	0.59	0.36	0.23	0.16	0.1	0.02	-0.03	0.04	0.06	0.03	0.06	-0.01	-0.04
Simlex999	0.4	0.38	0.45	0.43	0.26	0.15	0.2	0.16	-0.04	-0.04	0.12	0.14	0.12	0.14	-0.02	-0.08
MT287	0.46	0.31	0.4	0.28	0.26	0.12	0.11	0.11	0.03	0.04	0.18	0.16	0.22	0.17	0	-0.06
MT771	0.44	0.4	0.43	0.49	0.06	0.02	0.1	0.13	0	-0.01	0	0	0	0	-0.05	-0.03
Time per pair (s)	0.0007	0.0007	0.008	0.008	0.0055	0.0055	0.0064	0.0064	0.5586	0.5586	0.551	0.551	0.5866	0.5866	0.0013	0.0013
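The table reports Pearson and Spearman correlation coefficients per dataset. Assuming, as is usual for these word-similarity benchmarks, that each coefficient compares a metric's scores with the human similarity ratings of the dataset's word pairs, the computation can be sketched with NumPy and Pandas (both already requirements). The function name correlations and the toy numbers are hypothetical; Spearman is computed as the Pearson correlation of average ranks, which handles ties.

```python
import numpy as np
import pandas as pd

def correlations(model_scores, human_scores):
    """Return (pearson, spearman) between two equal-length score sequences."""
    model = pd.Series(model_scores, dtype=float)
    human = pd.Series(human_scores, dtype=float)
    if len(model) != len(human) or len(model) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    # Pearson: linear correlation of the raw scores.
    pearson = np.corrcoef(model, human)[0, 1]
    # Spearman: Pearson correlation of the ranks (ties get their average rank).
    spearman = np.corrcoef(model.rank(), human.rank())[0, 1]
    return float(pearson), float(spearman)

# Toy usage: in practice model_scores could come from
# [semantic_similarity(w1, w2, 'hss') for w1, w2 in pairs]
human = [9.0, 7.5, 3.0, 1.0]
model = [3.2, 3.4, 1.1, 0.4]
pearson, spearman = correlations(model, human)
```

Spearman only looks at the ordering of the pairs, so a metric whose scores rise monotonically but non-linearly with the human ratings gets a perfect Spearman coefficient while its Pearson coefficient stays below 1. Constant score sequences have no defined correlation and yield nan.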
