Lexical Substitution Framework

Overview

LexSubGen

Lexical Substitution Framework

This repository contains the code to reproduce the results from the paper:

Arefyev Nikolay, Sheludko Boris, Podolskiy Alexander, Panchenko Alexander, "Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution", Proceedings of the 28th International Conference on Computational Linguistics, 2020

Installation

Clone LexSubGen repository from github.com.

git clone https://github.com/Samsung/LexSubGen
cd LexSubGen

Setup anaconda environment

  1. Download and install conda
  2. Create new conda environment
    conda create -n lexsubgen python=3.7.4
  3. Activate conda environment
    conda activate lexsubgen
  4. Install requirements
    pip install -r requirements.txt
  5. Download spacy resources and install context2vec and word_forms from github repositories
    ./init.sh

Setup Web Application

If you do not plan to use the Web Application, skip this section and go to the next!

  1. Download and install NodeJS and npm.
  2. Run script for install dependencies and create build files.
bash web_app_setup.sh

Install lexsubgen library

python setup.py install

Results

Results of the lexical substitution task are presented in the following table. To reproduce them, follow the instructions above to install the correct dependencies.

Model SemEval COINCO
GAP [email protected] [email protected] [email protected] GAP [email protected] [email protected] [email protected]
OOC 44.65 16.82 12.83 18.36 46.3 19.58 15.03 12.99
C2V 55.82 7.79 5.92 11.03 48.32 8.01 6.63 7.54
C2V+embs 53.39 28.01 21.72 33.52 50.73 29.64 24.0 21.97
ELMo 53.66 11.58 8.55 13.88 49.47 13.58 10.86 11.35
ELMo+embs 54.16 32.0 22.2 31.82 52.22 35.96 26.62 23.8
BERT 54.42 38.39 27.73 39.57 50.5 42.56 32.64 28.73
BERT+embs 53.87 41.64 30.59 43.88 50.85 46.05 35.63 31.67
RoBERTa 56.74 32.25 24.26 36.65 50.82 35.12 27.35 25.41
RoBERTa+embs 58.74 43.19 31.19 44.61 54.6 46.54 36.17 32.1
XLNet 59.12 31.75 22.83 34.95 53.39 38.16 28.58 26.47
XLNet+embs 59.62 49.53 34.9 47.51 55.63 51.5 39.92 35.12

Results reproduction

Here we list XLNet reproduction commands that correspond to the results presented in the table above. Reproduction commands for all models you can find in scripts/lexsub-all-models.sh Besides saving to the 'run-directory' all results are saved using mlflow. To check them you can run mlflow ui in LexSubGen directory and then open the web page in a browser.

Also you can use pytest to check the reproducibility. But it may take a long time:

pytest tests/results_reproduction
  • XLNet:

XLNet Semeval07:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet'

XLNet CoInCo:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet'

XLNet with embeddings similarity Semeval07:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet_embs'

XLNet with embeddings similarity CoInCo:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet_embs'

Word Sense Induction Results

Model SemEval 2013 SemEval 2010
AVG AVG
XLNet 33.4 52.1
XLNet+embs 37.3 54.1

To reproduce these results use 2.3.0 version of transformers and the following command:

bash scripts/wsi.sh

Web application

You could use command line interface to run Web application.

# Run main server
lexsubgen-app run --host HOST 
                  --port PORT 
                  [--model-configs CONFIGS] 
                  [--start-ids START-IDS] 
                  [--start-all] 
                  [--restore-session]

Example:

# Run server and serve models BERT and XLNet. 
# For BERT create server for serving model and substitute generator instantly (load resources in memory).
# For XLNet create only server.
lexsubgen-app run --host '0.0.0.0' 
                  --port 5000 
                  --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' 
                  --start-ids '[0]'

# After shutting down server JSON file with session dumps in the '~/.cache/lexsubgen/app_session.json'.
# The content of this file looks like:
# [
#     'my_cool_configs/bert.jsonnet',
#     'my_awesome_configs/xlnet.jsonnet',
# ]
# You can restore it with flag 'restore-session'
lexsubgen-app run --host '0.0.0.0' 
                  --port 5000 
                  --restore-session
# BERT and XLNet restored now
Arguments:
Argument Default Description
--help Show this help message and exit
--host IP address of running server host
--port 5000 Port for starting the server
--model-configs [] List of file paths to the model configs.
--start-ids [] Zero-based indices of served models for which substitute generators will be created
--start-all False Whether to create substitute generators for all served models
--restore-session False Whether to restore session from previous Web application run

FAQ

  1. How to use gpu? - You can use environment variable CUDA_VISIBLE_DEVICES to use gpu for inference: export CUDA_VISIBLE_DEVICES='1' or CUDA_VISIBLE_DEVICES='1' before your command.
  2. How to run tests? - You can use pytest: pytest tests
Owner
Samsung
Samsung Electronics Co.,Ltd.
Samsung
Code repository for Semantic Terrain Classification for Off-Road Autonomous Driving

BEVNet Datasets Datasets should be put inside data/. For example, data/semantic_kitti_4class_100x100. Training BEVNet-S Example: cd experiments bash t

(Brian) JoonHo Lee 24 Dec 12, 2022
Jiminy Cricket Environment (NeurIPS 2021)

Jiminy Cricket This is the repository for "What Would Jiminy Cricket Do? Towards Agents That Behave Morally" by Dan Hendrycks*, Mantas Mazeika*, Andy

Dan Hendrycks 15 Aug 29, 2022
Text-Based Ideal Points

Text-Based Ideal Points Source code for the paper: Text-Based Ideal Points by Keyon Vafa, Suresh Naidu, and David Blei (ACL 2020). Update (June 29, 20

Keyon Vafa 37 Oct 09, 2022
Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Keyword Spotting Transformer This is the unofficial TensorFlow implementation of the Keyword Spotting Transformer model. This model is used to train o

Intelligent Machines Limited 8 May 11, 2022
Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers"

Recurrent Fast Weight Programmers This is the official repository containing the code we used to produce the experimental results reported in the pape

IDSIA 36 Nov 15, 2022
Hashformers is a framework for hashtag segmentation with transformers.

Hashtag segmentation is the task of automatically inserting the missing spaces between the words in a hashtag. Hashformers applies Transformer models

Ruan Chaves 41 Nov 09, 2022
Implementation for HFGI: High-Fidelity GAN Inversion for Image Attribute Editing

HFGI: High-Fidelity GAN Inversion for Image Attribute Editing High-Fidelity GAN Inversion for Image Attribute Editing Update: We released the inferenc

Tengfei Wang 371 Dec 30, 2022
A set of Deep Reinforcement Learning Agents implemented in Tensorflow.

Deep Reinforcement Learning Agents This repository contains a collection of reinforcement learning algorithms written in Tensorflow. The ipython noteb

Arthur Juliani 2.2k Jan 01, 2023
A system used to detect whether a person is wearing a medical mask or not.

Mask_Detection_System A system used to detect whether a person is wearing a medical mask or not. To open the program, please follow these steps: Make

Mohamed Emad 0 Nov 17, 2022
BERT model training impelmentation using 1024 A100 GPUs for MLPerf Training v1.1

Pre-trained checkpoint and bert config json file Location of checkpoint and bert config json file This MLCommons members Google Drive location contain

SAIT (Samsung Advanced Institute of Technology) 12 Apr 27, 2022
Build upon neural radiance fields to create a scene-specific implicit 3D semantic representation, Semantic-NeRF

Semantic-NeRF: Semantic Neural Radiance Fields Project Page | Video | Paper | Data In-Place Scene Labelling and Understanding with Implicit Scene Repr

Shuaifeng Zhi 243 Jan 07, 2023
Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] ‘‘Exploiting Document Knowledge for Aspect-level Sentiment Classification’’

Ruidan He 146 Nov 29, 2022
Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

CLIORA This is the official codebase for ICLR oral paper: Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling. We introduce

Bo Wan 32 Dec 23, 2022
Efficient-GlobalPointer - Pytorch Efficient GlobalPointer

引言 感谢苏神带来的模型,原文地址:https://spaces.ac.cn/archives/8877 如何运行 对应模型EfficientGlobalPoi

powerycy 40 Dec 14, 2022
A tensorflow implementation of GCN-LPA

GCN-LPA This repository is the implementation of GCN-LPA (arXiv): Unifying Graph Convolutional Neural Networks and Label Propagation Hongwei Wang, Jur

Hongwei Wang 83 Nov 28, 2022
SeqAttack: a framework for adversarial attacks on token classification models

A framework for adversarial attacks against token classification models

Walter 23 Nov 25, 2022
RGB-D Local Implicit Function for Depth Completion of Transparent Objects

RGB-D Local Implicit Function for Depth Completion of Transparent Objects [Project Page] [Paper] Overview This repository maintains the official imple

NVIDIA Research Projects 43 Dec 12, 2022
pytorch implementation of GPV-Pose

GPV-Pose Pytorch implementation of GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting. (link) UPDATE A new version

40 Dec 01, 2022
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

195 Dec 07, 2022
An Empirical Investigation of Model-to-Model Distribution Shifts in Trained Convolutional Filters

CNN-Filter-DB An Empirical Investigation of Model-to-Model Distribution Shifts in Trained Convolutional Filters Paul Gavrikov, Janis Keuper Paper: htt

Paul Gavrikov 18 Dec 30, 2022