Code for EMNLP 2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training"

Last update: Dec 04, 2022

SCAPT-ABSA

Overview

In this repository, we provide code for Supervised ContrAstive Pre-Training (SCAPT) and aspect-aware fine-tuning, sentiment corpora retrieved from YELP/Amazon reviews, and the SemEval2014 Restaurant/Laptop datasets with additional implicit_sentiment labeling.

SCAPT aims to tackle implicit sentiment expressions in aspect-based sentiment analysis (ABSA). In our work, we define implicit sentiment as sentiment expressions that contain no polarity markers but still convey a clear, human-aware sentiment polarity.

Here are examples for explicit and implicit sentiment in ABSA:

SCAPT

SCAPT aligns the representations of sentiment expressions that share the same sentiment label. It consists of three objectives:

- Supervised Contrastive Learning (SCL)
- Review Reconstruction (RR)
- Masked Aspect Prediction (MAP)
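
As a rough illustration of the first objective, the sketch below computes a generic supervised contrastive loss over a toy batch in plain Python. It follows the common SupCon-style formulation, in which the positives of an anchor are the other samples with the same label; it is not necessarily the exact loss used in this repository. The function name `supervised_contrastive_loss`, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss; illustrative only, not the repository's code."""
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: labels 1 and 0 stand for two sentiment classes.
labels = [1, 1, 0, 0]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # same label, close vectors
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]    # same label, distant vectors
loss_aligned = supervised_contrastive_loss(aligned, labels)
loss_mixed = supervised_contrastive_loss(mixed, labels)
print(loss_aligned < loss_mixed)
```

The loss is lower when samples with the same label already sit close together, which is the alignment effect the SCL objective is meant to encourage.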

Aspect-aware Fine-tuning

Sentiment representation and aspect-based representation are taken into account for sentiment prediction in aspect-aware fine-tuning.

Requirement

- cuda 11.0
- python 3.7.9
- lxml 4.6.2
- numpy 1.19.2
- pytorch 1.8.0
- pyyaml 5.3.1
- tqdm 4.55.0
- transformers 4.2.2

Data Preparation & Preprocessing

For Pre-training

The retrieved sentiment corpora contain millions of reviews. We provide download links for both the original corpora and the preprocessed data; download them if you want to run pre-training or use the corpora in other ways:

File	Google Drive Link	Baidu Wangpan Link	Baidu Wangpan Code
scapt_yelp_json.zip	link	link	q7fs
scapt_amazon_json.zip	link	link	i1da
scapt_yelp_pkl.zip	link	link	j9ce
scapt_amazon_pkl.zip	link	link	3b8t

These pickle files can also be generated from json files by the preprocessing method:

python preprocess.py --pretrain

For Fine-tuning

We have already merged the opinion term labeling into the original SemEval2014 datasets. For example:

    <sentence id="1634">
        <text>The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it's on the menu or not.</text>
        <aspectTerms>
            <aspectTerm term="food" polarity="positive" from="4" to="8" implicit_sentiment="False" opinion_words="exceptional"/>
            <aspectTerm term="kitchen" polarity="positive" from="55" to="62" implicit_sentiment="False" opinion_words="capable"/>
            <aspectTerm term="menu" polarity="neutral" from="141" to="145" implicit_sentiment="True"/>
        </aspectTerms>
        <aspectCategories>
            <aspectCategory category="food" polarity="positive"/>
        </aspectCategories>
    </sentence>

implicit_sentiment indicates whether the aspect term carries an implicit sentiment expression; when it does not, opinion_words gives the corresponding opinion words. The opinion_words labeling is credited to TOWE.
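
To make the annotation format concrete, here is a minimal sketch that reads the attributes of the sample sentence above. It uses the standard-library `xml.etree.ElementTree` rather than the lxml package listed in the requirements, and the helper name `read_aspects` is hypothetical. The source does not show how multiple opinion words are separated, so `opinion_words` is returned as the raw attribute string.

```python
import xml.etree.ElementTree as ET

# The sample sentence from above, without the aspectCategories element.
SAMPLE = """<sentence id="1634">
    <text>The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it's on the menu or not.</text>
    <aspectTerms>
        <aspectTerm term="food" polarity="positive" from="4" to="8" implicit_sentiment="False" opinion_words="exceptional"/>
        <aspectTerm term="kitchen" polarity="positive" from="55" to="62" implicit_sentiment="False" opinion_words="capable"/>
        <aspectTerm term="menu" polarity="neutral" from="141" to="145" implicit_sentiment="True"/>
    </aspectTerms>
</sentence>"""


def read_aspects(sentence_xml):
    """Return the sentence text and one dict per aspectTerm element."""
    sentence = ET.fromstring(sentence_xml)
    text = sentence.findtext("text")
    aspects = []
    for node in sentence.iter("aspectTerm"):
        aspects.append({
            "term": node.get("term"),
            "polarity": node.get("polarity"),
            # from/to are character offsets into the sentence text.
            "span": (int(node.get("from")), int(node.get("to"))),
            # The attribute is stored as the string "True" or "False".
            "implicit": node.get("implicit_sentiment") == "True",
            # opinion_words is absent on implicit aspects, so this may be None.
            "opinion_words": node.get("opinion_words"),
        })
    return text, aspects


text, aspects = read_aspects(SAMPLE)
for aspect in aspects:
    start, end = aspect["span"]
    print(aspect["term"], aspect["polarity"], aspect["implicit"],
          aspect["opinion_words"], text[start:end])
```

Slicing the text with the from/to offsets recovers each aspect term, which is a quick sanity check when editing the XML by hand.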

The original and extended fine-tuning data, together with the preprocessed dumps, are uploaded to this repository.

Consequently, the structure of your data directory should be:

├── Amazon
│   ├── amazon_laptops.json
│   └── amazon_laptops_preprocess_pretrain.pkl
├── laptops
│   ├── Laptops_Test_Gold_Implicit_Labeled_preprocess_finetune.pkl
│   ├── Laptops_Test_Gold_Implicit_Labeled.xml
│   ├── Laptops_Test_Gold.xml
│   ├── Laptops_Train_v2_Implicit_Labeled_preprocess_finetune.pkl
│   ├── Laptops_Train_v2_Implicit_Labeled.xml
│   └── Laptops_Train_v2.xml
├── MAMS
│   ├── test_preprocess_finetune.pkl
│   ├── test.xml
│   ├── train_preprocess_finetune.pkl
│   ├── train.xml
│   ├── val_preprocess_finetune.pkl
│   └── val.xml
├── restaurants
│   ├── Restaurants_Test_Gold_Implicit_Labeled_preprocess_finetune.pkl
│   ├── Restaurants_Test_Gold_Implicit_Labeled.xml
│   ├── Restaurants_Test_Gold.xml
│   ├── Restaurants_Train_v2_Implicit_Labeled_preprocess_finetune.pkl
│   ├── Restaurants_Train_v2_Implicit_Labeled.xml
│   └── Restaurants_Train_v2.xml
└── YELP
    ├── yelp_restaurants.json
    └── yelp_restaurants_preprocess_pretrain.pkl
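
Before launching a run, it can help to confirm that the files from the tree above are in place. The following is a minimal sketch, assuming the data directory is named `data` relative to the working directory (the listing does not name the root); the `EXPECTED` mapping and the helper `missing_files` are hypothetical and cover only a subset of the listed files.

```python
from pathlib import Path

# Subset of the layout listed above; extend it with the files you need.
EXPECTED = {
    "laptops": [
        "Laptops_Train_v2_Implicit_Labeled.xml",
        "Laptops_Test_Gold_Implicit_Labeled.xml",
    ],
    "restaurants": [
        "Restaurants_Train_v2_Implicit_Labeled.xml",
        "Restaurants_Test_Gold_Implicit_Labeled.xml",
    ],
    "MAMS": ["train.xml", "val.xml", "test.xml"],
}


def missing_files(data_root, expected=EXPECTED):
    """Return the relative paths of expected files that do not exist."""
    root = Path(data_root)
    return [
        (Path(folder) / name).as_posix()
        for folder, names in expected.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


# "data" is an assumed location; pass the path of your own data directory.
for path in missing_files("data"):
    print("missing:", path)
```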

Pre-training

The pre-training is conducted on multiple GPUs.

Pre-training [TransEnc|BERT] on [YELP|Amazon]:

python -m torch.distributed.launch --nproc_per_node=${THE_CARD_NUM_YOU_HAVE} multi_card_train.py --config config/[yelp|amazon]_[TransEnc|BERT]_pretrain.yml

Model checkpoints are saved in results.

Fine-tuning

Directly train [TransEnc|BERT] on [Restaurants|Laptops|MAMS] as [TransEncAsp|BERTAsp]:

python train.py --config config/[restaurants|laptops|mams]_[TransEnc|BERT]_finetune.yml

Fine-tune the pre-trained [TransEnc|BERT] on [Restaurants|Laptops|MAMS] as [TransEncAsp+SCAPT|BERTAsp+SCAPT]:

python train.py --config config/[restaurants|laptops|mams]_[TransEnc|BERT]_finetune.yml --checkpoint PATH/TO/MODEL_CHECKPOINT

Model checkpoints are saved in results.

Evaluation

Evaluate [TransEnc|BERT]-based model on [Restaurants|Laptops|MAMS] dataset:

python evaluate.py --config config/[restaurants|laptops|mams]_[TransEnc|BERT]_finetune.yml --checkpoint PATH/TO/MODEL_CHECKPOINT
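
The fine-tuning and evaluation commands share one config naming pattern, so the bracketed choices can be filled in programmatically. The sketch below is one way to do that; the helper `build_command` is hypothetical and only assembles the argument list shown in the commands above, without running anything.

```python
DATASETS = ("restaurants", "laptops", "mams")
MODELS = ("TransEnc", "BERT")


def build_command(dataset, model, checkpoint=None, script="train.py"):
    """Assemble the argument list for train.py or evaluate.py (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    command = ["python", script, "--config", f"config/{dataset}_{model}_finetune.yml"]
    if checkpoint is not None:
        # Same flag for fine-tuning from a pre-trained model and for evaluation.
        command += ["--checkpoint", checkpoint]
    return command


print(" ".join(build_command("laptops", "BERT")))
print(" ".join(build_command("mams", "TransEnc",
                             checkpoint="PATH/TO/MODEL_CHECKPOINT",
                             script="evaluate.py")))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root if you prefer to launch runs from a script.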

Our model parameters:

Model	Dataset	File	Google Drive Link	Baidu Wangpan Link	Baidu Wangpan Code
TransEncAsp+SCAPT	SemEval2014 Restaurant	TransEnc_restaurants.zip	link	link	5e5c
TransEncAsp+SCAPT	SemEval2014 Laptop	TransEnc_laptops.zip	link	link	8amq
TransEncAsp+SCAPT	MAMS	TransEnc_MAMS.zip	link	link	bf2x
BERTAsp+SCAPT	SemEval2014 Restaurant	BERT_restaurants.zip	link	link	1w2e
BERTAsp+SCAPT	SemEval2014 Laptop	BERT_laptops.zip	link	link	zhte
BERTAsp+SCAPT	MAMS	BERT_MAMS.zip	link	link	1iva

Citation

If you find this repository useful, please cite our paper:

@inproceedings{li-etal-2021-learning-implicit,
    title = "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training",
    author = "Li, Zhengyan  and
      Zou, Yicheng  and
      Zhang, Chong  and
      Zhang, Qi  and
      Wei, Zhongyu",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.22",
    pages = "246--256",
    abstract = "Aspect-based sentiment analysis aims to identify the sentiment polarity of a specific aspect in product reviews. We notice that about 30{\%} of reviews do not contain obvious opinion words, but still convey clear human-aware sentiment orientation, which is known as implicit sentiment. However, recent neural network-based approaches paid little attention to implicit sentiment entailed in the reviews. To overcome this issue, we adopt Supervised Contrastive Pre-training on large-scale sentiment-annotated corpora retrieved from in-domain language resources. By aligning the representation of implicit sentiment expressions to those with the same sentiment label, the pre-training process leads to better capture of both implicit and explicit sentiment orientation towards aspects in reviews. Experimental results show that our method achieves state-of-the-art performance on SemEval2014 benchmarks, and comprehensive analysis validates its effectiveness on learning implicit sentiment.",
}

Owner

Zhengyan Li
