Amazon Multilingual Counterfactual Dataset (AMCD)

Last update: Sep 20, 2022

Overview

This repository contains the dataset described in the paper:

I Wish I Would Have Loved This One, But I Didn’t – A Multilingual Dataset for Counterfactual Detection in Product Reviews. James O’Neill, Polina Rozenshtein, Ryuichi Kiryo, Motoko Kubota, Danushka Bollegala. EMNLP'21. arxiv version

The dataset contains sentences from Amazon customer reviews (sampled from the Amazon product review dataset), annotated for counterfactual detection (CFD) as a binary classification task. Counterfactual statements describe events that did not or cannot take place. They can often be identified by the form "If p were true, then q would be true", that is, assertions whose antecedent (p) and consequent (q) are known or assumed to be false.

The key features of this dataset are:

The dataset is multilingual and contains sentences in English, German, and Japanese.
The labeling was done by professional linguists to ensure high annotation quality.
The dataset is supplemented with the annotation guidelines and definitions, which were developed by professional linguists. We also provide the clue word lists, which contain words typical of counterfactual sentences and were used for initial data filtering. The clue word lists were also compiled by professional linguists.
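
The clue-word filtering mentioned above can be pictured with a minimal sketch. This is not the authors' pipeline: the phrases in CLUE_PHRASES and the helper names build_clue_pattern and filter_candidates are hypothetical and chosen only for illustration. The real clue word lists are the ones distributed with the dataset.

```python
import re

# Hypothetical clue phrases, for illustration only.
# The actual lists are the ones provided with the dataset.
CLUE_PHRASES = ["wish", "if only", "would have", "should have"]


def build_clue_pattern(phrases):
    """Compile one case-insensitive regex that matches any clue phrase."""
    # Longer phrases first, so multi-word clues are tried before shorter ones.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_candidates(sentences, pattern):
    """Keep sentences containing at least one clue phrase.

    A match only marks a sentence as a candidate for annotation;
    it does not decide whether the sentence is counterfactual.
    """
    return [s for s in sentences if pattern.search(s)]


sentences = [
    "I wish I would have loved this one, but I didn't.",
    "The battery lasts all day.",
    "If only the strap were longer.",
]
candidates = filter_candidates(sentences, build_clue_pattern(CLUE_PHRASES))
for sentence in candidates:
    print(sentence)
```

The word-boundary matching in this sketch assumes whitespace-delimited text, so it would suit English or German clue words but would need a different matching strategy for Japanese.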

Please see the paper for data statistics and a detailed description of data collection and annotation.

For the dataset format, please see README.txt.

Cite

If you use this dataset in your research, please cite the paper.

License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.
