IndoNLI: A Natural Language Inference Dataset for Indonesian

This is a repository for data and code accompanying our EMNLP 2021 paper "IndoNLI: A Natural Language Inference Dataset for Indonesian". The datasets used for our experiments can be found under the data directory:

indonli: human-annotated NLI data, split into train, val, and test (test_lay and test_expert)

diagnostic: subset of examples from test_expert that are annotated with linguistic and logical phenomena
translate_train.tar.gz: MNLI dataset translated to Indonesian (train and dev)
translate_train_small.tar.gz: a sample of translate_train used for the translate_train_small experiment.
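This README does not describe the file format inside the data directory. The sketch below assumes that each IndoNLI split is stored as a JSON Lines file named after the split (for example `train.jsonl`, `test_lay.jsonl`), with one premise/hypothesis/label object per line. Those file names and the `label` field are assumptions, as are the helper names `load_jsonl`, `load_indonli`, `extract_archive`, and `label_counts`, which are hypothetical. Check the actual contents of the data directory before relying on them. The archive helper uses only the standard-library `tarfile` module to unpack files such as translate_train.tar.gz.

```python
import json
import tarfile
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples


def load_indonli(data_dir, splits=("train", "val", "test_lay", "test_expert")):
    """Load each split from <data_dir>/<split>.jsonl.

    The one-file-per-split JSONL layout is an assumption, not a documented format.
    """
    data_dir = Path(data_dir)
    return {split: load_jsonl(data_dir / f"{split}.jsonl") for split in splits}


def extract_archive(archive_path, out_dir):
    """Unpack a .tar.gz archive (e.g. translate_train.tar.gz) into out_dir.

    Only use this on archives you trust; extractall writes every member to disk.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)
    return Path(out_dir)


def label_counts(examples, key="label"):
    """Count how often each label value occurs; the "label" key is assumed."""
    return Counter(ex[key] for ex in examples)
```

For example, `label_counts(load_indonli("data/indonli")["train"])` would report the class balance of the training split, assuming the hypothetical layout above holds.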

The experiment code can be found under the experiment directory; please see the README file there.

License

We use premises taken from the Indonesian Wikipedia, news, and Web articles.

Wikipedia is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).

For the news genre, we use premise text from the Indonesian PUD and GSD treebanks provided by Universal Dependencies 2.5 (Zeman et al., 2019) and from IndoSum (Kurniawan and Louvan, 2018). The Indonesian PUD and GSD treebanks are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA). IndoSum is licensed under the Apache License, Version 2.0.

Citation

If you use our corpus in your work, please consider citing our paper:

@inproceedings{indonli,
    title = "IndoNLI: A Natural Language Inference Dataset for Indonesian",
    author = "Mahendra, Rahmad and Aji, Alham Fikri and Louvan, Samuel and Rahman, Fahrurrozi and Vania, Clara",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
