How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.

Related tags

Deep Learningcodemix
Overview

This repo contains codes for the following paper:

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.
Aditya Shah, Chandresh Kumar Maurya, In Proceedings of the 18th International Conference on Natural Language Processing - (ACL 2021).

The presentation slides are available here

Requirements

Python 3.6 or higher
Pytorch >= 1.3.0
Pytorch_transformers (also known as transformers)
Pandas, Numpy, Pickle
Fasttext

Download the fasttext embed file:

The fasttext embedding file can be obtained here

Dataset

We release the benchmark sarcasm dataset for Hinglish language to facilitate further research on code-mix NLP.

We create a dataset using TweetScraper built on top of scrapy to extract code-mix hindi-english tweets. We pass search tags like #sarcasm, #humor, #bollywood, #cricket, etc., combined with most commonly used code-mix Hindi words as query. All the tweets with hashtags like #sarcasm, #sarcastic, #irony, #humor etc. are treated as positive. Non sarcastic tweets are extracted using general hashtags like #politics, #food, #movie, etc. The balanced dataset comprises of 166K tweets.

Finally, we preprocess and clean the data by removing urls, hashtags, mentions, and punctuation in the data. The respective files can be found here as train.csv, val.csv, and test.csv

Arguments:

--epochs:  number of total epochs to run, default=10

--batch-size: train batchsize, default=2

--lr: learning rate for the model, default=5.16e-05

--hidden_size_lstm: hidden size of lstm, default=1024

--hidden_size_linear: hidden size of linear layer, default=128

--seq_len: sequence lenght of input text, default=56

--clip: gradient clipping, default=0.218

--dropout: dropout value, default=0.198

--num_layers: number of lstm layers, default=1

--lstm_bidirectional: bidirectional lstm, default=False

--fasttext_embed_file: path to fasttext embedding file, default='new_hing_emb'

--train_dir: path to train file, default='train.csv'

--valid_dir: path to validation file, default='valid.csv'

--test_dir: path to test file, default='test.csv'

--checkpoint_dir: path to the saved, default='selfnet.pt'

--test: testing the model, default=False

Train

python main.py

Test

python main.py --test True

Code for classifying international patents based on the text of their titles/abstracts

Patent Classification Goal: To train a machine learning classifier that can automatically classify international patents downloaded from the WIPO webs

Prashanth Rao 1 Nov 08, 2022
Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for Optimization

Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for Optimization This repository contains the code for the BBI optimizer, introduced in the p

G. Bruno De Luca 5 Sep 06, 2022
This code reproduces the results of the paper, "Measuring Data Leakage in Machine-Learning Models with Fisher Information"

Fisher Information Loss This repository contains code that can be used to reproduce the experimental results presented in the paper: Awni Hannun, Chua

Facebook Research 43 Dec 30, 2022
Code for the paper Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration

IMAGINE: Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration This repo contains the code base of the paper Language as a Cog

Flowers Team 26 Dec 22, 2022
This is the repository for our paper SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking

SimpleTrack This is the repository for our paper SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking. We are still working on writing t

TuSimple 189 Dec 26, 2022
torchbearer: A model fitting library for PyTorch

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll

631 Jan 04, 2023
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification Created by Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Ch

Yongming Rao 414 Jan 01, 2023
Using OpenAI's CLIP to upscale and enhance images

CLIP Upscaler and Enhancer Using OpenAI's CLIP to upscale and enhance images Based on nshepperd's JAX CLIP Guided Diffusion v2.4 Sample Results Viewpo

Tripp Lyons 5 Jun 14, 2022
Source code of article "Towards Toxic and Narcotic Medication Detection with Rotated Object Detector"

Towards Toxic and Narcotic Medication Detection with Rotated Object Detector Introduction This is the source code of article: Towards Toxic and Narcot

Woody. Wang 3 Oct 29, 2022
My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot

Deep Q&A Table of Contents Presentation Installation Running Chatbot Web interface Results Pretrained model Improvements Upgrade Presentation This wor

Conchylicultor 2.9k Dec 28, 2022
A PyTorch Implementation of Neural IMage Assessment

NIMA: Neural IMage Assessment This is a PyTorch implementation of the paper NIMA: Neural IMage Assessment (accepted at IEEE Transactions on Image Proc

yunxiaos 418 Dec 29, 2022
Generative Models as a Data Source for Multiview Representation Learning

GenRep Project Page | Paper Generative Models as a Data Source for Multiview Representation Learning Ali Jahanian, Xavier Puig, Yonglong Tian, Phillip

Ali 81 Dec 03, 2022
Face uncertainty quantification or estimation using PyTorch.

Face-uncertainty-pytorch This is a demo code of face uncertainty quantification or estimation using PyTorch. The uncertainty of face recognition is af

Kaen 3 Sep 16, 2022
A pytorch reproduction of { Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation }.

A PyTorch Reproduction of HCN Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation. Ch

Guyue Hu 210 Dec 31, 2022
Person Re-identification

Person Re-identification Final project of Computer Vision Table of content Person Re-identification Table of content Students: Proposed method Dataset

Nguyễn Hoàng Quân 4 Jun 17, 2021
Pytorch0.4.1 codes for InsightFace

InsightFace_Pytorch Pytorch0.4.1 codes for InsightFace 1. Intro This repo is a reimplementation of Arcface(paper), or Insightface(github) For models,

1.5k Jan 01, 2023
The Official PyTorch Implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight paper)

Official PyTorch implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 Spotlight Paper) Zhisheng

NVIDIA Research Projects 45 Dec 26, 2022
Time Series Cross-Validation -- an extension for scikit-learn

TSCV: Time Series Cross-Validation This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the traini

Wenjie Zheng 222 Jan 01, 2023
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 06, 2022