Machine learning Bot detection technique, based on United States election dataset

Last update: Nov 20, 2022

Related tags

Overview

Machine learning Bot detection technique, based on United States election dataset (2020).

Current github repo provides implementation described in paper: "Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study" accepted in ICWSM 2022 conference.

Code folder contain all scripts for:

feature selection
parameter fine tuning
final model performance evaluation

Developed Bot Detection ML framework was trained and tested over Twitter dataset.
The particular dataset was collected with use of Twitter API during the 2020 US Election period. During this period we manage to collect tweets/retweets that contain particular hashtags that are correlated with the 2020 US Election topic. We provide our dataset in the form of two separated files. In those files we provide already extracted features for each user that was labeled according to our labeling process.

data/us_2020_election_data.csv
- contain user features and labels that was active in twitter during the month of September
- we utilize it during our training/validation/testing phase of our implementation
data/us_2020_election_data_october.csv
- is utilized in order to identify how a developed model can perform over the unseen data that was collected during the month of October.

Both files was labeled with use of Botometer and BotSentinel Bot Detection tools. We keep only a union of users where both tools agree if a particular account is bot or normal user. Created pipeline allow us to keep user labels as clear as possible. We remove user ids in order to keep private the user identity, we provide only labels and extracted features which do not provide any sensitive user information.

In the plot folder we store performance metrics results in the form of figures.

Execution flow:

code/feature_selection.py
- Provide file with selected features in code folder
code/model_fine_tuning.py
- Store parameters of best model after the fine-tuning
code/performance_measurement.py
- Evaluate selected model and generates figures of ROC and PR AUCs. Also provide SHAP figure that explain the model decision making based on 20 most important features.

Developed framework provides state of the art performance over the Twitter US Election data. Based on the performance of unseen data we assume that is possible utilization of developed methods over data more than one month after training data, since the performance drop is not significant.

Machine learning Bot detection technique, based on United States election dataset

Related tags

Overview

Machine learning Bot detection technique, based on United States election dataset (2020).

Owner

Alexander Shevtsov

nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

This is a work in progress reimplementation of Instant Neural Graphics Primitives

DeepLab is a state-of-art deep learning system for semantic image segmentation built on top of Caffe.

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Adversarial-autoencoders - Tensorflow implementation of Adversarial Autoencoders

Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

A PyTorch Implementation of the Luna: Linear Unified Nested Attention

This is the official implementation for the paper "Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization" in NeurIPS 2021.

Protect against subdomain takeover

基于Paddle框架的arcface复现

Official code for "Maximum Likelihood Training of Score-Based Diffusion Models", NeurIPS 2021 (spotlight)

Code for "Diffusion is All You Need for Learning on Surfaces"

This is an official implementation for "Video Swin Transformers".

Use CLIP to represent video for Retrieval Task

PyTorch implementation of the TTC algorithm

Image inpainting using Gaussian Mixture Models

This is the code of "Multi-view Contrastive Graph Clustering" in NeurlPS 2021.

Official implementation of deep-multi-trajectory-based single object tracking (IEEE T-CSVT 2021).

The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]