Malware-Related Sentence Classification

This repo contains the code for the ICTAI 2021 paper "Enrichment of Features for Malware-Related Sentence Classification using External Knowledge".

Installation

Install from source. A Python virtual environment or a Conda environment is recommended.

git clone https://github.com/chaumng/malware_related_sentence_classification.git
cd malware_related_sentence_classification
pip install -r requirements.txt

This repo has been tested on Python 3.7.

Classification and Evaluation

Preprocess data

python preprocess_data.py

Parameter searching: Classify and evaluate

The GAT weak labels are already provided in a file in this repo. To run the parameter search, use the following command. By default, the second grid search is performed. Set the argument param_grid_setting to "first_grid_search" to perform the first grid search, or to "best_setting" to run only the best setting.

python svm_param_search.py --param_grid_setting second_grid_search
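
As a rough illustration of how a flag like param_grid_setting can select among the three modes, here is a minimal sketch using only the standard library. It is not the code in svm_param_search.py: the grid contents (C values, kernels) and the helper names select_grid and count_combinations are hypothetical, chosen only to show the pattern. Only the flag name and its three values come from this README.

```python
import argparse

# Hypothetical grids for illustration only. The real search spaces are
# defined in svm_param_search.py and may look entirely different.
PARAM_GRIDS = {
    "first_grid_search": {"C": [0.01, 0.1, 1, 10, 100], "kernel": ["linear", "rbf"]},
    "second_grid_search": {"C": [0.5, 1, 2, 5], "kernel": ["linear"]},
    "best_setting": {"C": [1], "kernel": ["linear"]},
}


def build_parser():
    """Build a parser that accepts only the three settings named in this README."""
    parser = argparse.ArgumentParser(description="SVM parameter search (sketch)")
    parser.add_argument(
        "--param_grid_setting",
        choices=sorted(PARAM_GRIDS),
        default="second_grid_search",  # the README states this is the default
        help="which parameter grid to use",
    )
    return parser


def select_grid(argv=None):
    """Return the chosen setting name and its (hypothetical) parameter grid."""
    args = build_parser().parse_args(argv)
    return args.param_grid_setting, PARAM_GRIDS[args.param_grid_setting]


def count_combinations(grid):
    """Count how many parameter combinations a grid expands to."""
    total = 1
    for values in grid.values():
        total *= len(values)
    return total


# Usage: pass an explicit argument list instead of reading the command line.
name, grid = select_grid(["--param_grid_setting", "best_setting"])
print(name, count_combinations(grid))
```

Restricting the flag with choices means a mistyped setting fails immediately with a usage message instead of silently falling back to the default.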

Citation

If you find this paper or this code useful, please cite this paper:

@inproceedings{chaunguyen_et_al_2021,
  title={Enrichment of Features for Malware-Related Sentence Classification using External Knowledge},
  author={Nguyen, Chau and Tran, Vu and Nguyen, Le Minh},
  booktitle={Proceedings of the 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI)},
  year={2021},
  organization={IEEE},
}

Owner

Chau Nguyen
