Sequence-tagging using deep learning

Overview

Classification using Deep Learning

Requirements

  • PyTorch version >= 1.9.1+cu111
  • Python version >= 3.8.10
  • PyTorch-Lightning version >= 1.4.9
  • Huggingface Transformers version >= 4.11.3
  • Tensorboard version >= 2.6.0
  • Pandas >= 1.3.4
  • Scikit-learn: numpy>=1.14.6, scipy>=1.1.0, threadpoolctl>=2.0.0, joblib>=0.11

Installation

pip3 install transformers
pip3 install pytorch-lightning
pip3 install tensorboard
pip3 install pandas
pip3 install scikit-learn
git clone https://github.com/vineetk1/clss.git
cd clss

Note that the default directory is clss. Unless otherwise stated, all commands from the Command-Line-Interface must be delivered from the default directory.

Download the dataset

  1. Create a data directory.
mkdir data
  1. Download a dataset in the data directory.

Saving all informtion and results of an experiment

All information about the experiment is stored in a unique directory whose path starts with tensorboard_logs and ends with a unique version-number. Its contents consist of hparams.yaml, hyperperameters_used.yaml, test-results.txt, events.* files, and a checkpoints directory that has one or more checkpoint-files.

Train, validate, and test a model

Following command trains a model, saves the last checkpoint plus checkpoints that have the lowest validation loss, runs the test dataset on the checkpointed model with the lowest validation loss, and outputs the results of the test:

python3 Main.py input_param_files/bert_seq_class

The user-settable hyper-parameters are in the file input_param_files/bert_seq_class. An explanation on the contents of this file is at input_param_files/README.md. A list of all the hyper-parameters is in the PyTorch-Lightning documentation, and any hyper-parameter can be used.
To assist in Training, the two parameters auto_lr_find and auto_scale_batch_size in the file input_param_files/bert_seq_class enable the software to automatically find an initial Learning-Rate and a Batch-Size respectively.
As training progresses, graphs of "training-loss vs. epoch #", "validation-loss vs. epoch #", and "learning-rate vs. batch #" are plotted in real-time on the TensorBoard. Training is stopped by typing, at the Command-Line-Interface, the keystroke ctrl-c. The current training information is checkpointed, and training stops. Training can be resumed, at some future time, from the checkpointed file.
Dueing testing, the results are sent to the standard-output, and also saved in the *test-results.txt" file that include the following: general information about the dataset and the classes, confusion matrix, precision, recall, f1, average f1, and weighted f1.

Resume training, validation, and testing a model with same hyper-parameters

Resume training a checkpoint model with the same model- and training-states by using the following command:

python3 Main.py input_param_files/bert_seq_class-res_from_chkpt

The user-settable hyper-parameters are in the file input_param_files/bert_seq_class-res_from_chkpt. An explanation on the contents of this file is at input_param_files/README.md.

Change hyper-parameters and continue training, validation, and testing a model

Continue training a checkpoint model with the same model-state but different hyperparameters for the training-state by using the following command:

python3 Main.py input_param_files/bert_seq_class-ld_chkpt

The user-settable hyper-parameters are in the file input_param_filesbert_seq_class-ld_chkpt. An explanation on the contents of this file is at input_param_files/README.md.

Further test a checkpoint model with a new dataset

Test a checkpoint model by using the following command:

python3 Main.py input_param_files/bert_seq_class-ld_chkpt_and_test

The user-settable hyper-parameters are in the file input_param_files/bert_seq_class-ld_chkpt_and_test. An explanation on the contents of this file is at input_param_files/README.md.

Owner
Vineet Kumar
Vineet Kumar
A collection of scripts I developed for personal and working projects.

A collection of scripts I developed for personal and working projects Table of contents Introduction Repository diagram structure List of scripts pyth

Gianluca Bianco 109 Dec 26, 2022
Get started with Machine Learning with Python - An introduction with Python programming examples

Machine Learning With Python Get started with Machine Learning with Python An engaging introduction to Machine Learning with Python TL;DR Download all

Learn Python with Rune 130 Jan 02, 2023
Towards the D-Optimal Online Experiment Design for Recommender Selection (KDD 2021)

Towards the D-Optimal Online Experiment Design for Recommender Selection (KDD 2021) Contact 0 Jan 11, 2022

Can we learn gradients by Hamiltonian Neural Networks?

Can we learn gradients by Hamiltonian Neural Networks? This project was carried out as part of the Optimization for Machine Learning course (CS-439) a

2 Aug 22, 2022
Identifying a Training-Set Attack’s Target Using Renormalized Influence Estimation

Identifying a Training-Set Attack’s Target Using Renormalized Influence Estimation By: Zayd Hammoudeh and Daniel Lowd Paper: Arxiv Preprint Coming soo

Zayd Hammoudeh 2 Oct 08, 2022
PanopticBEV - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images This r

63 Dec 16, 2022
Code for "My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack" paper

Myo Keylogging This is the source code for our paper My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack by Matthias Ga

Secure Mobile Networking Lab 7 Jan 03, 2023
Companion code for the paper "An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence" (NeurIPS 2021)

ReLU-GP Residual (RGPR) This repository contains code for reproducing the following NeurIPS 2021 paper: @inproceedings{kristiadi2021infinite, title=

Agustinus Kristiadi 4 Dec 26, 2021
Config files for my GitHub profile.

Canalyst Candas Data Science Library Name Canalyst Candas Description Built by a former PM / analyst to give anyone with a little bit of Python knowle

Canalyst Candas 13 Jun 24, 2022
Realtime YOLO Monster Detection With Non Maximum Supression

Realtime-YOLO-Monster-Detection-With-Non-Maximum-Supression Table of Contents In

5 Oct 07, 2022
Empowering journalists and whistleblowers

Onymochat Empowering journalists and whistleblowers Onymochat is an end-to-end encrypted, decentralized, anonymous chat application. You can also host

Samrat Dutta 19 Sep 02, 2022
[ArXiv 2021] Data-Efficient Instance Generation from Instance Discrimination

InsGen - Data-Efficient Instance Generation from Instance Discrimination Data-Efficient Instance Generation from Instance Discrimination Ceyuan Yang,

GenForce: May Generative Force Be with You 93 Dec 25, 2022
Colab notebook and additional materials for Python-driven analysis of redlining data in Philadelphia

RedliningExploration The Google Colaboratory file contained in this repository contains work inspired by a project on educational inequality in the Ph

Benjamin Warren 1 Jan 20, 2022
Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

DIGAN (ICLR 2022) Official PyTorch implementation of "Generating Videos with Dyn

Sihyun Yu 147 Dec 31, 2022
Denoising images with Fourier Ring Correlation loss

Denoising images with Fourier Ring Correlation loss The python code accompanies the working manuscript Image quality measurements and denoising using

2 Mar 12, 2022
Unsupervised Image Generation with Infinite Generative Adversarial Networks

Unsupervised Image Generation with Infinite Generative Adversarial Networks Here is the implementation of MICGANs using DCGAN architecture on MNIST da

16 Dec 24, 2021
graph-theoretic framework for robust pairwise data association

CLIPPER: A Graph-Theoretic Framework for Robust Data Association Data association is a fundamental problem in robotics and autonomy. CLIPPER provides

MIT Aerospace Controls Laboratory 118 Dec 28, 2022
A stock generator that assess a list of stocks and returns the best stocks for investing and money allocations based on users choices of volatility, duration and number of stocks

Stock-Generator Please visit "Stock Generator.ipynb" for a clearer view and "Stock Generator.py" for scripts. The stock generator is designed to allow

jmengnyay 1 Aug 02, 2022
Import Python modules from dicts and JSON formatted documents.

Paker Paker is module for importing Python packages/modules from dictionaries and JSON formatted documents. It was inspired by httpimporter. Important

Wojciech Wentland 1 Sep 07, 2022
Deep universal probabilistic programming with Python and PyTorch

Getting Started | Documentation | Community | Contributing Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notab

7.7k Dec 30, 2022