A naive Bayes model for cancer classification using a set of documents

Last update: Nov 24, 2021

Related tags

Machine Learning naivebayes

Overview

Naive Bayes text classification model for cancer and non-cancer documents

Author: Alex King

Purpose
Requirements/files used
How to use

1. Purpose

The purpose of this program is to read CSV files that contain two columns:

                    Document | classification
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer

The program reads the data into classes that hold each document. One file is used as the training set and the other as the testing set, and both sets go through the same tokenization. The model is then trained on the training set and evaluated on the testing set.
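The train-then-test flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in naivesbayes.py: the function names `tokenize`, `train`, and `predict`, the whitespace tokenizer, the add-one (Laplace) smoothing, and the toy documents are all assumptions made for the example. It uses `math.log` from the standard library so that it runs without extra packages, whereas the project itself lists numpy for this purpose.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a simplified, assumed tokenizer)."""
    return text.lower().split()


def train(docs):
    """Build log priors, per-class word counts, and the vocabulary.

    docs is a list of (text, label) pairs.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    log_priors = {c: math.log(n / total) for c, n in class_docs.items()}
    return log_priors, word_counts, vocab


def predict(text, log_priors, word_counts, vocab):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, log_prior in log_priors.items():
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = log_prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training set (invented for illustration only).
training = [
    ("tumor biopsy shows malignant cells", "cancer"),
    ("chemotherapy for malignant tumor", "cancer"),
    ("patient has seasonal flu", "nocancer"),
    ("routine checkup no issues", "nocancer"),
]
model = train(training)
prediction = predict("malignant tumor found", *model)
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long documents, which is presumably why the project needs a log function.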

2. Requirements/files used

* python3
* numpy library - for calculating logs
* pandas library - for reading in csv files
* main.py and naivesbayes.py
* stopwords.txt - list of stop words
* Scoring.docx - list of scores for precision, recall, and F-score
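For readers unfamiliar with the three scores listed in Scoring.docx, the sketch below shows how precision, recall, and F-score are commonly computed for a two-class problem. The function name `precision_recall_f1`, the choice of "cancer" as the positive class, and the sample labels are assumptions for illustration; the README does not say how the project computes its scores.

```python
def precision_recall_f1(gold, predicted, positive="cancer"):
    """Compute precision, recall, and F1 for the given positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented labels: one true positive, one false positive, one false negative.
gold = ["cancer", "cancer", "nocancer", "nocancer"]
predicted = ["cancer", "nocancer", "cancer", "nocancer"]
scores = precision_recall_f1(gold, predicted)
```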

3. How to use

This program has 3 modes of operation for tokenizing your sets: standard, binary, and stopwords. The standard mode is the default:

                $python3 main.py -train 1 -test 1

This first command runs standard tokenization on training set 1 and test set 1. To change the training set, change the 1 after -train to a 2:

                $python3 main.py -train 2 -test 1

NOTE: Do not change the testing set number; leave it as 1. The -test option was intended to support multiple testing sets.

For binary (in the commands below, replace # with the training set number, 1 or 2):

                $python3 main.py -train # -test 1 -b

For stopwords:

                $python3 main.py -train # -test 1 -s

For both stopwords and binary:

                $python3 main.py -train # -test 1 -b -s
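The commands above could be handled along the following lines. This is a sketch under assumptions, not the contents of main.py. It assumes that -b means each word counts at most once per document and that -s means words listed in stopwords.txt are dropped before counting. The helper names `tokenize` and `build_parser` and the three sample stop words are hypothetical.

```python
import argparse


def tokenize(text, stopwords=frozenset(), binary=False):
    """Split text into lowercase tokens, optionally filtering and deduplicating."""
    tokens = [t for t in text.lower().split() if t not in stopwords]
    if binary:
        # Assumed meaning of binary mode: keep only the first occurrence
        # of each token, preserving the original order.
        tokens = list(dict.fromkeys(tokens))
    return tokens


def build_parser():
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Naive Bayes document classifier")
    parser.add_argument("-train", type=int, choices=[1, 2], default=1,
                        help="training set number")
    parser.add_argument("-test", type=int, default=1,
                        help="testing set number (leave as 1)")
    parser.add_argument("-b", action="store_true", help="binary tokenization")
    parser.add_argument("-s", action="store_true", help="remove stop words")
    return parser


# Equivalent to: python3 main.py -train 2 -test 1 -b -s
args = build_parser().parse_args(["-train", "2", "-test", "1", "-b", "-s"])

# A tiny stand-in for the contents of stopwords.txt.
stopwords = frozenset({"the", "of", "a"}) if args.s else frozenset()
tokens = tokenize("The growth of the tumor the", stopwords, binary=args.b)
```

Because both flags are independent switches, combining -b and -s simply applies stop word removal first and deduplication second.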

Owner

Alex W King
