Sentiment Analysis Project

This project contains two sentiment analysis programs for hotel reviews, both using a Hotel Reviews dataset from Datafiniti. The Machine Learning models are trained on features built with Count Vectorizer (in the countvectorizer.py program) and TF-IDF Vectorizer (in the tdidf.py program). Run the two programs separately to compare the implementations and the accuracy results of the two Vectorizers. TF-IDF Vectorizer is often considered the more accurate of the two, although the outcome depends on the data and the model.
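
To illustrate the difference between the two Vectorizers, here is a minimal sketch on a made-up three-review corpus. The review texts and variable names are assumptions for illustration only and do not come from the project's programs or the Datafiniti dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, not taken from the Datafiniti dataset.
reviews = [
    "great hotel great staff",
    "terrible hotel dirty room",
    "great room",
]

# CountVectorizer stores raw word counts per review.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(reviews)

# TfidfVectorizer scales counts by how rare a word is across reviews,
# then normalizes each review vector to unit length (scikit-learn defaults).
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(reviews)

# Look up column indices for two words in the first review.
hotel_count = counts[0, count_vec.vocabulary_["hotel"]]
staff_count = counts[0, count_vec.vocabulary_["staff"]]
hotel_tfidf = tfidf[0, tfidf_vec.vocabulary_["hotel"]]
staff_tfidf = tfidf[0, tfidf_vec.vocabulary_["staff"]]

print("counts:", hotel_count, staff_count)
print("tf-idf:", round(hotel_tfidf, 3), round(staff_tfidf, 3))
```

In the first review, "hotel" and "staff" each occur once, so their counts are equal. With TF-IDF, "staff" receives the higher weight because it appears in only one review, while "hotel" appears in two.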

System Requirements

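Use the pip install command to install pandas, numpy, and scikit-learn (for example, pip install pandas numpy scikit-learn). The programs rely on the following imports: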

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier

Usage (description of actions performed)

1. dataset imported
2. null values deleted
3. 30% representative sample is taken to avoid slowing down the system
4. sentiments column added
5. input training features and labels defined
6. dataset split into training sets and testing sets
7. text data vectorized (using CountVectorizer or TF-IDF Vectorizer)
8. models trained:
 -  Logistic Regression (linear classification)
 -  Support Vector Machine (linear or non-linear data separated into classes by a line or hyperplane)
 -  K Nearest Neighbor (local approximation)
9. Accuracy Scores, Confusion Matrix, and True Positive and True Negative Rates printed for all three models
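
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The stand-in DataFrame, the column names (text, rating), the rule that a rating of 4 or higher counts as positive, the 80/20 split, and the model settings are all assumptions. TfidfVectorizer can be swapped for CountVectorizer to mirror the other program.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier

# 1. Stand-in for the imported dataset (hypothetical rows and column names).
positive = ["great stay friendly staff", "clean room lovely view",
            "excellent breakfast comfortable bed"]
negative = ["dirty room rude staff", "terrible noise broken shower",
            "awful smell uncomfortable bed"]
rows = []
for i in range(100):
    rows.append({"text": positive[i % 3], "rating": 5})
    rows.append({"text": negative[i % 3], "rating": 1})
rows.append({"text": None, "rating": 3})  # a row with a null value
df = pd.DataFrame(rows)

# 2. Delete null values.
df = df.dropna()

# 3. Take a 30% sample to keep training fast.
df = df.sample(frac=0.3, random_state=42)

# 4. Add a sentiment column (assumed rule: rating >= 4 is positive).
df["sentiment"] = (df["rating"] >= 4).astype(int)

# 5. Define input features and labels.
X = df["text"]
y = df["sentiment"]

# 6. Split into training and testing sets (assumed 80/20, stratified).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# 7. Vectorize the text; fit on the training set only.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# 8. Train the three models (hyperparameters are assumptions).
models = {
    "Logistic Regression": LogisticRegression(),
    "Support Vector Machine": svm.SVC(kernel="linear"),
    "K Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
}

# 9. Print accuracy, confusion matrix, and true positive/negative rates.
results = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    predictions = model.predict(X_test_vec)
    matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
    tn, fp, fn, tp = matrix.ravel()
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
    }
    print(name, results[name])
    print(matrix)
```

The vectorizer is fitted on the training set only, so the test reviews do not influence the vocabulary or the IDF weights.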

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Owner

Simran Farrukh
