NLP - Machine learning

Overview

Flipkart-product-reviews

NLP - Machine learning

Visual Studio Code

About


Product reviews is an essential part of an online store like Flipkart’s branding and marketing. They help to build trust and loyalty and typically describe what sets your product apart from others. Savvy shoppers almost never purchase a product without knowing how it’s going to work for them. The more reviews a platform has, the more convinced a user will be that he/she is making the right decision.

Online reviews are very important to e-commerce businesses because they ultimately increase sales by giving the consumers the information they need to make the decision to purchase the product. One other important factor in elevating the reputation, standard, and evaluation of an e-commerce store is product rating.

NLP


Natural Language Processing (NLP) helps machines “read” text by simulating the human ability to understand language. It is a field of Artificial Intelligence that gives machines the ability to read, understand and derive meaning from human languages.

Visual Studio Code

#1 Tokenization


Tokenization is the process of breaking down sentence or paragraphs into smaller chunks of words called tokens.

#2 Stop Words Removal


On removal of some words, the meaning of the sentence doesn't change, like and, am. Those words are called stop-words and should be removed before feeding to any algorithm. In datasets, some non-stop words repeat very frequently. Those words too should be removed to get an unbiased result from the algorithm.

#3 Vectorization


After tokenization, and stop words removal, our "content" are still in string format. We need to convert those strings to numbers based on their importance (features). We use TF-IDF vectorization to convert those text to vector of importance. With TF-IDF we can extract important words in our data. It assign rarely occurring words a high number, and frequently occurring words a very low number.

Topic Modelling - LDA

Visual Studio Code


Topic modeling in python involves counting words and grouping similar word patterns to infer topics within unstructured data. Let’s take the example of Flipkart where you might want to know what customers are saying about a particular product from x seller. Instead of spending hours to find out the best-reviewed product through heaps of feedback, you can analyze them with a topic modeling algorithm.

By detecting patterns such as word frequency and distance between words, a topic model clusters feedback that is similar, and words and expressions that appear most often. With this information, you can quickly deduce what each set of texts are talking about.

There are various topic in modelling algorithms, we will be using the Latent Dirichlet Allocation algorithm(LDA).

Owner
Harshith VH
Machine Learning, Deep Learning & Data Science Enthusiast 🚀
Harshith VH
Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Tevatron Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models. The toolkit has a modularized

texttron 193 Jan 04, 2023
texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Jörg Thalheim 70 Dec 26, 2022
🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

Explosion 1.5k Dec 25, 2022
A collection of models for image - text generation in ACM MM 2021.

Bi-directional Image and Text Generation UMT-BITG (image & text generator) Unifying Multimodal Transformer for Bi-directional Image and Text Generatio

Multimedia Research 63 Oct 30, 2022
Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

UlionTse 907 Dec 27, 2022
GVT is a generic translation tool for parts of text on the PC screen with Text to Speak functionality.

GVT is a generic translation tool for parts of text on the PC screen with Text to Speech functionality. I wanted to create it because the existing tools that I experimented with did not satisfy me in

Nuked 1 Aug 21, 2022
Conditional probing: measuring usable information beyond a baseline

Conditional probing: measuring usable information beyond a baseline

John Hewitt 20 Dec 15, 2022
Indonesia spellchecker with python

indonesia-spellchecker Ganti kata yang terdapat pada file teks.txt untuk diperiksa kebenaran kata. Run on local machine python3 main.py

Rahmat Agung Julians 1 Sep 14, 2022
Codes for processing meeting summarization datasets AMI and ICSI.

Meeting Summarization Dataset Meeting plays an essential part in our daily life, which allows us to share information and collaborate with others. Wit

xcfeng 39 Dec 14, 2022
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

Hugging Face 6.2k Dec 31, 2022
SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Introduction This codebase contains source-code of the Python-based implementation (ARES) of our SIGIR 2022 paper. Chen, Jia, et al. "Axiomatically Re

Jia Chen 17 Nov 09, 2022
Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Bethge Lab 61 Dec 21, 2022
Code for the paper: Sequence-to-Sequence Learning with Latent Neural Grammars

Code for the paper: Sequence-to-Sequence Learning with Latent Neural Grammars

Yoon Kim 43 Dec 23, 2022
A music comments dataset, containing 39,051 comments for 27,384 songs.

Music Comments Dataset A music comments dataset, containing 39,051 comments for 27,384 songs. For academic research use only. Introduction This datase

Zhang Yixiao 2 Jan 10, 2022
Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)

Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model

JHJu 2 Jan 18, 2022
A Streamlit web app that generates Rick and Morty stories using GPT2.

Rick and Morty Story Generator This project uses a pre-trained GPT2 model, which was fine-tuned on Rick and Morty transcripts, to generate new stories

₸ornike 33 Oct 13, 2022
Document processing using transformers

Doc Transformers Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (ke

Vishnu Nandakumar 13 Dec 21, 2022
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

DeeBERT This is the code base for the paper DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. Code in this repository is also available

Castorini 132 Nov 14, 2022
Code voor mijn Master project omtrent VideoBERT

Code voor masterproef Deze repository bevat de code voor het project van mijn masterproef omtrent VideoBERT. De code in deze repository is gebaseerd o

35 Oct 18, 2021
1 Jun 28, 2022