A custom IMDB dataset for 2020-2021 is extracted, and a custom DistilBERT model is trained to predict the probability of a movie's success

Last update: Jan 18, 2022


Overview

IMDB Success Predictor

The project involves web scraping custom IMDB data for 10,000 movies and shows from 2020 to 2021, sorted by number of votes; fine-tuning a pre-trained DistilBERT Transformer using transfer learning; and saving the fine-tuned model for later reuse.
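The scraping step can be pictured as paging through IMDB's advanced title search, sorted by number of votes, until 10,000 titles have been covered. This is a minimal sketch, not the project's scraper: the base URL, the `release_date`, `sort`, `count` and `start` query parameters, and the page size of 250 are assumptions about IMDB's search page at the time, and the site may have changed since.

```python
from urllib.parse import urlencode

# Assumed base URL of IMDB's advanced title search (hypothetical for this sketch).
BASE_URL = "https://www.imdb.com/search/title/"

def search_page_urls(total=10_000, per_page=250,
                     start_date="2020-01-01", end_date="2021-12-31"):
    """Build one search-results URL per page, sorted by number of votes (descending)."""
    urls = []
    for start in range(1, total + 1, per_page):  # result offsets assumed to be 1-based
        params = {
            "release_date": f"{start_date},{end_date}",  # assumed date-range parameter
            "sort": "num_votes,desc",                    # assumed sort parameter
            "count": per_page,                           # assumed page-size parameter
            "start": start,                              # assumed offset parameter
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls

urls = search_page_urls()
print(len(urls))  # 40
print(urls[0])
```

Each URL would then be fetched, for example with requests, and the result list parsed with BeautifulSoup4.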

Stack

DistilBERT Transformer
TensorFlow
NumPy and pandas
Selenium, BeautifulSoup4 and requests

Metrics

Accuracy achieved: 81.3492%
ROC AUC score achieved: 0.7217
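For reference, the two reported metrics can be computed from true labels and model scores as below. This is a minimal pure-Python sketch (scikit-learn offers `accuracy_score` and `roc_auc_score` for the same purpose); the labels and scores in the example are made up, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of thresholded predictions that match the true binary labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 0, 1]               # made-up labels (1 = success)
scores = [0.9, 0.3, 0.35, 0.4, 0.8]    # made-up predicted success probabilities
y_pred = [int(s >= 0.5) for s in scores]
print(accuracy(y_true, y_pred))           # 0.8
print(round(roc_auc(y_true, scores), 4))  # 0.8333
```

Accuracy is computed on thresholded predictions while ROC AUC is computed on the raw scores, so the two numbers measure different things and need not agree.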

Installation

1) Ensure Python and Jupyter Notebook are installed. Optionally, a Conda environment can also be used.

2) Install the required modules using

pip install -r requirements.txt 

or conda install --file requirements.txt

or !pip install -r requirements.txt for Google Colab.

3) Selenium requires browser-specific drivers. Setup guides for Chrome and Firefox are linked below. This step is optional if the notebook is run on Google Colab.
Chrome: https://chromedriver.chromium.org/getting-started
Firefox: https://www.lambdatest.com/blog/selenium-firefox-driver-tutorial/
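After installation, one quick way to confirm that the packages from the Stack section are importable is to look each one up with `importlib.util.find_spec`. The module list below is an assumption based on the Stack section (`bs4` is the import name of BeautifulSoup4, and `transformers` is assumed to be the library providing DistilBERT); it may not match requirements.txt exactly.

```python
import importlib.util

# Assumed import names for the packages listed under "Stack"; adjust to match requirements.txt.
REQUIRED_MODULES = ["numpy", "pandas", "tensorflow", "transformers",
                    "selenium", "bs4", "requests"]

def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    print("All modules found" if not missing else f"Missing: {', '.join(missing)}")
```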

Training

1) (Optional) Run the IMDB web scraper. This regenerates the already provided CSV file and the imdb_movies pickle file.

2) Run model training in an environment with GPU acceleration. Here, Google Colab was used, which allocates an NVIDIA Tesla T4 or Tesla K80 GPU.
```
Training time: roughly 20-25 min
Epochs: 10
Training batch size: 8
Max sequence length: 512 tokens
```
Training creates a Movie_prediction_model directory containing config.json (provided) and tf_model.h5 (not provided due to space constraints).
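The max sequence length and batch size in the training configuration determine how each description is shaped before it reaches DistilBERT. A minimal sketch of that shaping follows; the padding id of 0 is an assumption (it is the `[PAD]` id in the uncased DistilBERT vocabulary), and the 8,000-example split in the example is hypothetical. In practice a Hugging Face tokenizer called with `padding="max_length", truncation=True, max_length=512` does this and also keeps the special `[CLS]`/`[SEP]` tokens, which this simplified version does not.

```python
MAX_LEN = 512     # max sequence length from the training configuration
BATCH_SIZE = 8    # training batch size from the training configuration
PAD_ID = 0        # assumed padding token id

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), each exactly max_len long.

    The mask is 1 for real tokens and 0 for padding.
    """
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding

def batches(items, batch_size=BATCH_SIZE):
    """Yield consecutive slices of at most batch_size items; the last may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def steps_per_epoch(n_examples, batch_size=BATCH_SIZE):
    """Number of batches per epoch (ceiling division)."""
    return -(-n_examples // batch_size)

ids, mask = pad_or_truncate([101, 2023, 3185, 102])  # hypothetical token ids
print(len(ids), sum(mask))    # 512 4
print(steps_per_epoch(8000))  # 1000
```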

Usage

1) Ensure the trained model (tf_model.h5) has been saved inside the Movie_prediction_model directory.

2) Run the Python script: python DistilBERT_Movie_Classifier.py
3) Enter a description of the movie or TV show to predict for. The script outputs a binary prediction of success based on IMDB ratings.
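The binary prediction can be understood as turning the classifier's two output logits into a success probability and then applying a threshold. This is a sketch under stated assumptions, not the script's code: it assumes a two-class head where index 1 means "success", and the 0.5 threshold is hypothetical.

```python
import math

def success_probability(logits):
    """Softmax over [not_success, success] logits; returns P(success).

    Assumes a two-class head where index 1 means "success" (hypothetical label order).
    """
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def predict_success(logits, threshold=0.5):
    """Binary success prediction: 1 if P(success) >= threshold, else 0 (threshold is assumed)."""
    return int(success_probability(logits) >= threshold)

print(success_probability([0.0, math.log(3)]))  # approximately 0.75
print(predict_success([2.0, -1.0]))             # 0
```

In the project, the logits would come from the saved model; with Hugging Face transformers, a directory holding config.json and tf_model.h5 can typically be loaded with `TFDistilBertForSequenceClassification.from_pretrained("Movie_prediction_model")`, though whether the script loads it this way is an assumption.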


Owner

Gautam Diwan
