Implementation of the bachelor's thesis "Real-time stock predictions with deep learning and news scraping".

Overview

Real-time stock predictions with deep learning and news scraping

This repository contains a partial implementation of my bachelor's thesis "Real-time stock predictions with deep learning and news scraping". The code has been built using PyTorch Lightning, read its documentation to get a complete overview of how this repository is structured.

Disclaimer: Neither the pipeline nor the model published in this repository are the ones used in the thesis. On the pipeline side, notice that the model tries to match headlines and prices of the same day, while in the thesis we used news published the day before. For the case of the model, the one shared here has nothing to do with the original and should be considered a toy model.

Preparing the data

The data used in the thesis has been completely crawled and put together from scratch. Specifically, you can find the titles and descriptions of the news published on Reuters.com from January 2010 to May 2018. In addition to that, you also have the stock prices (end of the day) of S&P 500 companies extracted from AlphaVantage.co. Everything is compressed in a H5DF file that you can download from this link.

The first step is to clone this repository and install its dependencies:

git clone https://github.com/davidalvarezdlt/bachelor_thesis.git
cd bachelor_thesis
pip install -r requirements.txt

Move both bachelor_thesis_data.hdf5 and word2vec.bin inside ./data. The resulting folder structure should look like this:

bachelor_thesis/
    bachelor_thesis/
    data/
        bachelor_thesis_data.hdf5
        word2vec.bin
    lightning_logs/
    .gitignore
    .pre-commit-config.yaml
    LICENSE
    README.md
    requirements.txt

Training the model

In short, you can train the model by calling:

python -m bachelor_thesis

You can modify the default parameters of the code by using CLI parameters. Get a complete list of the available parameters by calling:

python -m bachelor_thesis --help

For instance, if we want to train the model using GOOGL stock prices, with a batch size of 32 and using one GPUs, we would call:

python -m bachelor_thesis --symbol GOOGL --batch_size 32 --gpus 1

Every time you train the model, a new folder inside ./lightning_logs will be created. Each folder represents a different version of the model, containing its checkpoints and auxiliary files.

Testing the model

You can measure the loss and the accuracy of the model (number of times the prediction is correct) and store it in TensorBoard by calling:

python -m bachelor_thesis --test --test_checkpoint <test_checkpoint>

Where --test_checkpoint is a valid path to the model checkpoint that should be used.

Citation

If you use the data provided in this repository or if you find this thesis useful, please use the following citation:

@thesis{Alvarez2018,
    type = {Bachelor's Thesis},
    author = {David Álvarez de la Torre},
    title = {Real-time stock predictions with Deep Learning and news scrapping},
    school = {Universitat Politècnica de Catalunya},
    year = 2018,
}
Owner
David Álvarez de la Torre
Founder of @lemonplot. Alumni of UPC and ETH.
David Álvarez de la Torre
Constructing Neural Network-Based Models for Simulating Dynamical Systems

Constructing Neural Network-Based Models for Simulating Dynamical Systems Note this repo is work in progress prior to reviewing This is a companion re

Christian Møldrup Legaard 21 Nov 25, 2022
Probabilistic Programming and Statistical Inference in PyTorch

PtStat Probabilistic Programming and Statistical Inference in PyTorch. Introduction This project is being developed during my time at Cogent Labs. The

Stefano Peluchetti 109 Nov 26, 2022
PECOS - Prediction for Enormous and Correlated Spaces

PECOS - Predictions for Enormous and Correlated Output Spaces PECOS is a versatile and modular machine learning (ML) framework for fast learning and i

Amazon 387 Jan 04, 2023
Code release for "Transferable Semantic Augmentation for Domain Adaptation" (CVPR 2021)

Transferable Semantic Augmentation for Domain Adaptation Code release for "Transferable Semantic Augmentation for Domain Adaptation" (CVPR 2021) Paper

66 Dec 16, 2022
FSL-Mate: A collection of resources for few-shot learning (FSL).

FSL-Mate is a collection of resources for few-shot learning (FSL). In particular, FSL-Mate currently contains FewShotPapers: a paper list which tracks

Yaqing Wang 1.5k Jan 08, 2023
SIMULEVAL A General Evaluation Toolkit for Simultaneous Translation

SimulEval SimulEval is a general evaluation framework for simultaneous translation on text and speech. Requirement python = 3.7.0 Installation git cl

Facebook Research 48 Dec 28, 2022
Local trajectory planner based on a multilayer graph framework for autonomous race vehicles.

Graph-Based Local Trajectory Planner The graph-based local trajectory planner is python-based and comes with open interfaces as well as debug, visuali

TUM - Institute of Automotive Technology 160 Jan 04, 2023
PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability

PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability PCACE is a new algorithm for ranking neurons in a CNN architecture in order

4 Jan 04, 2022
The code for our CVPR paper PISE: Person Image Synthesis and Editing with Decoupled GAN, Project Page, supp.

PISE The code for our CVPR paper PISE: Person Image Synthesis and Editing with Decoupled GAN, Project Page, supp. Requirement conda create -n pise pyt

jinszhang 110 Nov 21, 2022
A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.

2021: A Year Full of Amazing AI papers- A Review 📌 A curated list of the latest breakthroughs in AI by release date with a clear video explanation, l

Louis-François Bouchard 2.9k Dec 31, 2022
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-quality pre-trained models from torchvision, MMLabs, and soon Pytorch Image Models. It or

airctic 789 Dec 29, 2022
The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

RegSeg The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation" Paper: arxiv D block Decoder Setup Install the

Roland 61 Dec 27, 2022
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
An end-to-end framework for mixed-integer optimization with data-driven learned constraints.

OptiCL OptiCL is an end-to-end framework for mixed-integer optimization (MIO) with data-driven learned constraints. We address a problem setting in wh

Holly Wiberg 57 Dec 26, 2022
Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight)

About Code release for Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy (ICLR 2022 Spotlight)

THUML @ Tsinghua University 221 Dec 31, 2022
DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

DockStream Description DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution an

AstraZeneca - Molecular AI 72 Jan 02, 2023
Layer 7 DDoS Panel with Cloudflare Bypass ( UAM, CAPTCHA, BFM, etc.. )

Blood Deluxe DDoS DDoS Attack Panel includes CloudFlare Bypass (UAM, CAPTCHA, BFM, etc..)(It works intermittently. Working on it) Don't attack any web

272 Nov 01, 2022
Disagreement-Regularized Imitation Learning

Due to a normalization bug the expert trajectories have lower performance than the rl_baseline_zoo reported experts. Please see the following link in

Kianté Brantley 25 Apr 28, 2022
A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.

A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution. Introduction This repo contains experimental code derived from

2 May 09, 2022