Cloud-based recommendation system

Overview

This project uses cloud services to build a data lake, run the ETL process, and train and deploy a learning model that implements a recommendation system.

Purpose

A web app returns whether a consumer will buy a product, given the user ID and the corresponding product SKU.

Services

This project uses the following services:

AWS: Lambda, Step Functions, Glue (job, notebook, crawler), Athena, SNS, S3, SageMaker, IAM, DynamoDB, API Gateway.

Confluent Cloud (Kafka) for streaming data.

Project description

  1. Create an S3 bucket as the storage location of the data lake, store the raw data in the bucket (raw data zone), and write the data back to the same bucket after ETL (curated zone).

  2. Preview the data and determine whether it is useful and meaningful for our project. Use an AWS Glue crawler to build the corresponding data catalog (the created database and generated table information). Use Athena to run SQL queries. Like Apache Hive, this does not change the raw data; it operates on top of it.

  3. Create and store the stream data. Create a Kafka topic on Confluent Cloud and set up the schema registry for the stream data; the schema is in confluent_cloud_kafka-->confluent_kafka_topic_schema.json. The Kafka producer, confluent_cloud_kafka-->confluent_kafka_producer_lambda.py, pushes stream data to the topic across different partitions (because this project has no real source of stream data, we produce it manually). The consumer (a Confluent connector with AWS Lambda), confluent_cloud_kafka-->confluent_kafka_consumer_lambda.py, polls the stream data from the topic and stores it in a DynamoDB table.

  4. ETL process. Use Lambda functions to run SQL-based data transformations; the scripts are in lambda_functions(ETL). Create a Glue job to integrate the new dataset and store it in the curated zone of the data lake; the script is glue_job-->glue_job_ETL.py. Use Step Functions to orchestrate the ETL workflow built from the Lambda functions above; the ASL script is step_function(workflow)-->step_functions_for_curated.json.

    This part is based on Spark and is similar to the project in this repo: https://github.com/Yi-Ding111/spark-ETL-based-databricks-aws.

  5. Train the learning model (XGBoost). Use a SageMaker notebook instance for further work such as EDA and feature engineering, train on the data with the XGBoost framework, and tune parameters and try different attribute combinations to find the best one. The script is sagemaker-->xgboost_deploy_sagemaker.ipynb.

  6. Deploy the learning model. Get the deployment endpoint once training is done. Create a Lambda function that invokes the SageMaker endpoint to use the trained model; the script is sagemaker-->endpoint_interact_lambda.py. Integrate the Lambda function with API Gateway (proxy integration) as the backend. Deploy the API Gateway and use the invoke URL for web applications to interact with.

  7. Store the application output. Use SNS to publish the output to a Lambda function that writes the information to a DynamoDB table; the script is sagemaker-->prediction_store_dynamodb.py.
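For step 2, an Athena query could be submitted and polled roughly as follows. This is a minimal sketch, not the project's own code: the `athena` argument is assumed to be a boto3 Athena client (for example `boto3.client("athena")`), and the database name, SQL text and S3 output location are placeholders the caller supplies.

```python
import time


def build_athena_request(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit a query and wait until Athena reports a terminal state.

    `athena` is assumed to be a boto3 Athena client; it is passed in so the
    function can also be exercised with a stand-in object.
    """
    request = build_athena_request(sql, database, output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

Because the query only reads the files that the Glue catalog points at, the raw data zone stays unchanged; Athena writes its result files to the separate output location.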
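For step 4, the shape of a Step Functions workflow that runs Lambda functions one after another can be sketched by generating the Amazon States Language (ASL) definition from a list of steps. The state names and ARNs below are hypothetical placeholders; the project's actual definition is the JSON file named in step 4.

```python
import json


def build_state_machine(steps, comment="ETL workflow sketch"):
    """Chain Lambda task states in order and return an ASL definition as a dict.

    `steps` is a list of (state_name, lambda_arn) pairs.
    """
    if not steps:
        raise ValueError("at least one step is required")
    states = {}
    for index, (name, arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand over to the next task
        else:
            state["End"] = True  # last task terminates the workflow
        states[name] = state
    return {"Comment": comment, "StartAt": steps[0][0], "States": states}


# Hypothetical state names and placeholder ARNs, for illustration only.
definition = build_state_machine([
    ("TransformOrders", "arn:aws:lambda:REGION:ACCOUNT_ID:function:transform-orders"),
    ("TransformProducts", "arn:aws:lambda:REGION:ACCOUNT_ID:function:transform-products"),
])
print(json.dumps(definition, indent=2))
```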
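For step 5, "adjust parameters and try different attribute combinations" amounts to a search over hyperparameter values and feature subsets. The sketch below only enumerates and ranks candidates; `score_fn` is a hypothetical callback that would train an XGBoost model with the given parameters and features and return a validation metric. The notebook's actual search procedure may differ.

```python
from itertools import combinations, product


def candidate_configs(param_grid, features, min_features=1):
    """Yield (params, feature_subset) for every grid point and feature subset."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        for size in range(min_features, len(features) + 1):
            for subset in combinations(features, size):
                yield params, subset


def best_config(param_grid, features, score_fn, min_features=1):
    """Return the (score, params, feature_subset) triple with the highest score.

    Ties keep the first candidate encountered.
    """
    best = None
    for params, subset in candidate_configs(param_grid, features, min_features):
        score = score_fn(params, subset)
        if best is None or score > best[0]:
            best = (score, params, subset)
    return best
```

A grid such as `{"max_depth": [4, 6], "eta": [0.1, 0.3]}` uses real XGBoost parameter names, but the values are illustrative. The number of feature subsets grows exponentially with the number of features, so `min_features` or a hand-picked list of subsets keeps the search tractable.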
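For step 6, a Lambda function behind an API Gateway proxy integration receives the HTTP body as a string in `event["body"]` and must return a dict with `statusCode`, `headers` and a string `body`. The sketch below assumes a request field named `features`, a `text/csv` endpoint that returns a single score, and a 0.5 decision threshold; all three are illustrative assumptions rather than details taken from the project's script. `runtime_client` is assumed to be `boto3.client("sagemaker-runtime")`.

```python
import json


def _response(status, body):
    """Shape a response the way API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def make_handler(runtime_client, endpoint_name):
    """Return a Lambda-style handler bound to a SageMaker runtime client."""

    def handler(event, context):
        try:
            payload = json.loads(event.get("body") or "{}")
            features = payload["features"]  # hypothetical request field
        except (ValueError, KeyError, TypeError):
            return _response(400, {"error": "body must be JSON with a 'features' list"})

        # Assumption: the endpoint accepts one CSV row and returns one score.
        csv_row = ",".join(str(value) for value in features)
        result = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=csv_row,
        )
        score = float(result["Body"].read().decode("utf-8").strip())
        # Assumption: 0.5 is the cut-off for "will buy".
        return _response(200, {"score": score, "will_buy": score >= 0.5})

    return handler
```

Passing the client in, rather than creating it inside the handler, lets the handler be exercised locally with a stand-in object before it is wired to API Gateway.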
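For step 7, a Lambda function subscribed to an SNS topic receives each published message under `Records[i]["Sns"]["Message"]`. The sketch below turns those messages into DynamoDB items; the message fields (`user_id`, `sku`, `score`) and the key layout are hypothetical, and `table` is assumed to be a boto3 DynamoDB `Table` resource.

```python
import json
from decimal import Decimal


def records_to_items(event):
    """Extract one DynamoDB item per SNS record in a Lambda event."""
    items = []
    for record in event.get("Records", []):
        # Parse floats as Decimal: the boto3 DynamoDB resource rejects Python floats.
        message = json.loads(record["Sns"]["Message"], parse_float=Decimal)
        items.append({
            "user_id": str(message["user_id"]),  # hypothetical partition key
            "sku": str(message["sku"]),          # hypothetical sort key
            "score": message["score"],
            "published_at": record["Sns"]["Timestamp"],
        })
    return items


def make_store_handler(table):
    """Return a Lambda-style handler that writes each prediction to `table`."""

    def handler(event, context):
        items = records_to_items(event)
        for item in items:
            table.put_item(Item=item)
        return {"stored": len(items)}

    return handler
```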


Acknowledgement

This project was completed with guidance from Leo Lee (JR Academy).


Author: YI DING, Leo Lee

Created at: Dec 2021

Contact: [email protected]
