GAN-based Matrix Factorization for Recommender Systems

This repository contains the dataset splits, the source code of the experiments, and their results for the paper "GAN-based Matrix Factorization for Recommender Systems" (arXiv: https://arxiv.org/abs/2201.08042), accepted at the 37th ACM/SIGAPP Symposium on Applied Computing (SAC '22).

How to use this repo

This repo is based on a version of Recsys_Course_AT_PoliMi. To run the code and experiments you first need to set up a Python environment. Any environment manager will work, but we suggest conda because it makes it easier to recreate our environment when using a GPU: conda can help with installing the CUDA toolkit and cuDNN needed to use the available GPU(s). We highly recommend running this repo with a GPU since GAN-based recommenders require long training times.

Conda

Run the following command to create a new environment with Python 3.6.8 and install all requirements in file conda_requirements.txt:

conda create -n <name-env> python==3.6.8 --file conda_requirements.txt

The file conda_requirements.txt also contains the packages cudatoolkit==9.0 and cudnn==7.1.2. These are managed by conda and are installed completely separately from any other versions you might already have installed.

Activate the newly created environment:

conda activate <name-env>

Then install the following packages with pip inside the activated environment, since they are not available in conda's main channel and the conda-forge channel holds only old versions of them:

pip install scikit-optimize==0.7.2 telegram-send==0.25

Virtualenv & Pip

First download and install Python 3.6.8 from python.org. Then install virtualenv:

python -m pip install --user virtualenv

Now create a new environment with virtualenv, which takes the destination directory as its argument (by default it uses the Python version it was installed with):

virtualenv <path-to-new-env>

Activate the new environment with:

source <path-to-new-env>/bin/activate

Install the required packages through the file pip_requirements.txt:

pip install -r pip_requirements.txt

Note that if you intend to use a GPU and install the required packages with virtualenv and pip, you need to install cudatoolkit==9.0 and cudnn==7.1.2 separately, following the instructions for your GPU on nvidia.com.

Before running any experiment or algorithm you need to compile the Cython code that is part of some of the recommenders. You can compile it all with the following command:

python run_compile_all_cython.py

N.B. You need to have the following packages installed before compiling: gcc and python3-dev.

N.B. Since the experiments can take a long time, the code notifies you on your Telegram account when the experiments start and end. Either configure telegram-send as indicated at https://pypi.org/project/telegram-send/#installation or delete the lines containing telegram-send inside RecSysExp.py.


Running experiments

All results presented in the paper are already provided in this repository. In case you want to re-run the experiments, below you can find the steps for each one of them.

Comparison with baselines [1]

In order to run all the comparisons with the baselines use the file RecSysExp.py. First compute for each dataset the 5 mutually exclusive sets:

  • Training set: once the best hyperparameters of the recommender are found, it is trained one final time on this set.
    • Training set small: the recommender is first trained on this small training set with the aim of finding the best hyperparameters.
    • Early stopping set: validation set used to apply early stopping during hyperparameter tuning.
    • Validation set: the recommender with the current hyperparameter values is tested against this set.
  • Test set: once the best hyperparameters are found, the recommender is finally tested with this set. The results presented are the ones on this set.
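
The nesting of the sets above can be illustrated with a minimal sketch. It is not the repository's splitting code: the function name, the random hold-out strategy, and the 80/20 and 80/10/10 proportions are all assumptions chosen for illustration, and it assumes the three indented sets are carved out of the training set, as the list layout suggests.

```python
import random


def split_interactions(interactions, fractions, seed=42):
    """Shuffle a list of (user, item) pairs and cut it into disjoint named parts.

    `fractions` maps a part name to its share; the shares must sum to 1.
    Hypothetical helper, not part of the repository.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    parts, start = {}, 0
    names = list(fractions)
    for position, name in enumerate(names):
        if position == len(names) - 1:
            # The last part takes the remainder so no interaction is dropped.
            end = len(shuffled)
        else:
            end = start + round(fractions[name] * len(shuffled))
        parts[name] = shuffled[start:end]
        start = end
    return parts


# Toy data: 10 users x 10 items, every pair observed (illustrative only).
interactions = [(u, i) for u in range(10) for i in range(10)]

# Assumed 80/20 split into training and test sets.
outer = split_interactions(interactions, {"train": 0.8, "test": 0.2})

# The training set is split again for hyperparameter tuning (assumed 80/10/10).
inner = split_interactions(
    outer["train"],
    {"train_small": 0.8, "early_stopping": 0.1, "validation": 0.1},
)

for name, part in {**inner, "test": outer["test"]}.items():
    print(name, len(part))
```

In this sketch the tuning loop would only ever see the three inner sets; the full training set and the test set are used once, after the best hyperparameters are fixed.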

Compute the splits for each dataset with the following command:

python RecSysExp.py --build-dataset <dataset-name>

To run the tuning of a recommender use the following command:

python RecSysExp.py <dataset-name> <recommender-name> [--user | --item] [<similarity-type>] 
  • dataset-name is a value among: 1M, hetrec2011, LastFM.
  • recommender-name is a value among: TopPop, PureSVD, ALS, SLIMBPR, ItemKNN, P3Alpha, CAAE, CFGAN, GANMF.
  • --user or --item is a flag used only for GAN-based recommenders. It denotes the user/item-based training procedure for the selected recommender.
  • similarity-type is a value among: cosine, jaccard, tversky, dice, euclidean, asymmetric. It is used only for ItemKNN recommender.
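
When tuning many (dataset, recommender) pairs, it can help to build the command lines programmatically. The helper below is a hypothetical sketch, not part of the repository: it only mirrors the usage line and the allowed values listed above, and it assumes that CAAE, CFGAN and GANMF are the GAN-based recommenders that accept --user or --item.

```python
DATASETS = ("1M", "hetrec2011", "LastFM")
RECOMMENDERS = ("TopPop", "PureSVD", "ALS", "SLIMBPR", "ItemKNN",
                "P3Alpha", "CAAE", "CFGAN", "GANMF")
# Assumption: these are the GAN-based recommenders that accept --user / --item.
GAN_BASED = ("CAAE", "CFGAN", "GANMF")
SIMILARITIES = ("cosine", "jaccard", "tversky", "dice", "euclidean", "asymmetric")


def tuning_command(dataset, recommender, mode=None, similarity=None):
    """Build the argument list for one tuning run (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if recommender not in RECOMMENDERS:
        raise ValueError(f"unknown recommender: {recommender}")
    if mode is not None:
        if mode not in ("user", "item"):
            raise ValueError("mode must be 'user' or 'item'")
        if recommender not in GAN_BASED:
            raise ValueError("--user/--item applies only to GAN-based recommenders")
    if similarity is not None:
        if similarity not in SIMILARITIES:
            raise ValueError(f"unknown similarity: {similarity}")
        if recommender != "ItemKNN":
            raise ValueError("a similarity type applies only to ItemKNN")
    # Same argument order as the documented usage line.
    cmd = ["python", "RecSysExp.py", dataset, recommender]
    if mode is not None:
        cmd.append("--" + mode)
    if similarity is not None:
        cmd.append(similarity)
    return cmd


print(" ".join(tuning_command("1M", "GANMF", mode="user")))
print(" ".join(tuning_command("LastFM", "ItemKNN", similarity="cosine")))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the run.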

All results, best hyperparameters and dataset splits are saved in the experiments directory.


Testing on test set with best hyperparameters

In order to test each tuned recommender on the test set (which is created when tuning the hyperparameters) run the following command:

python RunBestParameters.py <dataset-name> <recommender-name> [--user | --item] [<similarity-type>] [--force] [--bp <best-params-dir>]
  • dataset-name is a value among: 1M, hetrec2011, LastFM.
  • recommender-name is a value among: TopPop, PureSVD, ALS, SLIMBPR, ItemKNN, P3Alpha, CAAE, CFGAN, GANMF.
  • --user or --item is a flag used only for GAN-based recommenders. It denotes the user/item-based training procedure for the selected recommender.
  • similarity-type is a value among: cosine, jaccard, tversky, dice, euclidean, asymmetric. It is used only for ItemKNN recommender.
  • --force is a flag that forces the computation of the results on the test set. By default, if the result for the tuple (dataset, recommender) already exists in the test_results directory, the computation is skipped.
  • --bp sets the directory where the best parameters (best_params.pkl) are located for this combination of (dataset, recommender). By default this is the experiments directory.

The results are saved in the test_results directory.
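
The skip-unless-forced behaviour of --force can be sketched as follows. This is an illustration only: the function name and the one-sub-directory-per-pair layout are assumptions, and the actual file layout inside test_results may differ.

```python
import tempfile
from pathlib import Path


def should_run(results_dir, dataset, recommender, force=False):
    """Return True when the test-set evaluation still needs to be computed."""
    # Hypothetical layout: one sub-directory per (dataset, recommender) pair.
    result_path = Path(results_dir) / dataset / recommender
    return force or not result_path.exists()


with tempfile.TemporaryDirectory() as tmp:
    first = should_run(tmp, "1M", "GANMF")                # nothing saved yet
    (Path(tmp) / "1M" / "GANMF").mkdir(parents=True)      # pretend a result was saved
    cached = should_run(tmp, "1M", "GANMF")               # result exists, skip
    forced = should_run(tmp, "1M", "GANMF", force=True)   # recompute anyway

print(first, cached, forced)  # True False True
```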


Ablation study

To run the ablation study, use the script AblationStudy.py as follows:

python AblationStudy.py <dataset-name> [binGANMF | feature-matching [--user | --item]]
  • dataset-name is a value among: 1M, hetrec2011, LastFM.
  • binGANMF runs the first ablation study, the GANMF model with a binary classifier discriminator. This tunes the recommender with RecSysExp.py and then evaluates it with RunBestParameters.py on the test set.
  • --user or --item is a flag that sets the training procedure for binGANMF recommender.
  • feature-matching runs the second ablation study, the effect of the feature matching loss and the user-user similarity heatmaps. The results are saved in the feature_matching directory.

MF model of GANMF

To run the qualitative study on the MF learned by GANMF, use the script MFLearned.py as follows:

python MFLearned.py

It executes both experiments and the results are saved in the latent_factors directory.

Footnotes

  1. For the baselines Top Popular, PureSVD, ALS, SLIMBPR, ItemKNN, P3Alpha and model evaluation we have used implementations from Recsys_Course_AT_PoliMi.

Owner
Ervin Dervishaj