[ WSDM '22 ] On Sampling Collaborative Filtering Datasets

Last update: Dec 08, 2022


This repository implements many popular sampling strategies, along with a range of explicit-, implicit-, and sequential-feedback recommendation algorithms. The code accompanies the paper "On Sampling Collaborative Filtering Datasets" [ACM] [Public PDF], in which we compare how well different sampling strategies preserve the performance of various recommendation algorithms.

We also provide code for Data-Genie, which automatically predicts how well any sampling strategy will perform on a given collaborative filtering dataset. We refer the reader to the full paper for more details. Please send me an email if you are interested in obtaining the pre-trained weights of Data-Genie.

If you find any module of this repository helpful for your own research, please consider citing the WSDM '22 paper below. Thanks!

@inproceedings{sampling_cf,
  author = {Noveen Sachdeva and Carole-Jean Wu and Julian McAuley},
  title = {On Sampling Collaborative Filtering Datasets},
  url = {https://doi.org/10.1145/3488560.3498439},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  series = {WSDM '22},
  year = {2022}
}

Code Author: Noveen Sachdeva ([email protected])

Setup

Environment Setup

$ pip install -r requirements.txt

Data Setup

Once you have set up the Python environment, run the command below. It creates the required data/experiment directories and downloads and preprocesses the Amazon magazine and MovieLens-100K datasets. To use other datasets, download them from http://jmcauley.ucsd.edu/data/amazon/ and adjust the setup.sh and preprocess.py files accordingly.

$ ./setup.sh

How to train a model on a sampled/complete CF-dataset?

Edit the hyper_params.py file, which lists all config parameters, including which type of model to run. Currently supported sampling strategies:

Sampling Strategy         What is sampled?  Paper Link
Random                    Interactions
Stratified                Interactions
Temporal                  Interactions
SVP-CF w/ MF              Interactions      LINK & LINK
SVP-CF w/ Bias-only       Interactions      LINK & LINK
SVP-CF-Prop w/ MF         Interactions      LINK & LINK
SVP-CF-Prop w/ Bias-only  Interactions      LINK & LINK
Random                    Users
Head                      Users
SVP-CF w/ MF              Users             LINK & LINK
SVP-CF w/ Bias-only       Users             LINK & LINK
SVP-CF-Prop w/ MF         Users             LINK & LINK
SVP-CF-Prop w/ Bias-only  Users             LINK & LINK
Centrality                Graph             LINK
Random-Walk               Graph             LINK
Forest-Fire               Graph             LINK
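
To make the simpler rows of the table above concrete, here is a minimal Python sketch of four of the strategies, based on a plain reading of their names: random, stratified, and temporal interaction sampling, plus head user sampling. It is not the repository's implementation. The function names, the (user, item, timestamp) tuple layout, and the toy data are all assumptions made for illustration. The SVP-CF and graph-based samplers are left out; see the linked papers for those.

```python
import random
from collections import defaultdict

# Assumption: each interaction is a (user, item, timestamp) tuple.

def group_by_user(interactions):
    """Map each user to the list of that user's interactions."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)
    return by_user

def random_interactions(interactions, frac, seed=0):
    """Keep a uniformly random fraction of all interactions."""
    rng = random.Random(seed)
    return rng.sample(interactions, int(len(interactions) * frac))

def stratified_interactions(interactions, frac, seed=0):
    """Keep a random fraction of each user's interactions."""
    rng = random.Random(seed)
    kept = []
    for rows in group_by_user(interactions).values():
        kept.extend(rng.sample(rows, int(len(rows) * frac)))
    return kept

def temporal_interactions(interactions, frac):
    """Keep the most recent fraction of each user's interactions."""
    kept = []
    for rows in group_by_user(interactions).values():
        rows = sorted(rows, key=lambda r: r[2])  # oldest first
        k = int(len(rows) * frac)
        kept.extend(rows[len(rows) - k:])  # safe when k == 0
    return kept

def head_users(interactions, frac):
    """Keep every interaction of the most active fraction of users."""
    by_user = group_by_user(interactions)
    ranked = sorted(by_user, key=lambda u: len(by_user[u]), reverse=True)
    keep = set(ranked[:int(len(ranked) * frac)])
    return [r for r in interactions if r[0] in keep]

# Toy data (hypothetical): users a, b, c, d with 4, 2, 6, 2 interactions.
data = []
for user, count in [("a", 4), ("b", 2), ("c", 6), ("d", 2)]:
    for t in range(count):
        data.append((user, f"item{t}", t))

half_random = random_interactions(data, 0.5)
half_recent = temporal_interactions(data, 0.5)
active_half = head_users(data, 0.5)
```

Note the difference in what each function preserves: random sampling can drop a sparse user entirely, stratified and temporal sampling keep every user with at least a couple of interactions, and head sampling keeps full histories but only for active users.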

Finally, type the following command to run:

$ CUDA_VISIBLE_DEVICES=<SOME_GPU_ID> python main.py

Alternatively, to train multiple recommendation algorithms on multiple CF datasets/subsets, edit the configuration in grid_search.py and then run:

$ python grid_search.py
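
As a rough illustration of what a grid of this kind expands to, the sketch below enumerates every combination of a few settings. The keys, values, and the `expand` helper are hypothetical and are not the ones grid_search.py actually uses; consult that file for the real configuration.

```python
from itertools import product

# Hypothetical grid: these keys and values are illustrative only,
# not the configuration names used by grid_search.py.
grid = {
    "dataset": ["ml-100k", "magazine"],
    "sampling": ["random", "temporal"],
    "sampling_percent": [20, 50],
}

def expand(grid):
    """Return one dict per combination of the grid's values."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]

configs = expand(grid)
for config in configs:
    print(config)
```

Each added option multiplies the number of runs, so a grid with 2 x 2 x 2 choices already yields 8 training jobs.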

How to train Data-Genie?

Edit the data_genie/data_genie_config.py file, which lists all config parameters, including which datasets, CF scenarios, samplers, etc. to train Data-Genie on.

Then use the following command to train Data-Genie:

$ python data_genie.py

License

MIT
