[ WSDM '22 ] On Sampling Collaborative Filtering Datasets

Overview


This repository contains the implementation of many popular sampling strategies, along with various explicit/implicit/sequential feedback recommendation algorithms. The code accompanies the paper "On Sampling Collaborative Filtering Datasets" [ACM] [Public PDF], where we compare the utility of different sampling strategies for preserving the performance of various recommendation algorithms.

We also provide code for Data-Genie, which can automatically predict how good any sampling strategy will be for a given collaborative filtering dataset. We refer the reader to the full paper for more details. Kindly send me an email if you're interested in obtaining access to the pre-trained weights of Data-Genie.

If you find any module of this repository helpful for your own research, please consider citing the below WSDM'22 paper. Thanks!

@inproceedings{sampling_cf,
  author = {Noveen Sachdeva and Carole-Jean Wu and Julian McAuley},
  title = {On Sampling Collaborative Filtering Datasets},
  url = {https://doi.org/10.1145/3488560.3498439},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  series = {WSDM '22},
  year = {2022}
}

Code Author: Noveen Sachdeva ([email protected])


Setup

Environment Setup
$ pip install -r requirements.txt
Data Setup

Once you've correctly set up the Python environment and downloaded the dataset of your choice (Amazon: http://jmcauley.ucsd.edu/data/amazon/), the following steps need to be run:

The following command will create the required data/experiment directories as well as download & preprocess the Amazon Magazine and MovieLens-100K datasets. Feel free to download more datasets from the same web page (http://jmcauley.ucsd.edu/data/amazon/) and adjust the setup.sh and preprocess.py files accordingly.

$ ./setup.sh
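
For orientation, the sketch below shows the kind of parsing such a preprocessing step typically performs on a raw Amazon reviews dump. It is a minimal illustration only: the file path is hypothetical and the repository's actual preprocess.py may read different inputs and produce a different output format.

# Hypothetical preprocessing sketch; the repository's actual script may differ.
import gzip
import json

def load_amazon_interactions(path):
    """Parse a gzipped JSON-lines Amazon reviews file into
    (user, item, rating, timestamp) tuples."""
    interactions = []
    with gzip.open(path, "rt") as f:
        for line in f:
            review = json.loads(line)
            interactions.append((
                review["reviewerID"],       # user id
                review["asin"],             # item id
                float(review["overall"]),   # explicit rating on a 1-5 scale
                int(review["unixReviewTime"]),
            ))
    return interactions

if __name__ == "__main__":
    # Path is illustrative; point it at whichever 5-core file setup.sh downloaded.
    data = load_amazon_interactions("data/Magazine_Subscriptions_5.json.gz")
    print(f"Parsed {len(data)} (user, item, rating, timestamp) interactions")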

How to train a model on a sampled/complete CF-dataset?

  • Edit the hyper_params.py file, which lists all config parameters, including which recommendation algorithm to run and which sampling strategy to use (a hypothetical configuration sketch is shown after this list). Currently supported sampling strategies:
Sampling Strategy           What is sampled?   Paper Link
Random                      Interactions
Stratified                  Interactions
Temporal                    Interactions
SVP-CF w/ MF                Interactions       LINK & LINK
SVP-CF w/ Bias-only         Interactions       LINK & LINK
SVP-CF-Prop w/ MF           Interactions       LINK & LINK
SVP-CF-Prop w/ Bias-only    Interactions       LINK & LINK
Random                      Users
Head                        Users
SVP-CF w/ MF                Users              LINK & LINK
SVP-CF w/ Bias-only         Users              LINK & LINK
SVP-CF-Prop w/ MF           Users              LINK & LINK
SVP-CF-Prop w/ Bias-only    Users              LINK & LINK
Centrality                  Graph              LINK
Random-Walk                 Graph              LINK
Forest-Fire                 Graph              LINK
  • Finally, run the following command to train:
$ CUDA_VISIBLE_DEVICES=<SOME_GPU_ID> python main.py
  • Alternatively, to train various recommendation algorithms on various CF datasets/subsets, edit the configuration in grid_search.py and then run:
$ python grid_search.py
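
As referenced in the list above, an edit to hyper_params.py might resemble the following sketch. Every key name and value here (dataset, sampling, sampling_percent, model_type, etc.) is an illustrative assumption and may not match the exact fields used in the repository's file.

# Hypothetical hyper_params.py sketch -- actual option names in the repository may differ.
hyper_params = {
    "dataset": "magazine",            # which preprocessed dataset to train on
    "sampling": "svp_cf_prop_mf",     # one of the sampling strategies in the table above
    "sampling_percent": 40,           # percentage of the dataset to keep after sampling
    "model_type": "MVAE",             # which recommendation algorithm to train
    "latent_size": 32,                # embedding / latent dimension
    "epochs": 50,
    "lr": 0.005,
}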

How to train Data-Genie?

  • Edit the data_genie/data_genie_config.py file, which lists all config parameters, including which datasets, CF-scenarios, samplers, etc. to train Data-Genie on (a hypothetical configuration sketch is shown after the command below).

  • Finally, use the following command to train Data-Genie:

$ python data_genie.py
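
For reference, an edit to data_genie/data_genie_config.py might resemble the sketch below. All field names and values are assumptions made for illustration and may not match the repository's actual config file.

# Hypothetical data_genie/data_genie_config.py sketch -- real field names may differ.
config = {
    "datasets": ["magazine", "ml-100k"],                     # CF datasets to build scenarios from
    "cf_scenarios": ["explicit", "implicit", "sequential"],  # feedback types to cover
    "samplers": ["random", "temporal", "svp_cf_mf", "forest_fire"],  # strategies to compare
    "sampling_percents": [20, 40, 60, 80],                   # how much of each dataset to keep
    "epochs": 100,
    "lr": 3e-4,
}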

License


MIT
