A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

Overview

collie_recs

Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Collie dog breed.

Collie offers a collection of simple APIs for preparing and splitting datasets, incorporating item metadata directly into a model architecture or loss, efficiently evaluating a model's performance on the GPU, and so much more. Above all else though, Collie is built with flexibility and customization in mind, allowing for faster prototyping and experimentation.

See the documentation for more details.

"We adopted 2 Border Collies a year ago and they are about 3 years old. They are completely obsessed with fetch and tennis balls and it's getting out of hand. They live in the fenced back yard and when anyone goes out there they instantly run around frantically looking for a tennis ball. If there is no ball they will just keep looking and will not let you pet them. When you do have a ball, they are 100% focused on it and will not notice anything else going on around them, like it's their whole world."

-- A Reddit thread on r/DogTraining

Installation

pip install collie_recs

Quick Start

Creating and evaluating an implicit matrix factorization model with MovieLens 100K data is simple with Collie:

from collie_recs.cross_validation import stratified_split
from collie_recs.interactions import Interactions
from collie_recs.metrics import auc, evaluate_in_batches, mapk, mrr
from collie_recs.model import MatrixFactorizationModel, CollieTrainer
from collie_recs.movielens import read_movielens_df
from collie_recs.utils import convert_to_implicit


# read in MovieLens 100K data
df = read_movielens_df()

# convert the data to implicit
df_imp = convert_to_implicit(df)

# store data as ``Interactions``
interactions = Interactions(users=df_imp['user_id'],
                            items=df_imp['item_id'],
                            allow_missing_ids=True)

# perform a data split
train, val = stratified_split(interactions)

# train an implicit ``MatrixFactorization`` model
model = MatrixFactorizationModel(train=train,
                                 val=val,
                                 embedding_dim=10,
                                 lr=1e-1,
                                 loss='adaptive',
                                 optimizer='adam')
trainer = CollieTrainer(model, max_epochs=10)
trainer.fit(model)
model.freeze()

# evaluate the model
auc_score, mrr_score, mapk_score = evaluate_in_batches([auc, mrr, mapk], val, model)

print(f'AUC:          {auc_score}')
print(f'MRR:          {mrr_score}')
print(f'MAP@10:       {mapk_score}')

More complicated pipeline examples are available for MovieLens 100K data, in the project's notebooks, and in the documentation.
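To make the data preparation steps in the Quick Start more concrete, here is a minimal, library-free sketch of what "converting to implicit" and "stratified splitting" can mean. It does not call Collie and is not Collie's implementation. The function names `to_implicit` and `stratified_split_sketch`, the `min_rating=4` threshold, and the 20% validation fraction are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def to_implicit(ratings, min_rating=4):
    """Keep only (user, item) pairs whose explicit rating is at least ``min_rating``.

    The threshold is an assumption for this sketch, not a documented Collie default.
    """
    return [(user, item) for user, item, rating in ratings if rating >= min_rating]


def stratified_split_sketch(pairs, val_fraction=0.2, seed=42):
    """Split each user's interactions separately, so every user stays in the training set."""
    items_by_user = defaultdict(list)
    for user, item in pairs:
        items_by_user[user].append(item)

    rng = random.Random(seed)
    train, val = [], []
    for user, items in items_by_user.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        # hold out a fraction of this user's items, always keeping at least one for training
        n_val = min(int(len(shuffled) * val_fraction), len(shuffled) - 1)
        val.extend((user, item) for item in shuffled[:n_val])
        train.extend((user, item) for item in shuffled[n_val:])
    return train, val


# toy explicit data: (user_id, item_id, rating)
ratings = [
    (0, 10, 5), (0, 11, 4), (0, 12, 2), (0, 13, 5), (0, 14, 4), (0, 15, 5),
    (1, 10, 3), (1, 11, 5), (1, 12, 4), (1, 13, 4), (1, 14, 5), (1, 15, 1),
]
implicit_pairs = to_implicit(ratings)
train_pairs, val_pairs = stratified_split_sketch(implicit_pairs, val_fraction=0.2)
```

Splitting per user, rather than over the whole interaction list at once, avoids validation users that the model never saw during training.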

Comparison With Other Open-Source Recommendation Libraries

Aspect Included in Library Surprise LightFM FastAI Spotlight RecBole TensorFlow Recommenders Collie
Implicit data support for when we only know when a user interacts with an item or not, not the explicit rating the user gave the item
Explicit data support for when we know the explicit rating the user gave the item *
Support for side-data incorporated directly into the models
Support a flexible framework for new model architectures and experimentation
Deep learning libraries utilizing speed-ups with a GPU and able to implement new, cutting-edge deep learning algorithms
Automatic support for multi-GPU training
Actively supported and maintained
Type annotations for classes, methods, and functions
Scalable for larger, out-of-memory datasets
Includes model zoo with two or more model architectures implemented
Includes implicit loss functions for training and metric functions for model evaluation
Includes adaptive loss functions for multiple negative examples
Includes loss functions that account for side-data

* Coming soon!
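As a rough illustration of the "adaptive loss functions for multiple negative examples" row above, the sketch below shows the general idea behind an adaptive hinge loss: several negative items are sampled for each positive item, and the loss is computed against the highest-scoring (hardest) negative. This is a plain-Python sketch of the technique, not Collie's code. The function names and the margin of 1.0 are assumptions.

```python
def hinge_loss(positive_score, negative_score, margin=1.0):
    """Penalize the model when the positive item does not beat the negative by ``margin``."""
    return max(0.0, margin - (positive_score - negative_score))


def adaptive_hinge_loss(positive_score, negative_scores, margin=1.0):
    """Hinge loss against the hardest (highest-scoring) of several sampled negatives."""
    hardest_negative = max(negative_scores)
    return hinge_loss(positive_score, hardest_negative, margin)


# one positive item scored 2.0, three sampled negatives
loss = adaptive_hinge_loss(2.0, [0.5, 1.8, -1.0])

# a positive that already beats every negative by the margin contributes no loss
easy_loss = adaptive_hinge_loss(3.0, [0.5, 1.0])
```

Using the hardest of several negatives focuses each update on the items the model currently ranks too highly, which is the motivation usually given for this family of losses.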

The following table shows the results of an experiment training and evaluating recommendation models in some popular implicit recommendation model frameworks on a common MovieLens 10M dataset. The data was split via a 90/5/5 stratified data split. Each model was trained for a maximum of 40 epochs using an embedding dimension of 32. For each model, we used default hyperparameters (unless otherwise noted below).

| Model | MAP@10 Score | Notes |
| --- | --- | --- |
| Randomly initialized, untrained model | 0.0001 | |
| Logistic MF | 0.0128 | Using the CUDA implementation. |
| LightFM with BPR Loss | 0.0180 | |
| ALS | 0.0189 | Using the CUDA implementation. |
| BPR | 0.0301 | Using the CUDA implementation. |
| Spotlight | 0.0376 | Using adaptive hinge loss. |
| LightFM with WARP Loss | 0.0412 | |
| Collie MatrixFactorizationModel | 0.0425 | Using a separate SGD bias optimizer. |

At ShopRunner, we have found Collie models outperform comparable LightFM models with up to 64% improved MAP@10 scores.
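For readers unfamiliar with the metric reported above, here is a minimal sketch of one common definition of mean average precision at K (MAP@K). It is written in plain Python for clarity and is not Collie's `mapk` implementation, which runs in batches on the GPU and may normalize differently. The function names and the toy data are assumptions.

```python
def average_precision_at_k(ranked_items, relevant_items, k=10):
    """Average of precision values at each rank (within the top ``k``) where a relevant item appears."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # normalize by the best achievable number of hits in the top k
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(recommendations, held_out, k=10):
    """Mean of per-user average precision over users with at least one held-out item."""
    users = [user for user, items in held_out.items() if items]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(user, []), held_out[user], k)
        for user in users
    )
    return total / len(users)


# toy example: ranked recommendations and held-out validation items per user
recommendations = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]}
held_out = {'a': [1, 3], 'b': [9]}
map_at_10 = mean_average_precision_at_k(recommendations, held_out, k=10)
```

In this toy case, user `a` has hits at ranks 1 and 3 and user `b` has none, so the score rewards both finding relevant items and placing them near the top of the list.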

Development

To run locally, begin by creating a data path environment variable:

# Define where on your local hard drive you want to store data. It is best if this
# location is not inside the repo itself. An example is below
export DATA_PATH=$HOME/data/collie_recs

Run development from within the Docker container:

docker build -t collie_recs .

# run the container in interactive mode, leaving port ``8888`` open for Jupyter
docker run \
    -it \
    --rm \
    -v "${DATA_PATH}:/data" \
    -v "${PWD}:/collie_recs" \
    -p 8888:8888 \
    collie_recs /bin/bash

Run on a GPU:

docker build -t collie_recs .

# run the container in interactive mode, leaving port ``8888`` open for Jupyter
docker run \
    -it \
    --rm \
    --gpus all \
    -v "${DATA_PATH}:/data" \
    -v "${PWD}:/collie_recs" \
    -p 8888:8888 \
    collie_recs /bin/bash

Start JupyterLab

To run JupyterLab, start the container and execute the following:

jupyter lab --ip 0.0.0.0 --no-browser --allow-root

Connect to JupyterLab here: http://localhost:8888/lab

Unit Tests

Library unit tests in this repo are to be run in the Docker container:

# execute unit tests
pytest --cov-report term --cov=collie_recs

Note that a handful of tests require the MovieLens 100K dataset to be downloaded (~5MB in size), meaning that either before or during test time, there will need to be an internet connection. This dataset only needs to be downloaded a single time for use in both unit tests and tutorials.

Docs

The Collie library supports Read the Docs documentation. To compile the documentation locally:

cd docs
make html

# open local docs
open build/html/index.html