4th place solution to the Datafactory challenge by Intermarché.

Last update: Mar 19, 2022

Overview

Solution to the Datafactory challenge by Intermarché.

This is my 4th place solution to the Datafactory challenge by Intermarché. The objective of the challenge is to predict Intermarché's sales for the first quarter of 2019, using the sales data of the previous year (2018) to train the model.

Data 💿

We have sales records for a set of (store, item) pairs for each day of 2018 on which at least one sale occurred. The data are structured as follows:

date	store	item	quantity
2018-01-01	1	12	1
2018-01-01	1	17	2
2018-01-01	1	22	3

Additional tables are also available:

Product characteristics.
Store characteristics.
Product prices by store and by quarter.

Solution 🤖

The main difficulty of the challenge is identifying the days on which a store recorded no sales of a given product: Intermarché does not provide records for which the target variable (quantity) is equal to 0. I found that adding up to 5 zero-quantity records after each sale of a given (store, item) pair maximized the performance of my model and limited the overfitting of my aggregate features.
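
The zero-padding step can be sketched with pandas as below. This is a minimal sketch under assumptions the README does not state: "after a sale" is read as the calendar days that follow it, days that already have a recorded sale are left untouched, and no zeros are added past the last date of the training data. The function name `add_trailing_zeros` is hypothetical.

```python
import pandas as pd


def add_trailing_zeros(sales: pd.DataFrame, max_zeros: int = 5) -> pd.DataFrame:
    """Return `sales` plus quantity-0 rows for up to `max_zeros` days after each sale.

    Hypothetical helper. Assumes `sales` has columns date, store, item, quantity,
    with one row per (date, store, item) that had at least one sale.
    """
    keys = ["date", "store", "item"]
    sales = sales.copy()
    sales["date"] = pd.to_datetime(sales["date"])
    last_day = sales["date"].max()  # assumption: never pad past the training period

    # Candidate zero days: the 1..max_zeros calendar days following every sale.
    candidates = pd.concat(
        [sales[keys].assign(date=sales["date"] + pd.Timedelta(days=k))
         for k in range(1, max_zeros + 1)],
        ignore_index=True,
    ).drop_duplicates()
    candidates = candidates[candidates["date"] <= last_day]

    # Keep only candidate days on which the pair has no recorded sale.
    merged = candidates.merge(sales[keys], on=keys, how="left", indicator=True)
    zeros = merged.loc[merged["_merge"] == "left_only", keys].assign(quantity=0)

    padded = pd.concat([sales, zeros], ignore_index=True)
    return padded.sort_values(["store", "item", "date"]).reset_index(drop=True)
```

Because every sale restarts the 5-day window, a pair that sells regularly gets few or no zeros, while a pair that stops selling gets at most 5.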

Features:

Aggregates by item / store (mean + std)
Aggregates on prices (mean)
Aggregates on store characteristics (mean)
Aggregates on product characteristics (mean)
Rolling medians over the last 9 weeks
Date features (weekend / holidays / day of the week)
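
Part of this feature list can be sketched as follows, assuming a single frame with columns `date`, `store`, `item` and `quantity`. The price, store and product aggregates are left out because the README does not describe the columns of those tables. The function name, the `holidays` argument, the 63-day window (9 weeks of 7 days) and the exclusion of the current day from the rolling median are all assumptions. The store/item aggregates are computed over the whole frame for brevity; a real pipeline would compute them on training data only to avoid leakage.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame, holidays=()) -> pd.DataFrame:
    """Add a subset of the listed features to a (date, store, item, quantity) frame.

    Hypothetical helper; `holidays` is a caller-supplied list of holiday dates.
    """
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values(["store", "item", "date"]).reset_index(drop=True)

    # Aggregates by (store, item): mean and standard deviation of the quantity.
    grp = df.groupby(["store", "item"])["quantity"]
    df["store_item_mean"] = grp.transform("mean")
    df["store_item_std"] = grp.transform("std")

    # Rolling median over the previous 9 weeks (63 days). closed="left" excludes
    # the current day, so the feature only uses past observations.
    df["median_9w"] = float("nan")
    for _, g in df.groupby(["store", "item"]):
        past = pd.Series(g["quantity"].to_numpy(dtype=float), index=g["date"])
        df.loc[g.index, "median_9w"] = past.rolling("63D", closed="left").median().to_numpy()

    # Calendar features.
    df["dayofweek"] = df["date"].dt.dayofweek  # Monday = 0
    df["is_weekend"] = (df["dayofweek"] >= 5).astype(int)
    df["is_holiday"] = df["date"].isin(pd.to_datetime(list(holidays))).astype(int)
    return df
```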

I used LightGBM with 3-fold cross-validation and bagging to make my predictions. I trained the model on the transformed target log(1 + quantity). Using a Poisson loss helped a bit. I did not tune the model's hyperparameters.
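
The training setup can be sketched as below. The hypothetical helper `cv_bagged_predict` trains one model per fold on log(1 + quantity), keeps out-of-fold predictions for validation, and averages the fold models' test predictions; reading "bagging" as this fold averaging is an assumption. It is written against a generic `make_model` factory so that it runs with NumPy alone, and the random fold assignment and averaging in log space are illustrative choices, not necessarily the ones used in this solution.

```python
import numpy as np


def cv_bagged_predict(X, y, X_test, make_model, n_splits=3, seed=0):
    """K-fold training on log(1 + quantity); test predictions are averaged over folds.

    Hypothetical helper. `make_model` returns an object with scikit-learn style
    fit(X, y) / predict(X) methods.
    """
    X = np.asarray(X, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_log = np.log1p(np.asarray(y, dtype=float))  # quantity -> log(1 + quantity)

    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)

    oof = np.zeros(len(X))            # out-of-fold predictions, log scale
    test_log = np.zeros(len(X_test))  # fold-averaged test predictions, log scale
    for k, valid_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = make_model()
        model.fit(X[train_idx], y_log[train_idx])
        oof[valid_idx] = model.predict(X[valid_idx])
        test_log += model.predict(X_test) / n_splits

    # Undo the log transform; clip in case a model predicts slightly below 0.
    return np.clip(np.expm1(oof), 0, None), np.clip(np.expm1(test_log), 0, None)
```

With LightGBM installed, passing `make_model=lambda: lightgbm.LGBMRegressor(objective="poisson")` would combine this scheme with the Poisson loss mentioned above.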

Finally, I set all February and March predictions to the predictions for the second and third weeks of January.
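
One possible reading of this post-processing step, sketched in pandas: each February or March row takes the mean prediction of the same (store, item, day of week) over January 8 to 21. Both the day range used for "second and third weeks" and the day-of-week matching are assumptions, as is the function name `copy_january_weeks`. The frame is assumed to have columns `date`, `store`, `item` and `quantity` (the predictions).

```python
import pandas as pd


def copy_january_weeks(sub: pd.DataFrame) -> pd.DataFrame:
    """Overwrite February/March predictions with January week-2/3 predictions.

    Hypothetical helper. Assumptions: weeks 2-3 of January = January 8-21, and
    each later date receives the mean prediction of the same
    (store, item, day of week) over that window.
    """
    sub = sub.copy()
    sub["date"] = pd.to_datetime(sub["date"])
    sub["dow"] = sub["date"].dt.dayofweek
    keys = ["store", "item", "dow"]

    # Reference predictions from January 8-21, averaged per triplet.
    in_ref = (sub["date"].dt.month == 1) & sub["date"].dt.day.between(8, 21)
    ref = (sub[in_ref].groupby(keys)["quantity"].mean()
           .rename("ref_quantity").reset_index())

    # Copy them onto every February/March row with a matching triplet.
    sub = sub.merge(ref, on=keys, how="left")
    later = sub["date"].dt.month.isin([2, 3]) & sub["ref_quantity"].notna()
    sub.loc[later, "quantity"] = sub.loc[later, "ref_quantity"]
    return sub.drop(columns=["dow", "ref_quantity"])
```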

I also set to 0 all predictions for (store, item, day of the week) triplets that have too few records in the training set.
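
A minimal sketch of this rule, assuming prediction and training frames with `date`, `store`, `item` and `quantity` columns. The README does not say how many records count as enough, so the threshold `min_records=3` and the function name `zero_rare_triplets` are hypothetical; triplets never seen in training are treated as having 0 records.

```python
import pandas as pd


def zero_rare_triplets(sub, train, min_records=3):
    """Set predictions to 0 for (store, item, day-of-week) triplets seen fewer
    than `min_records` times in `train`.

    Hypothetical helper; the default threshold is an assumption.
    """
    keys = ["store", "item", "dow"]
    train_dow = pd.to_datetime(train["date"]).dt.dayofweek
    counts = (train.assign(dow=train_dow)
              .groupby(keys).size().rename("n_records").reset_index())

    sub = sub.copy()
    sub["dow"] = pd.to_datetime(sub["date"]).dt.dayofweek
    sub = sub.merge(counts, on=keys, how="left")
    rare = sub["n_records"].fillna(0) < min_records  # unseen triplets count as 0
    sub.loc[rare, "quantity"] = 0.0
    return sub.drop(columns=["dow", "n_records"])
```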

Run ♻️

To reproduce my results, download the data into the data/raw folder, then run:

python scripts/prepare_raw_data.py
python scripts/features/aggs_items.py
python scripts/features/aggs_prices.py
python scripts/features/aggs_stores.py
python scripts/features/aggs.py 
python scripts/features/lags.py
python scripts/features/cal.py 
python scripts/make_train_test.py
python scripts/learn.py
python scripts/polish_sub.py

License

This project is free and open-source software licensed under the MIT license.

Owner

Raphael Sourty

Contrastive Loss Gradient Attack (CLGA)

ERISHA is a multilingual, multispeaker expressive speech synthesis framework. It can transfer expressivity to a speaker's voice for which no expressive speech corpus is available.

SparseML is a library for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models.

Split your patch similarly to `git add -p` but supporting multiple buckets

Super Resolution for images using deep learning.

A PyTorch Reimplementation of TecoGAN: Temporally Coherent GAN for Video Super-Resolution

CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation

Code release for "Conditional Adversarial Domain Adaptation" (NIPS 2018)

PyTorch implementation for paper StARformer: Transformer with State-Action-Reward Representations.

PyTorch implementation of "Neural Wireframe Renderer: Learning Wireframe to Image Translations"

A Python reference implementation of the CF data model

A package related to building quasi-fibration symmetries

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations,

A Keras-based real-time model for medical image segmentation (CFPNet-M)

CLIP+FFT text-to-image

The official code repository for examples in the O'Reilly book 'Generative Deep Learning'

Fine-tuning StyleGAN2 for Cartoon Face Generation

Rethinking Transformer-based Set Prediction for Object Detection

The code is an implementation of Feedback Convolutional Neural Network for Visual Localization and Segmentation.

PyImpetus is a Markov Blanket based feature subset selection algorithm that considers features both separately and together as a group in order to provide not just the best set of features but also the best combination of features