Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

Last update: Nov 09, 2021

Overview

A method to solve the Higgs boson challenge using Least Squares - Novae

This is Project 1 of the EPFL course CS-433 Machine Learning. The task is the same as the Higgs Boson Machine Learning Challenge posted on Kaggle. The dataset and a detailed description can also be found in the course's GitHub repository.

Team name: Novae

Team members: Giacomo Orsi, Vittorio Rossi, Chun-Tso Tsai

About the Project

The goal of this project is to train a model on the provided train.csv that gives the best possible predictions on test.csv and generalizes well to other, unseen data.

We built our model using regularized linear regression after applying some data cleaning and feature engineering techniques. A report describing our approach and results can be found in report.pdf. In the end, we obtained an accuracy of 0.836 and an F1 score of 0.751 on the test.csv dataset.
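The two reported metrics can be computed as in the sketch below. It assumes labels in `{-1, 1}` with `1` (signal) as the positive class; F1 is the harmonic mean of precision and recall, which simplifies to `2*TP / (2*TP + FP + FN)`. The function name is hypothetical.

```python
import numpy as np

def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 score for labels in {-1, 1}; `positive` marks the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = np.mean(y_true == y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    # Harmonic mean of precision tp/(tp+fp) and recall tp/(tp+fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
    return float(accuracy), float(f1)

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, 1, 1])
acc, f1 = accuracy_and_f1(y_true, y_pred)  # 3 of 5 correct; TP=2, FP=1, FN=1
```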

Instructions

The project runs under Python 3.8 and requires NumPy 1.19.
Please make sure to place train.csv and test.csv inside the data folder. Those files can be downloaded here.
Go to the script/ folder and execute run.py. A model will be trained with the given hyper-parameters, and predictions for the test dataset will be written to the file out.csv.

Modules

`implementations.py`

Contains implementations of the following learning algorithms:

Least squares linear regression
- least_squares: Closed-form solution of the normal equations.
- least_squares_GD: Gradient descent.
- least_squares_SGD: Stochastic gradient descent.
- ridge_regression: Regularized (ridge) linear regression, solved in closed form.
Logistic regression
- logistic_regression: Gradient descent.
- reg_logistic_regression: Gradient descent with regularization.
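A minimal sketch of the two closed-form methods, assuming the common convention of a mean squared error loss `L(w) = 1/(2N) * ||y - Xw||^2` and a ridge penalty `lambda * ||w||^2`. Setting the gradient to zero gives `(X^T X + 2N*lambda*I) w = X^T y`, which the code solves with `np.linalg.solve` instead of forming an explicit inverse. The `(w, loss)` return values and the `mse_loss` helper are assumptions, not necessarily the project's exact signatures.

```python
import numpy as np

def mse_loss(y, tx, w):
    """Mean squared error with the 1/2 factor: 1/(2N) * ||y - tx @ w||^2."""
    e = y - tx @ w
    return float(e @ e / (2 * len(y)))

def least_squares(y, tx):
    """Solve the normal equations tx^T tx w = tx^T y."""
    w = np.linalg.solve(tx.T @ tx, tx.T @ y)
    return w, mse_loss(y, tx, w)

def ridge_regression(y, tx, lambda_):
    """Minimize 1/(2N)||y - tx w||^2 + lambda_ ||w||^2 in closed form."""
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)  # scaled penalty from the 1/(2N) loss
    w = np.linalg.solve(a, tx.T @ y)
    return w, mse_loss(y, tx, w)

# Usage on synthetic, noise-free data with a bias column
rng = np.random.default_rng(0)
tx = np.column_stack((np.ones(100), rng.normal(size=(100, 3))))
w_true = np.array([0.5, 1.0, -2.0, 0.3])
y = tx @ w_true
w_ls, loss_ls = least_squares(y, tx)          # recovers w_true on noise-free data
w_r, loss_r = ridge_regression(y, tx, 0.1)    # shrinks the weights towards zero
```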

This file also contains helper functions used by the functions above.
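The iterative methods rely on small helpers such as a gradient or a sigmoid function. The sketch below is one possible version, not the project's code: the function names follow the list above, but the signatures, the batch size of 1 for SGD, the `{0, 1}` label encoding for logistic regression, and the `lambda_ * ||w||^2` penalty are assumptions.

```python
import numpy as np

def compute_mse_gradient(y, tx, w):
    """Gradient of 1/(2N) * ||y - tx @ w||^2 with respect to w."""
    e = y - tx @ w
    return -tx.T @ e / len(y)

def least_squares_GD(y, tx, initial_w, max_iters, gamma):
    w = initial_w.astype(float)
    for _ in range(max_iters):
        w = w - gamma * compute_mse_gradient(y, tx, w)
    return w

def least_squares_SGD(y, tx, initial_w, max_iters, gamma, seed=0):
    rng = np.random.default_rng(seed)
    w = initial_w.astype(float)
    for _ in range(max_iters):
        i = rng.integers(len(y))  # one random sample per step (batch size 1)
        w = w - gamma * compute_mse_gradient(y[i:i + 1], tx[i:i + 1], w)
    return w

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def reg_logistic_regression(y, tx, lambda_, initial_w, max_iters, gamma):
    """GD on the mean negative log-likelihood + lambda_ * ||w||^2; y in {0, 1}."""
    w = initial_w.astype(float)
    for _ in range(max_iters):
        grad = tx.T @ (sigmoid(tx @ w) - y) / len(y) + 2 * lambda_ * w
        w = w - gamma * grad
    return w

def logistic_regression(y, tx, initial_w, max_iters, gamma):
    """Unregularized case: lambda_ = 0."""
    return reg_logistic_regression(y, tx, 0.0, initial_w, max_iters, gamma)

# Usage on synthetic, noise-free data
rng = np.random.default_rng(1)
tx = np.column_stack((np.ones(200), rng.normal(size=(200, 3))))
w_true = np.array([0.5, 1.0, -2.0, 0.3])
y = tx @ w_true
w0 = np.zeros(4)
w_gd = least_squares_GD(y, tx, w0, max_iters=1000, gamma=0.5)
w_sgd = least_squares_SGD(y, tx, w0, max_iters=5000, gamma=0.05)
y01 = (y > 0).astype(float)  # {-1, 1} labels would map to {0, 1} via (y + 1) / 2
w_log = logistic_regression(y01, tx, w0, max_iters=1000, gamma=0.5)
```

Labels in `{-1, 1}` need converting before the logistic functions are used, since the log-likelihood gradient above assumes `{0, 1}` targets.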

`data_processing.py`

Calls functions from the following files to process the data:

data_cleaning.py: Contains functions used to
1. Categorize data into subgroups.
2. Replace missing values with the median.
3. Standardize the features.
feature_engineering.py: Contains functions used to generate our interpretable features.
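The three cleaning steps could look roughly like the sketch below. It assumes that missing values are encoded as `-999` (the convention in the Higgs challenge data) and that subgroups are defined by a single categorical column, such as the jet number; the column index and all helper names are hypothetical.

```python
import numpy as np

MISSING = -999.0  # assumed sentinel for undefined values

def split_by_category(x, col):
    """Map each value of a categorical column to the row indices that have it."""
    return {v: np.flatnonzero(x[:, col] == v) for v in np.unique(x[:, col])}

def impute_median(x, missing=MISSING):
    """Replace missing entries with the median of each column's observed values."""
    x = x.astype(float).copy()
    for j in range(x.shape[1]):
        mask = x[:, j] == missing
        if mask.all():
            x[:, j] = 0.0  # column fully undefined in this subgroup: set to a constant
        elif mask.any():
            x[mask, j] = np.median(x[~mask, j])
    return x

def standardize(x, mean=None, std=None):
    """Zero-mean, unit-variance columns; pass training stats to transform test data."""
    if mean is None:
        mean = x.mean(axis=0)
    if std is None:
        std = x.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - mean) / std, mean, std

x = np.array([[1.0, 0.0, 5.0],
              [MISSING, 1.0, 7.0],
              [3.0, 0.0, MISSING],
              [5.0, 1.0, 9.0]])
groups = split_by_category(x, col=1)  # rows 0, 2 in category 0.0; rows 1, 3 in 1.0
x_clean = impute_median(x)
x_std, mu, sigma = standardize(x_clean)
```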

`run.py`

Generates the submission .csv file from test.csv, which is stored in the data/ folder. Our optimized model is also defined in this file.
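The final prediction step for a linear model can be sketched as follows. Thresholding the score `tx @ w` at 0 to obtain `{-1, 1}` labels and using an `Id,Prediction` header are assumptions, and `predict_labels` / `write_submission` are hypothetical names rather than the project's functions.

```python
import csv
import os
import tempfile

import numpy as np

def predict_labels(w, tx, threshold=0.0):
    """Map linear scores tx @ w to labels in {-1, 1}."""
    return np.where(tx @ w > threshold, 1, -1)

def write_submission(ids, y_pred, path):
    """Write one 'Id,Prediction' row per test point (assumed file layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Prediction"])
        for i, p in zip(ids, y_pred):
            writer.writerow([int(i), int(p)])

w = np.array([0.5, -1.0])
tx_test = np.array([[1.0, 0.0], [1.0, 2.0], [1.0, 0.4]])
y_pred = predict_labels(w, tx_test)  # scores 0.5, -1.5, 0.1
path = os.path.join(tempfile.gettempdir(), "out.csv")
write_submission([1, 2, 3], y_pred, path)
```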

Some helper functions

models.py: Creates the models used to predict labels for new data points whose true labels are unknown.
expansions.py: Contains a function that applies polynomial expansion to our features, adding extra degrees of freedom to our models.
proj1_helpers.py: Contains functions that load the .csv files as training or test data and create the .csv file for submission.
cross_validation.py: Contains a function to build the indices for k-fold cross-validation.
disk_helper.py: Saves/loads NumPy arrays to/from disk for later use. Useful for saving hyper-parameters during long training runs.
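Two of these helpers are short enough to sketch. The version below assumes the expansion raises each feature to powers 1 through `degree` with no cross terms and prepends a bias column, and that the k-fold split shuffles the indices once and drops any remainder; both functions are hypothetical stand-ins for the project's own.

```python
import numpy as np

def build_poly(x, degree):
    """Per-feature expansion [1, x, x^2, ..., x^degree], without cross terms."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as a single feature column
    powers = [x ** d for d in range(1, degree + 1)]
    return np.hstack([np.ones((x.shape[0], 1))] + powers)

def build_k_indices(n, k_fold, seed=0):
    """Shuffle 0..n-1 and split into k_fold equal groups (remainder dropped)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    fold = n // k_fold
    return idx[: fold * k_fold].reshape(k_fold, fold)

phi = build_poly(np.array([[2.0, 3.0]]), degree=2)  # [[1, 2, 3, 4, 9]]
k_idx = build_k_indices(10, k_fold=5)  # each of the 5 rows is one test fold
```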

Notebook

The Jupyter notebook project_notebook.ipynb, located in the scripts folder, can be used to find the best hyper-parameters for the model. In the notebook, a logistic regression model and a least squares regression model can be cross-validated over given lambdas and polynomial degrees.
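A cross-validation grid search of the kind described above could be sketched as follows. For brevity it only cross-validates ridge regression (not logistic regression), scores each `(lambda, degree)` pair by mean test-fold accuracy on `{-1, 1}` labels, and re-declares a minimal polynomial expansion and ridge solver so that it runs on its own; all names are hypothetical.

```python
import numpy as np

def build_poly(x, degree):
    """[1, x, ..., x^degree] per feature, without cross terms."""
    return np.hstack([np.ones((x.shape[0], 1))] + [x ** d for d in range(1, degree + 1)])

def ridge(y, tx, lambda_):
    """Closed-form ridge: (tx^T tx + 2 N lambda_ I) w = tx^T y."""
    n, d = tx.shape
    return np.linalg.solve(tx.T @ tx + 2 * n * lambda_ * np.eye(d), tx.T @ y)

def cv_accuracy(y, x, k_indices, lambda_, degree):
    """Mean test-fold accuracy over all k folds; y is assumed to be in {-1, 1}."""
    accs = []
    for k in range(len(k_indices)):
        te = k_indices[k]
        tr = np.delete(k_indices, k, axis=0).ravel()  # all other folds train
        w = ridge(y[tr], build_poly(x[tr], degree), lambda_)
        y_hat = np.where(build_poly(x[te], degree) @ w > 0, 1, -1)
        accs.append(np.mean(y_hat == y[te]))
    return float(np.mean(accs))

def grid_search(y, x, lambdas, degrees, k_fold=4, seed=0):
    """Return (best accuracy, lambda, degree) over the grid."""
    rng = np.random.default_rng(seed)
    n = len(y)
    k_indices = rng.permutation(n)[: n // k_fold * k_fold].reshape(k_fold, -1)
    results = [(cv_accuracy(y, x, k_indices, lam, deg), lam, deg)
               for deg in degrees for lam in lambdas]
    return max(results, key=lambda r: r[0])

# Synthetic data with a quadratic decision boundary
rng = np.random.default_rng(1)
x = rng.normal(size=(400, 2))
y = np.where(x[:, 0] ** 2 + x[:, 1] > 1.0, 1, -1)
best_acc, best_lambda, best_degree = grid_search(
    y, x, lambdas=[1e-6, 1e-3, 1e-1], degrees=[1, 2, 3])
```

Selecting on held-out accuracy rather than training loss keeps a high degree from being chosen just because it fits the training folds more closely.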

Owner

Giacomo Orsi

An implementation of the BADGE batch active learning algorithm.

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.

Gesture-Volume-Control - This Python program can adjust the system's volume by using hand gestures

Semantic Image Synthesis with SPADE

WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

MAGMA - a GPT-style multimodal model that can understand any combination of images and language

[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

Multi-Stage Episodic Control for Strategic Exploration in Text Games

End-to-End Referring Video Object Segmentation with Multimodal Transformers

The fastai deep learning library

Pytorch Implementation of Auto-Compressing Subset Pruning for Semantic Image Segmentation

Scene-Text-Detection-and-Recognition (Pytorch)

机器学习、深度学习、自然语言处理等人工智能基础知识总结。

Magic tool for managing internet connection in local network by @zalexdev

A CNN implementation using only numpy. Supports multidimensional images, stride, etc.

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss （ATVGnet）

Python suite to construct benchmark machine learning datasets from the MIMIC-III clinical database.

On the model-based stochastic value gradient for continuous reinforcement learning

The official implementation of NeurIPS 2021 paper: Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

Related tags

Overview

A method to solve the Higgs boson challenge using Least Squares - Novae

About the Project

Instructions

Modules

implementations.py

data_processing.py

run.py

Some helper Functions

Notebook

Owner

Giacomo Orsi

An implementation of the BADGE batch active learning algorithm.

The personal repository of the work: *DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer*.

Gesture-Volume-Control - This Python program can adjust the system's volume by using hand gestures

Semantic Image Synthesis with SPADE

WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

MAGMA - a GPT-style multimodal model that can understand any combination of images and language

[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

Multi-Stage Episodic Control for Strategic Exploration in Text Games

End-to-End Referring Video Object Segmentation with Multimodal Transformers

The fastai deep learning library

Pytorch Implementation of Auto-Compressing Subset Pruning for Semantic Image Segmentation

Scene-Text-Detection-and-Recognition (Pytorch)

机器学习、深度学习、自然语言处理等人工智能基础知识总结。

Magic tool for managing internet connection in local network by @zalexdev

A CNN implementation using only numpy. Supports multidimensional images, stride, etc.

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss （ATVGnet）

Python suite to construct benchmark machine learning datasets from the MIMIC-III clinical database.

On the model-based stochastic value gradient for continuous reinforcement learning

The official implementation of NeurIPS 2021 paper: Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

`implementations.py`

`data_processing.py`

`run.py`

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.