Tutorial on scikit-learn and IPython for parallel machine learning

Last update: Dec 26, 2022

Parallel Machine Learning with scikit-learn and IPython

This tutorial was given at PyCon in 2013, where it was recorded on video. The material has since been partly rearranged and extended. Use the titles of the notebooks to follow along with the presentation.

Browse the static notebooks on nbviewer.ipython.org.

Scope of this tutorial:

Learn common machine learning concepts and how they match the scikit-learn Estimator API.
Learn about scalable feature extraction for text classification and clustering.
Learn how to run cross-validation and hyperparameter grid search in parallel with IPython.
Learn to analyze the kinds of common errors predictive models are subject to and how to refine your modeling to take this analysis into account.
Learn to optimize memory allocation on your computing nodes with numpy memory mapping features.
Learn how to run a cheap IPython cluster for interactive predictive modeling on the Amazon EC2 spot instances using StarCluster.
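
As a rough illustration of the first and third points above, here is a minimal sketch of the Estimator API combined with a cross-validated grid search. It assumes a recent scikit-learn release (0.18 or later, where sklearn.model_selection exists), not the 0.15.2 release mentioned in the Setup section. It also relies on scikit-learn's built-in n_jobs option rather than the IPython cluster used in the notebooks. The run_grid_search helper and the parameter grid are hypothetical and chosen for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def run_grid_search(n_jobs=1):
    """Fit an SVC over a small parameter grid with 3-fold cross-validation.

    Hypothetical helper for illustration; it is not part of the tutorial code.
    """
    # Iris ships with scikit-learn, so no download is needed.
    X, y = load_iris(return_X_y=True)
    # Illustrative grid: 3 values of C times 2 values of gamma = 6 candidates.
    param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}
    # GridSearchCV is itself an estimator: it exposes fit() and predict().
    search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=n_jobs)
    search.fit(X, y)
    return search


search = run_grid_search(n_jobs=1)
print(search.best_params_, round(search.best_score_, 3))
```

Passing n_jobs=-1 asks scikit-learn to spread the candidate fits over all local cores. Distributing the same work over several machines is what the IPython-based approach in the notebooks is for.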

Target audience

This tutorial targets developers with some experience with scikit-learn and machine learning concepts in general.

It is recommended to first go through one of the tutorials hosted at scikit-learn.org if you are new to scikit-learn.

You might also want to have a look at the SciPy Lecture Notes first if you are new to the NumPy / SciPy / matplotlib ecosystem.

Setup

Install the latest stable versions of NumPy, SciPy, matplotlib, IPython, psutil, and scikit-learn (e.g. IPython 2.2.0 and scikit-learn 0.15.2 at the time of writing).

You can find up-to-date installation instructions on scikit-learn.org and ipython.org.

To check your installation, launch the ipython interactive shell in a console and type the following import statements to check each library:

>>> import numpy
>>> import scipy
>>> import matplotlib
>>> import psutil
>>> import sklearn

If you don't get any message, everything is fine. If you get an error message, please ask for help on the mailing list of the matching project. Don't forget to mention the version of the library you are trying to install, along with your platform and its version (e.g. Windows 8.1, Ubuntu 14.04, OSX 10.9...).
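
If you do need to report versions, a short script along these lines can collect them in one go. It is only a sketch: check_versions is a hypothetical helper, not part of this repository, and it assumes each library exposes a __version__ attribute, which is a common convention but not a guarantee.

```python
import importlib

# Top-level import names for the libraries listed in the Setup section.
MODULES = ["numpy", "scipy", "matplotlib", "IPython", "psutil", "sklearn"]


def check_versions(names=MODULES):
    """Return {module name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The library is missing or its installation is broken.
            versions[name] = None
        else:
            # Fall back to "unknown" when no __version__ attribute exists.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in check_versions().items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Any library reported as NOT INSTALLED failed to import and needs to be installed or reinstalled before you start the notebooks.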

You can exit the ipython shell by typing exit.

Fetching the data

It is recommended to fetch the datasets before diving into the tutorial material itself. To do so, run the fetch_data.py script in this folder:

python fetch_data.py

Using the IPython notebook to follow the tutorial

The tutorial material and exercises are hosted in a set of IPython executable notebook files.

To run them interactively, do:

$ cd notebooks
$ ipython notebook

This should automatically open a new browser window listing all the notebooks of the folder.

You can then execute the cells in order by pressing "Shift-Enter". The output is displayed directly under each cell and the cursor moves on to the next cell. Go to the "Help" menu for links to the notebook tutorial.

Credits

Some of this material is adapted from the SciPy 2013 tutorial:

http://github.com/jakevdp/sklearn_scipy2013

Original authors:

Gael Varoquaux @GaelVaroquaux | http://gael-varoquaux.info
Jake VanderPlas @jakevdp | http://jakevdp.github.com
Olivier Grisel @ogrisel | http://ogrisel.com

Owner

Olivier Grisel
