MiniSom is a minimalistic implementation of the Self Organizing Maps

Overview

MiniSom

Self Organizing Maps

MiniSom is a minimalistic, NumPy-based implementation of the Self Organizing Maps (SOM). A SOM is a type of artificial neural network that converts complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. MiniSom is designed to let researchers build on top of it easily and to let students quickly grasp its details.

Updates about MiniSom are posted on Twitter.

Installation

Just use pip:

pip install minisom

or download MiniSom to a directory of your choice and use the setup script:

git clone https://github.com/JustGlowing/minisom.git
cd minisom
python setup.py install

How to use it

To use MiniSom, your data must be organized as a NumPy matrix where each row corresponds to an observation, or as a list of lists like the following:

data = [[ 0.80,  0.55,  0.22,  0.03],
        [ 0.82,  0.50,  0.23,  0.03],
        [ 0.80,  0.54,  0.22,  0.03],
        [ 0.80,  0.53,  0.26,  0.03],
        [ 0.79,  0.56,  0.22,  0.03],
        [ 0.75,  0.60,  0.25,  0.03],
        [ 0.77,  0.59,  0.22,  0.03]]      

Then you can train MiniSom just as follows:

from minisom import MiniSom    
som = MiniSom(6, 6, 4, sigma=0.3, learning_rate=0.5) # initialization of 6x6 SOM
som.train(data, 100) # trains the SOM with 100 iterations

You can obtain the position of the winning neuron on the map for a given sample as follows:

som.winner(data[0])
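
To make "winning neuron" concrete: it is the map unit whose weight vector lies closest to the sample. The following is a minimal, dependency-free sketch of that lookup, not MiniSom's own code. The helper find_winner and the tiny 2x2 weights grid are hypothetical, chosen only for illustration, and Euclidean distance is assumed.

```python
import math

def find_winner(weights, sample):
    """Return the (row, col) of the unit whose weight vector is closest to sample."""
    best_pos, best_dist = None, math.inf
    for i, row in enumerate(weights):
        for j, unit in enumerate(row):
            d = math.dist(unit, sample)  # Euclidean distance (assumed metric)
            if d < best_dist:
                best_pos, best_dist = (i, j), d
    return best_pos

# Hypothetical 2x2 map with 4 features per unit
weights = [
    [[0.1, 0.9, 0.5, 0.5], [0.8, 0.5, 0.2, 0.0]],
    [[0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]],
]
sample = [0.80, 0.55, 0.22, 0.03]
winner = find_winner(weights, sample)
print(winner)
```

The returned pair is a position on the map grid, which is the same kind of value the text above describes for som.winner.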

For an overview of all the features implemented in minisom you can browse the following examples: https://github.com/JustGlowing/minisom/tree/master/examples

Export a SOM and load it again

A model can be saved using pickle as follows:

import pickle
som = MiniSom(7, 7, 4)

# ...train the som here

# saving the som in the file som.p
with open('som.p', 'wb') as outfile:
    pickle.dump(som, outfile)

and can be loaded as follows:

with open('som.p', 'rb') as infile:
    som = pickle.load(infile)

Note that if a lambda function is used to define the decay factor, MiniSom will no longer be picklable.
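
This restriction comes from pickle itself, which stores functions by name and so cannot serialize a lambda. The sketch below shows the failure using only the standard library. The helper is_picklable, the function named_decay and its (learning_rate, t, max_iter) signature are illustrative assumptions rather than MiniSom's documented API.

```python
import pickle

def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def named_decay(learning_rate, t, max_iter):
    """Hypothetical decay function defined with def at module level."""
    return learning_rate / (1 + t / (max_iter / 2))

# The same formula written as a lambda (hypothetical)
lambda_decay = lambda learning_rate, t, max_iter: learning_rate / (1 + t / (max_iter / 2))

lambda_ok = is_picklable(lambda_decay)  # a lambda has no name pickle can look up
print("lambda picklable:", lambda_ok)
print("named function picklable:", is_picklable(named_decay))
```

Defining the decay with def in an importable module is the usual way to keep an object that holds it picklable.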

Explore parameters

You can use this dashboard to explore the effect of the parameters on a sample dataset: https://share.streamlit.io/justglowing/minisom/dashboard/dashboard.py

Examples

Here are some of the charts you'll see how to generate in the examples:

  • Seeds map
  • Class assignment
  • Handwritten digits mapping
  • Hexagonal topology
  • Color quantization
  • Outlier detection

Other tutorials

How to cite MiniSom

@misc{vettigliminisom,
  title={MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map},
  author={Giuseppe Vettigli},
  year={2018},
  url={https://github.com/JustGlowing/minisom/},
}

Who uses MiniSom?

Guidelines to contribute

  1. In the description of your Pull Request, explain clearly what it implements or fixes and what you changed. If possible, give an example in the description of the PR. If the PR is about a code speedup, report a reproducible example and quantify the speedup.
  2. Give your pull request a helpful title that summarises what your contribution does.
  3. Write unit tests for your code and make sure the existing tests are up to date. pytest can be used for this:
     pytest minisom.py
  4. Make sure that there are no stylistic issues using pycodestyle:
     pycodestyle minisom.py
  5. Make sure your code is properly commented and documented. Each public method needs to be documented like the existing ones.
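
For the speedup report asked for in the first guideline, a small script that anyone can rerun is usually enough. The sketch below uses the standard library's timeit module. The functions old_impl and new_impl are hypothetical stand-ins for the code before and after a change, and nothing here implies that one is actually faster than the other.

```python
import math
import timeit

def old_impl(values):
    """Stand-in for the existing implementation (explicit loop)."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def new_impl(values):
    """Stand-in for the proposed implementation (built-in sum)."""
    return sum(v * v for v in values)

values = [i / 1000 for i in range(10_000)]

# Both versions must agree before timing means anything
results_match = math.isclose(old_impl(values), new_impl(values), rel_tol=1e-9)

# Take the best of several repeats to reduce noise
t_old = min(timeit.repeat(lambda: old_impl(values), number=20, repeat=5))
t_new = min(timeit.repeat(lambda: new_impl(values), number=20, repeat=5))
speedup = t_old / t_new
print(f"match: {results_match}  old: {t_old:.4f}s  new: {t_new:.4f}s  ratio: {speedup:.2f}x")
```

Including the input size, the check that both versions agree, and the measured ratio makes the claim easy for a reviewer to reproduce.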
Comments
  • Introducing possibility to train the SOM so that learning_rate and sigma are constant during one epoch.

    This pull request introduces the possibility to train the SOM so that learning_rate and sigma are only being decreased after each epoch. During one epoch the SOM is updated once per given input vector (=len(data) times) with constant learning_rate and sigma. This should lead to a greater independence between the order of the input vectors and the resulting SOM.

    In order to use this feature, one only has to use train_epochs() instead of train().

    learning_rate and sigma could (should?) technically be updated only once every epoch, but to change as little code as possible those parameters are still updated every time update() gets called (with constant parameters during one epoch). This could be 'optimised' if desired.

    opened by jriege555 22
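
The scheduling idea in this pull request can be sketched without MiniSom: compute the decay from the epoch index rather than the iteration index, so that every sample in an epoch sees the same value. The function names and the asymptotic decay formula below are assumptions made for illustration, not the code of the pull request.

```python
def asymptotic_decay(value, t, max_t):
    """Assumed decay: value shrinks as t approaches max_t."""
    return value / (1 + t / (max_t / 2))

def per_iteration_schedule(value, n_samples, n_epochs):
    """Decay after every single update."""
    total = n_samples * n_epochs
    return [asymptotic_decay(value, t, total) for t in range(total)]

def per_epoch_schedule(value, n_samples, n_epochs):
    """Decay only between epochs; the value is constant within an epoch."""
    total = n_samples * n_epochs
    return [asymptotic_decay(value, t // n_samples, n_epochs) for t in range(total)]

# 3 samples, 2 epochs: the first three updates share one rate, the next three another
rates = per_epoch_schedule(0.5, n_samples=3, n_epochs=2)
iter_rates = per_iteration_schedule(0.5, n_samples=3, n_epochs=2)
print(rates)
print(iter_rates)
```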
  • Fixed topographic_error() and quantization_error()

    Problems:

    • The previous topographic_error() method is incorrect. bmu_1 and bmu_2 are not the coordinates of the best two matching units.
    • The previous topographic_error() and quantization_error() use explicit for-loops, which are very slow.

    Fixes:

    • Fixed incorrect implementation of topographic_error() method.
    • Replaced the topographic_error() and quantization_error() methods with vectorized implementations.
    opened by wei-zhang-thz 17
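
To illustrate the kind of change described in this pull request, here is a sketch that computes a quantization error, taken here to mean the average distance between each sample and its closest weight vector, first with explicit loops and then with NumPy broadcasting. The function names, array shapes and random data are assumptions, and this is not the code from the pull request.

```python
import numpy as np

def quantization_error_loop(data, weights):
    """Mean distance to the closest unit, using explicit Python loops."""
    flat = weights.reshape(-1, weights.shape[-1])
    total = 0.0
    for x in data:
        total += min(np.linalg.norm(x - w) for w in flat)
    return total / len(data)

def quantization_error_vectorized(data, weights):
    """Same quantity, computed with broadcasting instead of loops."""
    flat = weights.reshape(-1, weights.shape[-1])
    # dists[i, k] = distance between sample i and unit k
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

rng = np.random.default_rng(0)
data = rng.random((50, 4))       # 50 samples, 4 features
weights = rng.random((6, 6, 4))  # hypothetical 6x6 map
qe_loop = quantization_error_loop(data, weights)
qe_vec = quantization_error_vectorized(data, weights)
print(qe_loop, qe_vec)
```

The broadcasted version builds a (samples, units, features) intermediate array, so it trades memory for speed on large maps.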
  • quantization error (theoretical question)

    I have a question about the interpretability of the quantization error.

    How can we know that the SOM is reliable? Does the quantization error need to be lower than a certain value?

    For example, in my case, I have a quantization error of 7.0, which is quite high in comparison to the example given in the documentation. Does that mean my SOM is not reliable?

    question 
    opened by lachhebo 13
  • Do you know why nodes change completely when I reran the same setup with varying number of iterations?

    Hey :-)

    First of all, thank you for providing this tool, it seems very handy! I am using SOM with geopotential height anomalies over a given region as input variables to cluster meteorological circulation patterns (ca. 2000 observations). What is really strange is that the SOM nodes differ completely when I rerun the same setup with more iterations (e.g. doubling from 10000 to 20000). It produces nodes not only in a different order, but also nodes that have no analogue in the new SOM... Is there anything I am doing wrong?

    Thank you very much - below some details about the setup

    The example I am using most often is sigma=1 (Gaussian), lr=0.5, and SOM sizes between 2x4 and 4x5. The problem occurs no matter the initialization (pca or random) and no matter the training (single, batch, random). My code is basically only:

        # SOM
        som = MiniSom(som_m, som_n, ndims, sigma=sigma, learning_rate=lr, neighborhood_function='gaussian')
        som.pca_weights_init(somarr)
        som.train_batch(somarr, 10000, verbose=True)

        ...

        # plot
        for m in range(som_m):
            for n in range(som_n):
                ax...
                pltarr = som.get_weights()[m, n, :].reshape((nsomlat, nsomlon))
                p = ax.contourf(somlons, somlats, pltarr, cmap='seismic', transform=ccrs.PlateCarree())

    question 
    opened by michel039 12
  • Vectorized the _activate function

    Great library, but I noticed that the training code for your SOMs is not vectorized. You use the fast_norm function a lot, which may be faster than linalg.norm for 1D arrays, but iterating over every spot in the SOM is a lot slower than just calling linalg.norm.

    This pull request replaces fast_norm with linalg.norm in 2 places where I saw iteration over the whole SOM. Some simple testing with a 100x100 SOM showed ~40x speedup on my laptop.

    After making the changes, the unit tests failed, which I believe is caused by incorrectly setting up the testing weights as a 2D array rather than a 3D array. So I changed that too, and now the unit tests pass. I also did a few rough tests of my own, and the results of self.winner(x) and the training seem to be the same as before.

    opened by AustinT 11
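
The change described in this pull request amounts to replacing a per-unit loop with one norm call over the feature axis. The sketch below assumes weights of shape (rows, cols, features). The names activate_loop and activate_vectorized are hypothetical, not MiniSom methods, and no timing claim is made here.

```python
import numpy as np

def activate_loop(weights, x):
    """Distance from x to every unit, visiting units one by one."""
    rows, cols, _ = weights.shape
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            diff = x - weights[i, j]
            out[i, j] = np.sqrt(np.dot(diff, diff))
    return out

def activate_vectorized(weights, x):
    """Same activation map from a single norm call along the feature axis."""
    return np.linalg.norm(x - weights, axis=-1)

rng = np.random.default_rng(1)
weights = rng.random((10, 10, 4))
x = rng.random(4)
activation = activate_vectorized(weights, x)
same = np.allclose(activate_loop(weights, x), activation)
print(same, activation.shape)
```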
  • Time Series

    Hello! I am trying to use my time series data for the example uploaded, but I encounter this error when initializing pca. Also, the second image is the error that I encounter when I use random initialization.

    [Two images showing the error messages; not reproduced here.]

    opened by jaybhiesantos 10
  • How to cluster images?

    I would like to know how to cluster images. Instead of reading a CSV, I want to read all images from disk and cluster them using a SOM.

    Can you please share some examples?

    opened by balavenkatesh3322 10
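
Since MiniSom expects one row per observation, one common approach (an assumption here, not guidance from the maintainer) is to flatten each image into a vector. The sketch below uses synthetic arrays in place of images read from disk. Loading the files is left out, and all names and sizes are illustrative.

```python
import numpy as np

# Stand-in for 5 grayscale images of 8x8 pixels loaded from disk
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 8, 8)).astype(np.float64)

# One row per image, one column per pixel
data = images.reshape(len(images), -1)

# Scale pixel values to [0, 1] so that no feature dominates by magnitude
data /= 255.0

print(data.shape)  # (n_images, n_pixels)
```

The resulting data array has the row-per-observation layout described under "How to use it", with an input length equal to the number of pixels per image.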
  • Example: Hexagonal Topology bokeh

    Summary

    This branch acts on https://github.com/JustGlowing/minisom/issues/86 by adding to the existing examples/HexagonalTopology.ipynb notebook an interactive bokeh example of the equivalent matplotlib plot.

    The purpose of adding interactivity was so that further exploration could be conducted on the plot to see where the original data points are mapped to in the SOM space.

    Check

    • [x] This branch adds value to the main repository, so it is worthwhile to include.
    • [x] The bokeh plot is equivalent to the matplotlib plot.
    • [ ] The code is error free and works on your machine.
    • [x] The logic of showing data points in the hover tooltip is sound.

    Note

    This "closes #86".

    opened by avisionh 10
  • speed up in update method

    Hi! Thanks for sharing the library! I noticed that if you change the loop in the update method with an einsum operation you can speed up the training by some amount. Hope you find it useful. Christos

    opened by Sourmpis 10
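
A sketch of the idea in this pull request: the update moves every weight vector towards the sample, scaled by a per-unit factor, and einsum can express that scaling in one call. The shapes, names and random scaling values below are assumptions, not the code from the pull request.

```python
import numpy as np

def update_loop(weights, x, g):
    """Move each unit towards x, scaled by g[i, j], one unit at a time."""
    out = weights.copy()
    for i in range(weights.shape[0]):
        for j in range(weights.shape[1]):
            out[i, j] += g[i, j] * (x - weights[i, j])
    return out

def update_einsum(weights, x, g):
    """Same update with the per-unit scaling expressed via einsum."""
    return weights + np.einsum('ij,ijk->ijk', g, x - weights)

rng = np.random.default_rng(2)
weights = rng.random((5, 5, 3))
x = rng.random(3)
g = rng.random((5, 5))  # stand-in for learning rate times neighborhood value
same = np.allclose(update_loop(weights, x, g), update_einsum(weights, x, g))
print(same)
```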
  • Add topographic error calculation for hexagonal grid

    This PR adds the functionality for Topographic Error calculation, computed by finding the first-best-matching and second-best-matching neurons in the hexagonal grid.

    [Image: Screenshot 2022-04-12 005139, showing the topographic error equation; not reproduced here.]

    The topographic error calculation is based on the above equation, which considers if the first-best-matching and second-best-matching neurons are neighbors in the SOM grid.

    opened by TharindaDilshan 9
  • new visualizations

    Hi, I have implemented a number of visualizations in the BasicUsage file. Additionally, I made some minor changes (mainly typos) in some other files. As this is my first use of GitHub, I do not know how to separate both topics and make two pull requests... I hope this works out!

    opened by bijae 9
  • Topographic error wrong for hexagonal topography with rectangular grid

    Hi,

    I am trying to get the topographic error from a SOM with 11x7 neurons, hexagonal topography.

    When I do, I get this error:

         21     return (-1, -1)
         22 y = som._weights.shape[1]
    ---> 23 coords = som.convert_map_to_euclidean((index % y, int(index/y)))
         24 return coords
    
    File ~/.local/lib/python3.8/site-packages/minisom.py:243, in MiniSom.convert_map_to_euclidean(self, xy)
        237 def convert_map_to_euclidean(self, xy):
        238     """Converts map coordinates into euclidean coordinates
        239     that reflects the chosen topology.
        240 
        241     Only useful if the topology chosen is not rectangular.
        242     """
    --> 243     return self._xx.T[xy], self._yy.T[xy]
    
    IndexError: index 8 is out of bounds for axis 1 with size 7
    

    I don't think this line of code makes sense:

    coords = som.convert_map_to_euclidean((index % y, int(index/y)))

    Shouldn't the parameters be inverted, e.g.:

    coords = som.convert_map_to_euclidean((int(index/y), index % y))

    Anyway, thanks for the amazing work!

    bug 
    opened by mbarison 6
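
The index arithmetic in this report can be checked in isolation. The sketch below assumes the flat index was produced in row-major order over a grid of shape (11, 7). Under that assumption divmod returns (row, col) in the order that stays within bounds, while the swapped order does not. The helper to_row_col is hypothetical and not part of MiniSom.

```python
def to_row_col(index, n_cols):
    """Convert a row-major flat index into (row, col)."""
    return divmod(index, n_cols)  # (index // n_cols, index % n_cols)

n_rows, n_cols = 11, 7  # grid size from the report

# Every flat index maps inside the grid with (row, col) order
in_bounds = all(
    r < n_rows and c < n_cols
    for r, c in (to_row_col(i, n_cols) for i in range(n_rows * n_cols))
)

# With the swapped order (index % n_cols, index // n_cols), the second
# coordinate exceeds the size of axis 1 for these flat indices
swapped_overflows = [
    i for i in range(n_rows * n_cols) if i // n_cols >= n_cols
]

print(in_bounds, len(swapped_overflows))
```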
  • Matching Matlab hyperparameters

    Hi there! Thank you for this great work!

    I switched from the Matlab version of SOM to Python. However, I found the results were quite different: I could get a perfect 100% in Matlab but somehow only get 19% in F1-score here.

    The only thing I changed from the default settings in Matlab is using a 10*10 map. These are my settings using minisom:

    som = MiniSom(10, 10, 4096, sigma=1.5, learning_rate=0.7, activation_distance='euclidean', neighborhood_function='gaussian', topology='hexagonal', random_seed=10)

    Any suggestions so I could maybe recreate the result from Matlab?

    Thank you in advance!

    question 
    opened by AmousQiu 3
  • Is there a way to obtain a distance of each point to its BMU?

    Hi, first and foremost, thank you for your great work, which makes it possible to use the SOM algorithm in such a convenient way. I wanted to ask whether it is possible to obtain a list of the distances between each point and its Best Matching Unit (node) on a trained SOM grid. I have read the documentation and seen the different attributes of the SOM object, but it appears to me that none of them returns the (Euclidean) distance to the BMU. Thanks in advance for your support!

    question 
    opened by JMiklaszewski 1
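
One way to get these distances with plain NumPy is sketched below. It assumes the weights are available as an array of shape (rows, cols, features), which is the layout som.get_weights() is indexed with in an earlier question on this page. The helper bmu_distances is hypothetical, and the random arrays stand in for real data and a trained map.

```python
import numpy as np

def bmu_distances(data, weights):
    """Return (distance to BMU, BMU grid coordinates) for each sample."""
    rows, cols, n_features = weights.shape
    flat = weights.reshape(-1, n_features)
    # dists[i, k] = Euclidean distance between sample i and unit k
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    best = dists.argmin(axis=1)
    coords = np.column_stack(np.unravel_index(best, (rows, cols)))
    return dists[np.arange(len(data)), best], coords

rng = np.random.default_rng(3)
weights = rng.random((4, 5, 3))
# Two samples that coincide with the units at (1, 2) and (3, 0)
data = weights[[1, 3], [2, 0]]
dist, coords = bmu_distances(data, weights)
print(dist, coords.tolist())
```

Using argmin on the distance matrix also yields the BMU itself directly, without sorting all units.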
  • Is there an option to obtain the BMU value directly?

    Hi there,

    I am trying to use BMU values as a metric to classify my data. The features are seismic attributes. Your function "distance_from_weights" was my first guess, but it's not exporting BMUs directly. We have to manipulate it to remove the second BMU:

    np.argsort(distance_from_weights(data), axis=1)[:, :2] -----> np.argsort(distance_from_weights(data), axis=1)[:, :1]

    Would you mind building that function?

    question 
    opened by akol67 1
  • Wrong value in topographic error function?

    So a topographic error occurs when the two BMUs of a sample are not adjacent. Shouldn't t = 1, then? If the BMUs are two hops apart in a corner, their Euclidean distance is sqrt(2) = 1.4142. So with distance > 1.42 this doesn't count as an error. Or am I missing something?

    question 
    opened by SandroMartens 0
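
The arithmetic in this question can be made explicit. The sketch below assumes a rectangular grid with unit spacing: with a threshold of 1.42, diagonal neighbours (distance sqrt(2)) still count as adjacent, whereas a threshold of 1 would flag them as errors. The helper is_topographic_error is hypothetical and does not settle which threshold is the intended one.

```python
import math

def is_topographic_error(bmu_1, bmu_2, t):
    """True when the two best matching units are farther apart than t."""
    return math.dist(bmu_1, bmu_2) > t

diagonal = ((0, 0), (1, 1))   # distance sqrt(2), about 1.4142
two_apart = ((0, 0), (0, 2))  # distance 2

diag_err_t142 = is_topographic_error(*diagonal, t=1.42)
diag_err_t1 = is_topographic_error(*diagonal, t=1.0)
far_err_t142 = is_topographic_error(*two_apart, t=1.42)
print(diag_err_t142, diag_err_t1, far_err_t142)
```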
  • Example spatio-temporal climate data

    This pull request adds an example notebook applying SOM to climate data, which is usually 2D in space plus time (time, lat, lon).

    I've been looking a lot into SOM examples, and it's hard to find examples on climate data... so I hope this notebook can help future users (and also me, if you find something wrong in my usage).

    For the example, I've used the tutorial dataset from Xarray.

    opened by carocamargo 2
Releases(2.3.0)
Owner
Giuseppe Vettigli
Data Scientist, teaching fellow, Python enthusiast, fearless visionarist, lateral thinker.