Maximum Covariance Analysis in Python

Overview

xMCA | Maximum Covariance Analysis in Python

version GitHub Workflow Status Documentation Status codecov.io downloads DOI

The aim of this package is to provide a flexible tool for the climate science community to perform Maximum Covariance Analysis (MCA) in a simple and consistent way. Given the huge popularity of xarray in the climate science community, xmca supports xarray.DataArray as well as numpy.ndarray as input formats.

Example Figure: mode 2 of complex rotated Maximum Covariance Analysis, showing the shared dynamics of SST and continental precipitation associated with ENSO between 1980 and 2020.

🔰 What is MCA?

MCA maximises the temporal covariance between two different data fields and is closely related to Principal Component Analysis (PCA) / Empirical Orthogonal Function (EOF) analysis. While EOF analysis maximises the variance within a single data field, MCA extracts the dominant co-varying patterns between two different data fields. When the two input fields are the same, MCA reduces to standard EOF analysis.
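At its core, MCA takes the singular value decomposition of the cross-covariance matrix between the two fields. A minimal NumPy sketch of this idea (an illustration only, not the actual xmca implementation):

```python
import numpy as np

# Illustration only: MCA as the SVD of the cross-covariance matrix
# between two centered fields X (time x p) and Y (time x q)
rng = np.random.default_rng(0)
t, p, q = 100, 5, 4
X = rng.standard_normal((t, p))
Y = rng.standard_normal((t, q))

X = X - X.mean(axis=0)             # remove the temporal mean
Y = Y - Y.mean(axis=0)

C = X.T @ Y / (t - 1)              # cross-covariance matrix, shape (p, q)
U, s, Vt = np.linalg.svd(C, full_matrices=False)

pcs_x = X @ U                      # expansion coefficients (PCs) of field X
pcs_y = Y @ Vt.T                   # expansion coefficients (PCs) of field Y

scf = s**2 / np.sum(s**2)          # squared covariance fraction per mode
```

The columns of `U` and rows of `Vt` are the spatial patterns of each mode; the squared singular values, normalized to the squared covariance fraction, measure how much covariance each mode explains.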

For the mathematical background, have a look at e.g. Bretherton et al. or the lecture material written by C. Bretherton.

New in release 1.4.x

  • Much faster and more memory-efficient algorithm
  • Significance testing of individual modes via Rule N (rule_n)
  • The period parameter of the solve method provides more flexibility for the exponential extension, making complex MCA more stable
  • Fixed missing coslat weighting when saving a model (Issue 25)

📌 Core Features

|              | Standard | Rotated | Complex | Complex Rotated |
|--------------|----------|---------|---------|-----------------|
| EOF analysis | ✔️       | ✔️      | ✔️      | ✔️              |
| MCA          | ✔️       | ✔️      | ✔️      | ✔️              |

* click on check marks for reference
** Complex rotated MCA is also available as a pre-print on arXiv.

🔧 Installation

Installation is simply done via

pip install xmca

If you have problems during the installation, please consult the documentation or raise an issue here on GitHub.

📰 Documentation

A tutorial to get you started as well as the full API can be found in the documentation.

Quickstart

Import the package

    from xmca.array import MCA  # use with np.ndarray
    from xmca.xarray import xMCA  # use with xr.DataArray

As an example, we take North American surface temperatures shipped with xarray. Note: the package only works with xr.DataArray, not xr.Dataset.

    import xarray as xr  # only needed to obtain test data

    # split data arbitrarily into west and east coast
    data = xr.tutorial.open_dataset('air_temperature').air
    west = data.sel(lon=slice(200, 260))
    east = data.sel(lon=slice(260, 360))

PCA / EOF analysis

Construct a model with only one field and solve it to perform standard PCA / EOF analysis.

    pca = xMCA(west)                        # PCA of west coast
    pca.solve(complexify=False)             # True for complex PCA

    svals  = pca.singular_values()          # singular values = eigenvalues for PCA
    expvar = pca.explained_variance()       # explained variance
    pcs    = pca.pcs()                      # principal component scores (PCs)
    eofs   = pca.eofs()                     # spatial patterns (EOFs)

A Varimax/Promax-rotated solution can be obtained by rotating the model, choosing the number of EOFs to be rotated (n_rot) as well as the Promax parameter (power). Here, power=1 yields a Varimax-rotated solution.

    pca.rotate(n_rot=10, power=1)

    expvar_rot  = pca.explained_variance()  # explained variance
    pcs_rot     = pca.pcs()                 # Principal component scores (PCs)
    eofs_rot    = pca.eofs()                # spatial patterns (EOFs)

MCA

Same as for PCA / EOF analysis, but with two input fields instead of one.

    mca = xMCA(west, east)                  # MCA of field A and B
    mca.solve(complexify=False)             # True for complex MCA

    svals = mca.singular_values()           # singular values
    pcs   = mca.pcs()                       # expansion coefficients (PCs)
    eofs  = mca.eofs()                      # spatial patterns (EOFs)

Significance analysis

A simple way to estimate the significance of the obtained modes is to run Monte Carlo simulations based on uncorrelated Gaussian white noise, known as Rule N (Overland and Preisendorfer 1982). Here we create 200 such synthetic data sets and compare the synthetic singular spectrum with the real one to assess significance.

    import numpy as np                  # used below for the cutoff
    import matplotlib.pyplot as plt

    surr = mca.rule_n(200)
    median = surr.median('run')
    q99 = surr.quantile(.99, dim='run')
    q01 = surr.quantile(.01, dim='run')

    cutoff = np.sum((svals - q99 > 0)).values  # first 8 modes significant

    fig = plt.figure(figsize=(10, 4))
    ax = fig.add_subplot(111)
    svals.plot(ax=ax, yscale='log', label='true')
    median.plot(ax=ax, yscale='log', color='.5', label='rule N')
    q99.plot(ax=ax, yscale='log', color='.5', ls=':')
    q01.plot(ax=ax, yscale='log', color='.5', ls=':')
    ax.axvline(cutoff + 0.5, ls=':')
    ax.set_xlim(-2, 200)
    ax.set_ylim(1e-1, 2.5e4)
    ax.set_title('Significance based on Rule N')
    ax.legend()

Example Figure: the first 8 modes are significant according to Rule N using 200 synthetic runs.

Saving/loading an analysis

    mca.save_analysis('my_analysis')    # saves the data plus an info file
                                        # into a dedicated directory

    mca2 = xMCA()                       # create a new, empty instance
    mca2.load_analysis('my_analysis/info.xmca')  # load the analysis by passing
                                        # the path to the info file created above

Quickly inspect your results visually

The package provides a method to plot individual modes.

    mca2.set_field_names('West', 'East')
    pkwargs = {'orientation' : 'vertical'}
    mca2.plot(mode=1, **pkwargs)

Example Figure: result of the default plot method after performing MCA on T2m of the North American west and east coasts, showing mode 1.

You may want to modify the plot for better aesthetics:

    from cartopy.crs import EqualEarth  # for different map projections

    # map projections for "left" and "right" field
    projections = {
        'left': EqualEarth(),
        'right': EqualEarth()
    }

    pkwargs = {
        "figsize"     : (8, 5),
        "orientation" : 'vertical',
        'cmap_eof'    : 'BrBG',  # colormap amplitude
        "projection"  : projections,
    }
    mca2.plot(mode=3, **pkwargs)

Example Figure Mode 3

You can save the plot to your local disk as a .png file via

    skwargs = {'dpi': 200}
    mca2.save_plot(mode=3, plot_kwargs=pkwargs, save_kwargs=skwargs)

🔖 Please cite

I am just starting my career as a scientist, so feedback on my scientific work is important to me in order to assess which of my work advances the scientific community. If you use the package for your own research and find it helpful, I would appreciate feedback here on GitHub, via email, or as a citation:

Niclas Rieger, 2021: nicrie/xmca: version x.y.z. doi:10.5281/zenodo.4749830.

💪 Credits

Kudos to the developers and contributors of the following GitHub projects, which I used myself and which served as inspiration:

And of course credits to the developers of the extremely useful packages

Comments
  • SVD did not converge


    Hi Niclas,

xmca worked fine when I used it directly on my raw data, producing the results I was expecting. However, when I tried to use processed data (anomalies, detrended), it gives the following error: SVD did not converge.


Does xmca only accept raw data, or is something wrong with my input?

Even the previously working data was in a similar format. What could be the issue?

    opened by GIRIJA-KALYANI 3
  • cartopy dependency is too restrictive


    The very restrictive cartopy dependency makes it tricky to install into an existing conda environment

    cartopy==0.18.0
    

    I can see it was changed at https://github.com/coecms/xmca/commit/896e0b5977c4f4a36ed01363141f3ab7dd24c6d5

When I changed it back to >=0.18.0 it installed fine using pip install --user and imported fine with cartopy-0.19.0.post1.

    I ran the tests like so

    python -m unittest discover -v -s tests/
    

    but three of the tests didn't pass

    test_save_load_cplx (integration.test_integration_xarray.TestIntegration) ... ERROR    
    test_save_load_rot (integration.test_integration_xarray.TestIntegration) ... ERROR                                       
    test_save_load_std (integration.test_integration_xarray.TestIntegration) ... ERROR    
    

    Some other error messages:

    Error: Rotation process did not converge!
    
    ======================================================================                                                   
    ERROR: test_save_load_cplx (integration.test_integration_xarray.TestIntegration)                                         
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    ======================================================================                                                   
    ERROR: test_save_load_rot (integration.test_integration_xarray.TestIntegration)                                          
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    ======================================================================                                                   
    ERROR: test_save_load_std (integration.test_integration_xarray.TestIntegration)                                          
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    ---------------------------------------------------------------------- 
    

    but it wasn't obvious that this was an issue with the version of cartopy.

    opened by aidanheerdegen 3
  • installation error


    Hello Niclas,

    I installed xmca but when I import the library I get this error: libmkl_rt.so: cannot open shared object file: No such file or directory

    any ideas or suggestions would be greatly appreciated.

    Cheers,

    Charles.

    opened by chjones2 3
  • May cause errors after using mca.normalize()


    Hello, first of all, thank you for your contribution. However, I found a bug that may cause the SVD calculation to fail (due to NaN values). The details are as follows:

    When the MCA class is initialized, it calls get_nan_cols and remove_nan_cols to remove NaN values. But if you call mca.normalize() after that, new NaN values can appear and be carried into the SVD calculation. This happens when a grid point has the same value at every time step (for example, a place where it never rains): standardizing it produces NaN, which makes the SVD unsolvable.

    opened by L1angY 2
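The failure mode described in this issue is easy to reproduce outside of xmca: a grid point with a constant time series has zero standard deviation, so standardizing it divides by zero and produces NaN (a minimal sketch, independent of xmca's internals):

```python
import numpy as np

# One grid point (column 1) whose value never changes over time,
# e.g. a location where it never rains
field = np.array([
    [1.2, 0.0],
    [0.7, 0.0],
    [1.9, 0.0],
])                                     # shape: (time, grid points)

std = field.std(axis=0)                # std of column 1 is exactly 0
with np.errstate(invalid='ignore'):
    normalized = (field - field.mean(axis=0)) / std

print(np.isnan(normalized[:, 1]).all())   # True: the constant column is all NaN
```

Dropping zero-variance columns before normalizing, e.g. `field[:, field.std(axis=0) > 0]`, avoids the problem.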
  • Feedback


    I'm posting my feedback on behalf of Alex vM, who asked me to have a look at this package. The feedback is subjective and you may disagree with some if not all suggestions below.

    1. In general, the readme is well written. Clear and concise. I'd add a figure(s) though. For instance, after you call mca.plot() in your example.
    2. Currently, this package is for developers or at least those who have python experience. Well, maybe covariance estimation requires knowledge of python and programming skills, but I recommend making sphinx documentation (and publishing it on readthedocs) to bring users. You've already written docs for each if not all functions and classes. The only step left is to wrap it in sphinx with default parameters and paths.
    3. To give credits to your package and show that it's well maintained, I recommend adding badges (readthedocs, travis build and test coverage). Use CircleCI. Here is a config example (you need only build-pip job since it's the easiest).
    4. tools folder must be in the xmca folder.
    5. setup.py install requires must be read from requirements.txt file (example) and not hard-coded.
    6. GPL-3 license is too restrictive. Consider BSD-3 or even MIT.
    7. Each test that involves randomness must start with a numpy.random.seed. Currently, you're setting the seed globally. It's not a good idea because the test results depend on the test order, which, of course, should not happen.

    Good luck!

    Best, Danylo

    bug documentation 
    opened by dizcza 1
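Point 7 of the feedback above can be illustrated with a hypothetical test case (the test names and checks below are made up for illustration, not taken from the xmca test suite):

```python
import unittest

import numpy as np


class TestRandomized(unittest.TestCase):
    """Hypothetical tests illustrating per-test seeding (not from xmca)."""

    def test_mean_close_to_zero(self):
        np.random.seed(42)              # seed inside the test ...
        data = np.random.randn(10_000)
        self.assertLess(abs(data.mean()), 0.1)

    def test_variance_close_to_one(self):
        np.random.seed(42)              # ... so results do not depend
        data = np.random.randn(10_000)  # on the order tests run in
        self.assertLess(abs(data.var() - 1.0), 0.1)
```

Run with `python -m unittest`; because each test seeds NumPy itself, reordering the tests or running them individually gives identical results.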
  • save_analysis currently does not save cos lat weighting


    Just stumbled over this:

When saving a model via xmca.xarray.save_analysis with cosine latitude weighting applied (apply_coslat), the current implementation does not invoke xmca.xarray.apply_coslat when the model is loaded via xmca.xarray.load_analysis, thus producing false PCs.

    I hope to provide a fix to this soon.

    bug 
    opened by nicrie 0
  • Release 1.0.0


    New in release 1.0.0

    • the predict method allows projecting new, unseen data to obtain the corresponding PCs (works for standard, rotated and complex analyses)
    • more efficient storing/loading of files; unfortunately, this and the point above required considerable changes to the code. As a consequence, loading models created and saved with an older package version (0.x.y) is not supported.
    • add method to summarize performed analysis (summary)
    • add method to return input fields
    • improve docs (resolves #7)
    • correct and consistent use of definition of loadings
    • some bugfixes (e.g. resolves #12 )
    opened by nicrie 0
  • MCA errors


Hello, I am trying to run MCA with two variables, which are output from the WRF climate model.

I get this error right after:

    mca.plot(mode=1, **pkwargs)
    

    ValueError: coordinate lon has dimensions ('south_north', 'west_east'), but these are not a subset of the DataArray dimensions ['lat', 'lon', 'mode']

    Would really appreciate any help with this error. Many thanks.

    # Load packages and data:
    import xarray as xr
    import matplotlib.pyplot as plt
    import cartopy
    
    var=xr.open_dataset("F:\\era5_2000_2020_vars_salem.nc")
    
    t2=var.T2C
    snow=var.SNOWH
    
    # The variable t2, for example, is structured as follows:
    <xarray.DataArray 'T2C' (time: 3512, south_north: 111, west_east: 114)>
    Coordinates:
        lat          (south_north, west_east) float32 ...
        lon          (south_north, west_east) float32 ...
        xtime        (time) datetime64[ns] ...
      * time         (time) datetime64[ns] 2000-11-01 2000-11-02 ... 2020-04-29
      * west_east    (west_east) float64 -2.766e+05 -2.666e+05 ... 8.534e+05
      * south_north  (south_north) float64 -1.353e+05 -1.253e+05 ... 9.647e+05
    Attributes:
        FieldType:    104
        MemoryOrder:  XY 
        description:  2m Temperature
        units:        C
        stagger:      
        pyproj_srs:   +proj=lcc +lat_0=64 +lon_0=10 +lat_1=64 +lat_2=68 +x_0=0 +y...
        coordinates:  XLONG XLAT XTIME
    
    mca = xMCA(t2, snow)                  # MCA of field A and B
    mca.solve(complexify=False)            # True for complex MCA
    
    
    eigenvalues = mca.singular_values()     
    pcs = mca.pcs()                           
    eofs = mca.eofs()   
    
    mca.set_field_names('t2','snow')
    pkwargs = {'orientation' : 'vertical'}
    mca.plot(mode=1, **pkwargs)
    
    opened by Murk89 23
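For context, the ValueError in this issue points at WRF's curvilinear grid: lat and lon are two-dimensional coordinates defined over ('south_north', 'west_east'), whereas the plotting code evidently expects one-dimensional 'lat'/'lon' dimensions. A minimal sketch of the difference (illustrative data, not the WRF file above):

```python
import numpy as np
import xarray as xr

# Regular grid: lat/lon are one-dimensional dimension coordinates
regular = xr.DataArray(
    np.zeros((4, 3, 2)),
    dims=('time', 'lat', 'lon'),
    coords={'lat': [60.0, 62.0, 64.0], 'lon': [8.0, 10.0]},
)

# Curvilinear (WRF-like) grid: lat/lon are two-dimensional
# auxiliary coordinates over ('south_north', 'west_east')
lat2d, lon2d = np.meshgrid([60.0, 62.0, 64.0], [8.0, 10.0], indexing='ij')
curvilinear = xr.DataArray(
    np.zeros((4, 3, 2)),
    dims=('time', 'south_north', 'west_east'),
    coords={
        'south_north': np.arange(3) * 1e5,
        'west_east': np.arange(2) * 1e5,
        'lat': (('south_north', 'west_east'), lat2d),
        'lon': (('south_north', 'west_east'), lon2d),
    },
)

print(regular.lat.dims)       # ('lat',)
print(curvilinear.lat.dims)   # ('south_north', 'west_east')
```

Regridding the WRF output onto a regular latitude/longitude grid (e.g. with xesmf or scipy interpolation) before passing it to xMCA is one way around the mismatch.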
  • Sourcery Starbot ⭐ refactored nicrie/xmca


    Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

    Here's your pull request refactoring your most popular Python repo.

    If you want Sourcery to refactor all your Python repos and incoming pull requests install our bot.

    Review changes via command line

    To manually merge these changes, make sure you're on the master branch, then run:

    git fetch https://github.com/sourcery-ai-bot/xmca master
    git merge --ff-only FETCH_HEAD
    git reset HEAD^
    
    opened by sourcery-ai-bot 0
  • multivariate EOF analysis / MCA


    add this feature in next release

    note: this will probably be a major change, since it requires rewriting the internal structure of the package and will therefore break backwards compatibility

    enhancement 
    opened by nicrie 0
Releases: 1.4.2
Owner: Niclas Rieger