Maximum Covariance Analysis in Python

Overview

xMCA | Maximum Covariance Analysis in Python

The aim of this package is to provide a flexible tool for the climate science community to perform Maximum Covariance Analysis (MCA) in a simple and consistent way. Given the huge popularity of xarray in the climate science community, xmca supports xarray.DataArray as well as numpy.ndarray as input formats.

Figure: Mode 2 of complex rotated Maximum Covariance Analysis showing the shared dynamics of SST and continental precipitation associated with ENSO between 1980 and 2020.

🔰 What is MCA?

MCA maximises the temporal covariance between two different data fields and is closely related to Principal Component Analysis (PCA) / Empirical Orthogonal Function (EOF) analysis. While EOF analysis maximises the variance within a single data field, MCA extracts the dominant co-varying patterns between two different data fields. When the two input fields are the same, MCA reduces to standard EOF analysis.

For the mathematical background, see e.g. Bretherton et al. or the lecture material written by C. Bretherton.
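The idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not xmca's implementation: the synthetic fields, the helper name `mca_svd`, and the squared covariance fraction `scf` are assumptions introduced for the example. It computes the SVD of the cross-covariance matrix and checks that, with identical input fields, the singular values equal the eigenvalues of the covariance matrix (standard EOF analysis).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic fields (hypothetical data) driven by one shared time series.
# Rows are time steps, columns are grid points.
n_time = 500
signal = rng.standard_normal(n_time)
pattern_a = np.array([1.0, -1.0, 0.5, 0.0])
pattern_b = np.array([0.0, 2.0, -1.0])
field_a = np.outer(signal, pattern_a) + 0.1 * rng.standard_normal((n_time, 4))
field_b = np.outer(signal, pattern_b) + 0.1 * rng.standard_normal((n_time, 3))


def mca_svd(a, b):
    """SVD of the temporal cross-covariance matrix of two fields."""
    a = a - a.mean(axis=0)              # remove the time mean of each grid point
    b = b - b.mean(axis=0)
    cov = a.T @ b / (a.shape[0] - 1)    # cross-covariance, shape (space_a, space_b)
    u, s, vt = np.linalg.svd(cov, full_matrices=False)
    return u, s, vt.T                   # left patterns, singular values, right patterns


u, s, v = mca_svd(field_a, field_b)
scf = s**2 / np.sum(s**2)               # squared covariance fraction per mode

# Same field twice: singular values equal the covariance eigenvalues (EOF analysis).
_, s_same, _ = mca_svd(field_a, field_a)
eigvals = np.linalg.eigvalsh(np.cov(field_a, rowvar=False))[::-1]
```

The columns of `u` and `v` play the role of the spatial patterns of the two fields; projecting each field onto its patterns would give the corresponding expansion coefficients.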

New in release 1.4.x

  • Much faster and more memory-efficient algorithm
  • Significance testing of individual modes via
  • The period parameter of the solve method gives more control over the exponential extension, making complex MCA more stable
  • Fixed missing coslat weighting when saving a model (Issue 25)

📌 Core Features

                Standard   Rotated   Complex   Complex Rotated
EOF analysis    ✔️         ✔️        ✔️        ✔️
MCA             ✔️         ✔️        ✔️        ✔️

* click on check marks for reference
** Complex rotated MCA is also available as a pre-print on arXiv.

🔧 Installation

Installation is simply done via

pip install xmca

If you have problems during the installation, please consult the documentation or raise an issue on GitHub.

📰 Documentation

A tutorial to get you started as well as the full API can be found in the documentation.

Quickstart

Import the package

    from xmca.array import MCA  # use with np.ndarray
    from xmca.xarray import xMCA  # use with xr.DataArray

As an example, we take North American surface temperatures shipped with xarray. Note: only works with xr.DataArray, not xr.Dataset.

    import xarray as xr  # only needed to obtain test data

    # split data arbitrarily into west and east coast
    data = xr.tutorial.open_dataset('air_temperature').air
    west = data.sel(lon=slice(200, 260))
    east = data.sel(lon=slice(260, 360))

PCA / EOF analysis

Construct a model with only one field and solve it to perform standard PCA / EOF analysis.

    pca = xMCA(west)                        # PCA of west coast
    pca.solve(complexify=False)             # True for complex PCA

    svals  = pca.singular_values()          # singular values = eigenvalues for PCA
    expvar = pca.explained_variance()       # explained variance
    pcs    = pca.pcs()                      # principal component scores (PCs)
    eofs   = pca.eofs()                     # spatial patterns (EOFs)
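Complex PCA is commonly built by replacing each real time series with its analytic signal, whose imaginary part is the Hilbert transform of the series. Whether `complexify=True` does exactly this internally is an assumption here; the sketch below only illustrates the general technique with a plain FFT, and the helper name `analytic_signal` is hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal along axis 0: real part is x, imaginary part its Hilbert transform."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    spectrum = np.fft.fft(x, axis=0)
    # Keep the zero frequency, double positive frequencies, drop negative ones.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    shape = (n,) + (1,) * (x.ndim - 1)   # broadcast the weights over remaining axes
    return np.fft.ifft(spectrum * weights.reshape(shape), axis=0)


t = np.arange(200)
series = np.cos(2 * np.pi * t / 20)      # exactly 10 periods, so no edge effects
z = analytic_signal(series)              # cos(...) + 1j * sin(...)
```

For series that do not contain a whole number of periods, the FFT-based transform suffers from edge effects, which is the kind of problem an extension of the time series before the transform is meant to reduce.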

To obtain a Varimax- or Promax-rotated solution, rotate the model, choosing the number of EOFs to rotate (n_rot) and the Promax parameter (power). Here, power=1 yields a Varimax-rotated solution.

    pca.rotate(n_rot=10, power=1)

    expvar_rot  = pca.explained_variance()  # explained variance
    pcs_rot     = pca.pcs()                 # Principal component scores (PCs)
    eofs_rot    = pca.eofs()                # spatial patterns (EOFs)
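For readers curious what a Varimax rotation does numerically, the following is a minimal sketch of the widely used SVD-based iteration. It is not taken from xmca, whose implementation (including the Promax step for power > 1) may differ; the function name `varimax`, its parameters, and the random loadings are assumptions for illustration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a (n_rows, n_modes) loading matrix."""
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated**2, axis=0)          # sum of squares per column
        gradient = loadings.T @ (rotated**3 - rotated * col_ss / n_rows)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                            # nearest orthogonal matrix
        previous, objective = objective, s.sum()
        if previous != 0 and objective < previous * (1 + tol):
            break                                    # no further improvement
    return loadings @ rotation, rotation


rng = np.random.default_rng(5)
loadings = rng.standard_normal((8, 3))               # hypothetical loadings
rotated, rotation = varimax(loadings)
```

Because the rotation matrix is orthogonal, the total variance carried by the rotated modes is unchanged; only its distribution across modes changes, which is why the explained variance is recomputed after rotating.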

MCA

Same as for PCA / EOF analysis, but with two input fields instead of one.

    mca = xMCA(west, east)                  # MCA of field A and B
    mca.solve(complexify=False)             # True for complex MCA

    svals = mca.singular_values()           # singular values
    pcs = mca.pcs()                         # expansion coefficients (PCs)
    eofs = mca.eofs()                       # spatial patterns (EOFs)

Significance analysis

A simple way of estimating the significance of the obtained modes is to run Monte Carlo simulations based on uncorrelated Gaussian white noise, known as Rule N (Overland and Preisendorfer 1982). Here we create 200 such synthetic data sets and compare the synthetic singular spectrum with the real one to assess significance.

    import numpy as np
    import matplotlib.pyplot as plt

    svals = mca.singular_values()       # singular values of the real data
    surr = mca.rule_n(200)              # singular values of 200 synthetic runs
    median = surr.median('run')
    q99 = surr.quantile(.99, dim='run')
    q01 = surr.quantile(.01, dim='run')

    # number of modes whose singular value exceeds the 99 % quantile
    cutoff = np.sum((svals - q99 > 0)).values  # first 8 modes significant

    fig = plt.figure(figsize=(10, 4))
    ax = fig.add_subplot(111)
    svals.plot(ax=ax, yscale='log', label='true')
    median.plot(ax=ax, yscale='log', color='.5', label='rule N')
    q99.plot(ax=ax, yscale='log', color='.5', ls=':')
    q01.plot(ax=ax, yscale='log', color='.5', ls=':')
    ax.axvline(cutoff + 0.5, ls=':')
    ax.set_xlim(-2, 200)
    ax.set_ylim(1e-1, 2.5e4)
    ax.set_title('Significance based on Rule N')
    ax.legend()

Figure: The first 8 modes are significant according to Rule N using 200 synthetic runs.
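The logic behind Rule N can be sketched with NumPy alone. This is an illustration under stated assumptions rather than a copy of what `rule_n` does internally: the synthetic fields, the standardization step, and the helper names `singular_values` and `rule_n_sketch` are all introduced for the example.

```python
import numpy as np


def singular_values(a, b):
    """Singular values of the cross-covariance matrix of two (time, space) fields."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    cov = a.T @ b / (a.shape[0] - 1)
    return np.linalg.svd(cov, compute_uv=False)


def rule_n_sketch(shape_a, shape_b, n_runs, rng):
    """Singular spectra of n_runs pairs of uncorrelated Gaussian white-noise fields."""
    runs = [
        singular_values(rng.standard_normal(shape_a), rng.standard_normal(shape_b))
        for _ in range(n_runs)
    ]
    return np.array(runs)                # shape (n_runs, n_modes)


rng = np.random.default_rng(42)
n_time = 300
shared = rng.standard_normal(n_time)     # signal common to both fields
a = np.outer(shared, [1.0, -1.0, 0.5, 0.0, 0.0]) + 0.5 * rng.standard_normal((n_time, 5))
b = np.outer(shared, [0.0, 2.0, -1.0, 0.0]) + 0.5 * rng.standard_normal((n_time, 4))

# Standardize so the data are comparable with unit-variance noise.
a = (a - a.mean(axis=0)) / a.std(axis=0)
b = (b - b.mean(axis=0)) / b.std(axis=0)

svals = singular_values(a, b)
surr = rule_n_sketch(a.shape, b.shape, 200, rng)
q99 = np.quantile(surr, 0.99, axis=0)
n_significant = int(np.sum(svals > q99))  # modes above the 99 % noise quantile
```

A mode is treated as significant when its singular value exceeds what white noise of the same dimensions produces in 99 % of the runs.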

Saving/loading an analysis

    # save the data and a corresponding info file;
    # the files are stored in a dedicated directory
    mca.save_analysis('my_analysis')

    # create a new, empty instance and load the analysis
    # by specifying the path to the info file created earlier
    mca2 = xMCA()
    mca2.load_analysis('my_analysis/info.xmca')

Quickly inspect your results visually

The package provides a method to plot individual modes.

    mca2.set_field_names('West', 'East')
    pkwargs = {'orientation' : 'vertical'}
    mca2.plot(mode=1, **pkwargs)

Figure: Result of the default plot method after performing MCA on T2m of the North American west and east coasts, showing mode 1.

You may want to modify the plot to improve its appearance:

    from cartopy.crs import EqualEarth  # for different map projections

    # map projections for "left" and "right" field
    projections = {
        'left': EqualEarth(),
        'right': EqualEarth()
    }

    pkwargs = {
        'figsize'     : (8, 5),
        'orientation' : 'vertical',
        'cmap_eof'    : 'BrBG',  # colormap amplitude
        'projection'  : projections,
    }
    mca2.plot(mode=3, **pkwargs)

Figure: Mode 3 plotted with the modified settings.

You can save the plot to your local disk as a .png file via

    skwargs = {'dpi': 200}
    mca2.save_plot(mode=3, plot_kwargs=pkwargs, save_kwargs=skwargs)

🔖 Please cite

I am just starting my career as a scientist. Feedback on my scientific work is therefore important to me in order to assess which of my work advances the scientific community. As such, if you use the package for your own research and find it helpful, I would appreciate feedback here on Github, via email, or as a citation:

Niclas Rieger, 2021: nicrie/xmca: version x.y.z. doi:10.5281/zenodo.4749830.

💪 Credits

Kudos to the developers and contributors of the following Github projects which I initially used myself and used as an inspiration:

And of course credits to the developers of the extremely useful packages

Comments
  • SVD did not converge

    Hi Niclas,

    XMCA worked fine when I used it directly on my raw data and produced the results I was expecting. However, when I tried to use processed data (e.g. anomalies or detrended data), it gives the following error: SVD did not converge

    Does XMCA only accept raw data, or is something wrong with my input? This is how my data looks: [image]

    Even the previously worked data was in a similar format. What could be the issue?

    opened by GIRIJA-KALYANI 3
  • cartopy dependency is too restrictive

    The very restrictive cartopy dependency makes it tricky to install into an existing conda environment

    cartopy==0.18.0
    

    I can see it was changed at https://github.com/coecms/xmca/commit/896e0b5977c4f4a36ed01363141f3ab7dd24c6d5

    When I changed it back to >=0.18.0 it installed fine using pip install --user and imported fine with cartopy-0.19.0.post1.

    I ran the tests like so

    python -m unittest discover -v -s tests/
    

    but three of the tests didn't pass

    test_save_load_cplx (integration.test_integration_xarray.TestIntegration) ... ERROR    
    test_save_load_rot (integration.test_integration_xarray.TestIntegration) ... ERROR                                       
    test_save_load_std (integration.test_integration_xarray.TestIntegration) ... ERROR    
    

    Some other error messages:

    Error: Rotation process did not converge!
    
    ======================================================================                                                   
    ERROR: test_save_load_cplx (integration.test_integration_xarray.TestIntegration)                                         
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    ======================================================================                                                   
    ERROR: test_save_load_rot (integration.test_integration_xarray.TestIntegration)                                          
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    ======================================================================                                                   
    ERROR: test_save_load_std (integration.test_integration_xarray.TestIntegration)                                          
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    ---------------------------------------------------------------------- 
    

    but it wasn't obvious that this was an issue with the version of cartopy.

    opened by aidanheerdegen 3
  • installation error

    Hello Niclas,

    I installed xmca but when I import the library I get this error: libmkl_rt.so: cannot open shared object file: No such file or directory

    any ideas or suggestions would be greatly appreciated.

    Cheers,

    Charles.

    opened by chjones2 3
  • May cause errors after using mca.normalize()

    Hello, first of all, thank you for your contribution. However, I found a bug that may cause the SVD to fail (due to the presence of NaN values). The details are as follows:

    When the MCA class is initialized, it calls get_nan_cols and remove_nan_cols to remove NaN values. If you then call mca.normalize(), new NaN values can appear and be passed into the SVD calculation. This happens when a grid point in the input array has the same value at every time step (for example, a place where it never rains): the standard deviation is zero, so standardization yields NaN, and the SVD cannot be solved.
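The effect described in this report can be reproduced with plain NumPy. The sketch below is only an illustration of the mechanism and of one possible workaround (dropping constant grid points after standardization); the helper name `standardize` and the toy data are assumptions, and none of this is xmca code.

```python
import numpy as np


def standardize(field):
    """Column-wise standardization; constant columns turn into NaN (0 / 0)."""
    anomalies = field - field.mean(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return anomalies / field.std(axis=0)


rng = np.random.default_rng(1)
field = rng.standard_normal((50, 4))     # toy field with shape (time, grid point)
field[:, 2] = 0.0                        # a grid point where it never rains

normalized = standardize(field)
constant_cols = np.isnan(normalized).any(axis=0)   # columns that became NaN

# Possible workaround: remove the affected grid points before the SVD.
cleaned = normalized[:, ~constant_cols]
svals = np.linalg.svd(cleaned, compute_uv=False)
```

Masking after normalization, rather than only at initialization, keeps NaN values out of the decomposition.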

    opened by L1angY 2
  • Feedback

    I'm posting my feedback on behalf of Alex vM, who asked me to have a look at this package. The feedback is subjective and you may disagree with some if not all suggestions below.

    1. In general, the readme is well written. Clear and concise. I'd add a figure(s) though. For instance, after you call mca.plot() in your example.
    2. Currently, this package is for developers or at least those who have python experience. Well, maybe covariance estimation requires knowledge of python and programming skills, but I recommend making sphinx documentation (and publishing it on readthedocs) to bring users. You've already written docs for each if not all functions and classes. The only step left is to wrap it in sphinx with default parameters and paths.
    3. To give credits to your package and show that it's well maintained, I recommend adding badges (readthedocs, travis build and test coverage). Use CircleCI. Here is a config example (you need only build-pip job since it's the easiest).
    4. tools folder must be in the xmca folder.
    5. setup.py install requires must be read from requirements.txt file (example) and not hard-coded.
    6. GPL-3 license is too restrictive. Consider BSD-3 or even MIT.
    7. Each test that involves randomness must start with a numpy.random.seed. Currently, you're setting the seed globally. It's not a good idea because the test results depend on the test order, which, of course, should not happen.
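Point 7 can be sketched as follows. Instead of `numpy.random.seed`, which the feedback mentions, this example swaps in a per-test `numpy.random.default_rng` generator created in `setUp`, which achieves the same order independence without touching global state. The test class and its test names are hypothetical.

```python
import unittest

import numpy as np


class TestRandomised(unittest.TestCase):
    def setUp(self):
        # A fresh, seeded generator per test: results do not depend on test order.
        self.rng = np.random.default_rng(7)

    def test_mean_close_to_zero(self):
        sample = self.rng.standard_normal(10_000)
        self.assertLess(abs(sample.mean()), 0.05)

    def test_draws_are_reproducible(self):
        first = self.rng.standard_normal(5)
        again = np.random.default_rng(7).standard_normal(5)
        np.testing.assert_array_equal(first, again)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRandomised)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because every test starts from the same generator state, adding, removing, or reordering tests does not change the random numbers any individual test sees.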

    Good luck!

    Best, Danylo

    bug documentation 
    opened by dizcza 1
  • save_analysis currently does not save cos lat weighting

    Just stumbled over this:

    When saving a model via xmca.xarray.save_analysis and cosine latitude weighting was applied (apply_coslat), the current implementation does not invoke xmca.xarray.apply_coslat when the model gets loaded via xmca.xarray.load_analysis thus creating false PCs.

    I hope to provide a fix to this soon.
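To illustrate why skipping the weighting on load changes the PCs, here is a small NumPy sketch. It assumes the common convention of multiplying the field by sqrt(cos(lat)); whether apply_coslat uses exactly this form is an assumption, and the latitudes, toy field, and toy EOF are made up for the example.

```python
import numpy as np

lat = np.array([0.0, 30.0, 60.0, 85.0])       # hypothetical latitudes in degrees
field = np.ones((3, lat.size))                # toy field with shape (time, lat)

# Square-root weights, so that variances and covariances scale with cos(lat).
weights = np.sqrt(np.cos(np.deg2rad(lat)))
weighted = field * weights

# Projecting weighted and unweighted data onto the same EOF gives different PCs.
eof = np.full(lat.size, 0.5)                  # toy unit-length spatial pattern
pcs_weighted = weighted @ eof
pcs_unweighted = field @ eof
```

If the EOFs were computed from weighted data, projecting the unweighted field onto them yields PCs that do not belong to the stored model.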

    bug 
    opened by nicrie 0
  • Release 1.0.0

    New in release 1.0.0

    • method predict allows to project new, unseen data to obtain the corresponding PCs (works for standard, rotation and complex)
    • more efficient storing/loading of files; Unfortunately, this and the point above made it necessary to change the code considerably. As a consequence, loading models which were performed and saved using an older package version (0.x.y) is not supported.
    • add method to summarize performed analysis (summary)
    • add method to return input fields
    • improve docs (resolves #7)
    • correct and consistent use of definition of loadings
    • some bugfixes (e.g. resolves #12 )
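The projection idea behind a predict step can be sketched for the standard (non-rotated, real) case with NumPy. This is an illustration only: the helper name `project`, the random data, and the scaling of the PCs are assumptions, and the package's predict method may normalize or weight the data differently.

```python
import numpy as np

rng = np.random.default_rng(3)
train = rng.standard_normal((100, 6))          # training field, shape (time, space)
mean = train.mean(axis=0)

# EOFs and PCs of the training data from an SVD of the anomalies.
u, s, vt = np.linalg.svd(train - mean, full_matrices=False)
eofs = vt.T                                    # shape (space, mode)
pcs_train = u * s                              # shape (time, mode)


def project(new_data, mean, eofs):
    """Project unseen data onto existing EOFs, using the training mean."""
    return (new_data - mean) @ eofs


new = rng.standard_normal((10, 6))             # new, unseen time steps
pcs_new = project(new, mean, eofs)
```

Projecting the training data itself reproduces the original PCs, which is a convenient sanity check for any projection routine.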
    opened by nicrie 0
  • MCA errors

    Hello, I am trying to run MCA on two variables taken from the output of a climate model (WRF).

    I get this error right after the bit:

    mca.plot(mode=1, **pkwargs) : 
    

    ValueError: coordinate lon has dimensions ('south_north', 'west_east'), but these are not a subset of the DataArray dimensions ['lat', 'lon', 'mode']

    Would really appreciate any help with this error. Many thanks.

    # Load packages and data:
    import xarray as xr
    import matplotlib.pyplot as plt
    import cartopy
    
    var=xr.open_dataset("F:\\era5_2000_2020_vars_salem.nc")
    
    t2=var.T2C
    snow=var.SNOWH
    
    #The variables, e.g., t2 is  structured as follows: 
    t2:
    <xarray.DataArray 'T2C' (time: 3512, south_north: 111, west_east: 114)>
    Coordinates:
        lat          (south_north, west_east) float32 ...
        lon          (south_north, west_east) float32 ...
        xtime        (time) datetime64[ns] ...
      * time         (time) datetime64[ns] 2000-11-01 2000-11-02 ... 2020-04-29
      * west_east    (west_east) float64 -2.766e+05 -2.666e+05 ... 8.534e+05
      * south_north  (south_north) float64 -1.353e+05 -1.253e+05 ... 9.647e+05
    Attributes:
        FieldType:    104
        MemoryOrder:  XY 
        description:  2m Temperature
        units:        C
        stagger:      
        pyproj_srs:   +proj=lcc +lat_0=64 +lon_0=10 +lat_1=64 +lat_2=68 +x_0=0 +y...
        coordinates:  XLONG XLAT XTIME
    
    mca = xMCA(t2, snow)                  # MCA of field A and B
    mca.solve(complexify=False)            # True for complex MCA
    
    
    eigenvalues = mca.singular_values()     
    pcs = mca.pcs()                           
    eofs = mca.eofs()   
    
    mca.set_field_names('t2','snow')
    pkwargs = {'orientation' : 'vertical'}
    mca.plot(mode=1, **pkwargs)
    
    opened by Murk89 23
  • Sourcery Starbot ⭐ refactored nicrie/xmca

    Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

    Here's your pull request refactoring your most popular Python repo.

    If you want Sourcery to refactor all your Python repos and incoming pull requests install our bot.

    Review changes via command line

    To manually merge these changes, make sure you're on the master branch, then run:

    git fetch https://github.com/sourcery-ai-bot/xmca master
    git merge --ff-only FETCH_HEAD
    git reset HEAD^
    
    opened by sourcery-ai-bot 0
  • multivariate EOF analysis / MCA

    add this feature in next release

    note: this will probably be a major change since it requires rewriting the internal structure of the package and will therefore break backwards compatibility

    enhancement 
    opened by nicrie 0
Releases(1.4.2)
Owner
Niclas Rieger