A Python package for generating concise, high-quality summaries of a probability distribution

Overview

GoodPoints

A Python package for generating concise, high-quality summaries of a probability distribution

GoodPoints is a collection of tools for compressing a distribution more effectively than independent sampling:

  • Given an initial summary of n input points, kernel thinning returns s << n output points with comparable integration error across a reproducing kernel Hilbert space
  • Compress++ reduces the runtime of generic thinning algorithms with minimal loss in accuracy

Installation

To install the goodpoints package, use the following pip command:

pip install goodpoints

Getting started

The primary kernel thinning function is thin in the kt module:

from goodpoints import kt
coreset = kt.thin(X, m, split_kernel, swap_kernel, delta=0.5, seed=123, store_K=False)
    """Returns kernel thinning coreset of size floor(n/2^m) as row indices into X
    
    Args:
      X: Input sequence of sample points with shape (n, d)
      m: Number of halving rounds
      split_kernel: Kernel function used by KT-SPLIT (typically a square-root kernel, krt);
        split_kernel(y,X) returns array of kernel evaluations between y and each row of X
      swap_kernel: Kernel function used by KT-SWAP (typically the target kernel, k);
        swap_kernel(y,X) returns array of kernel evaluations between y and each row of X
      delta: Run KT-SPLIT with constant failure probabilities delta_i = delta/n
      seed: Random seed to set prior to generation; if None, no seed will be set
      store_K: If False, runs O(nd) space version which does not store kernel
        matrix; if True, stores n x n kernel matrix
    """

For example uses, please refer to the notebook examples/kt/run_kt_experiment.ipynb.

The primary Compress++ function is compresspp in the compress module:

from goodpoints import compress
coreset = compress.compresspp(X, halve, thin, g)
    """Returns Compress++(g) coreset of size sqrt(n) as row indices into X

    Args: 
        X: Input sequence of sample points with shape (n, d)
        halve: Function that takes in an (n', d) numpy array Y and returns 
          floor(n'/2) distinct row indices into Y, identifying a halved coreset
        thin: Function that takes in an (n', d) numpy array Y and returns
          2^g sqrt(n') row indices into Y, identifying a thinned coreset
        g: Oversampling factor
    """

For example uses, please refer to the code examples/compress/construct_compresspp_coresets.py.

Examples

Code in the examples directory uses the goodpoints package to recreate the experiments of the following research papers.


Kernel Thinning

@article{dwivedi2021kernel,
  title={Kernel Thinning},
  author={Raaz Dwivedi and Lester Mackey},
  journal={arXiv preprint arXiv:2105.05842},
  year={2021}
}
  1. The script examples/kt/submit_jobs_run_kt.py reproduces the vignette experiments of Kernel Thinning on a Slurm cluster by executing examples/kt/run_kt_experiment.ipynb with appropriate parameters. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb, where in the last code block we report the median heuristic based bandwidth parameteters (along with the code to compute it).
  2. After all results have been generated, the notebook plot_results.ipynb can be used to reproduce the figures of Kernel Thinning.

Generalized Kernel Thinning

@article{dwivedi2021generalized,
  title={Generalized Kernel Thinning},
  author={Raaz Dwivedi and Lester Mackey},
  journal={arXiv preprint arXiv:2110.01593},
  year={2021}
}
  1. The script examples/gkt/submit_gkt_jobs.py reproduces the vignette experiments of Generalized Kernel Thinning on a Slurm cluster by executing examples/gkt/run_generalized_kt_experiment.ipynb with appropriate parameters. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb.
  2. Once the coresets are generated, examples/gkt/compute_test_function_errors.ipynb can be used to generate integration errors for different test functions.
  3. After all results have been generated, the notebook examples/gkt/plot_gkt_results.ipynb can be used to reproduce the figures of Generalized Kernel Thinning.

Distribution Compression in Near-linear Time

@article{shetti2021distribution,
  title={Distribution Compression in Near-linear Time},
  author={Abhishek Shetty and Raaz Dwivedi and Lester Mackey},
  journal={arXiv preprint arXiv:2111.07941},
  year={2021}
}
  1. The notebook examples/compress/script_to_deploy_jobs.ipynb reproduces the experiments of Distribution Compression in Near-linear Time in the following manner: 1a. It generates various coresets and computes their mmds by executing examples/compress/construct_{THIN}_coresets.py for THIN in {compresspp, kt, st, herding} with appropriate parameters, where the flag kt stands for kernel thinning, st stands for standard thinning (choosing every t-th point), and herding refers to kernel herding. 1b. It compute the runtimes of different algorithms by executing examples/compress/run_time.py. 1c. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb. 1d. The notebook currently deploys these jobs on a slurm cluster, but setting deploy_slurm = False in examples/compress/script_to_deploy_jobs.ipynb will submit the jobs as independent python calls on terminal.
  2. After all results have been generated, the notebook examples/compress/plot_compress_results.ipynb can be used to reproduce the figures of Distribution Compression in Near-linear Time.
  3. The script examples/compress/construct_compresspp_coresets.py contains the function recursive_halving that converts a halving algorithm into a thinning algorithm by recursively halving.
  4. The script examples/compress/construct_herding_coresets.py contains the herding function that runs kernel herding algorithm introduced by Yutian Chen, Max Welling, and Alex Smola.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
最新版本yolov5+deepsort目标检测和追踪,支持5.0版本可训练自己数据集

使用YOLOv5+Deepsort实现车辆行人追踪和计数,代码封装成一个Detector类,更容易嵌入到自己的项目中。

422 Dec 30, 2022
Enhancing Knowledge Tracing via Adversarial Training

Enhancing Knowledge Tracing via Adversarial Training This repository contains source code for the paper "Enhancing Knowledge Tracing via Adversarial T

Xiaopeng Guo 14 Oct 24, 2022
SegNet-Basic with Keras

SegNet-Basic: What is Segnet? Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-wise Image Segmentation Segnet = (Encoder + Decoder)

Yad Konrad 81 Jun 30, 2022
A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

Somnus `Chen 2 Jun 09, 2022
Deep functional residue identification

DeepFRI Deep functional residue identification Citing @article {Gligorijevic2019, author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Koscio

Flatiron Institute 156 Dec 25, 2022
Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning

MaCan 4.2k Dec 29, 2022
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

ObjProp Introduction This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Insta

Anirudh S Chakravarthy 6 May 03, 2022
Implementation of Nyström Self-attention, from the paper Nyströmformer

Nyström Attention Implementation of Nyström Self-attention, from the paper Nyströmformer. Yannic Kilcher video Install $ pip install nystrom-attention

Phil Wang 95 Jan 02, 2023
Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)

transformer-slt This repository gathers data and code supporting the experiments in the paper Better Sign Language Translation with STMC-Transformer.

Kayo Yin 107 Dec 27, 2022
A containerized REST API around OpenAI's CLIP model.

OpenAI's CLIP — REST API This is a container wrapping OpenAI's CLIP model in a RESTful interface. Running the container locally First, build the conta

Santiago Valdarrama 48 Nov 06, 2022
[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

[Project] [PDF] This repository contains code for our SIGGRAPH'22 paper "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets" by Axel Sauer, Katja

742 Jan 04, 2023
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

195 Dec 07, 2022
Code for our paper "Graph Pre-training for AMR Parsing and Generation" in ACL2022

AMRBART An implementation for ACL2022 paper "Graph Pre-training for AMR Parsing and Generation". You may find our paper here (Arxiv). Requirements pyt

xfbai 60 Jan 03, 2023
pytorch implementation of ABC : Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

ABC:Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning, NeurIPS 2021 pytorch implementation of ABC : Auxiliary Balanced Class

Hyuck Lee 25 Dec 22, 2022
RDA: Robust Domain Adaptation via Fourier Adversarial Attacking

RDA: Robust Domain Adaptation via Fourier Adversarial Attacking Updates 08/2021: check out our domain adaptation for video segmentation paper Domain A

17 Nov 30, 2022
Deep motion transfer

animation-with-keypoint-mask Paper The right most square is the final result. Softmax mask (circles): \ Heatmap mask: \ conda env create -f environmen

9 Nov 01, 2022
Official pytorch implementation for Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (CVPR 2022)

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion This repository contains a pytorch implementation of "Learning to Listen: Modeling

50 Dec 17, 2022
MILK: Machine Learning Toolkit

MILK: MACHINE LEARNING TOOLKIT Machine Learning in Python Milk is a machine learning toolkit in Python. Its focus is on supervised classification with

Luis Pedro Coelho 610 Dec 14, 2022
pytorchのスライス代入操作をonnxに変換する際にScatterNDならないようにするサンプル

pytorch_remove_ScatterND pytorchのスライス代入操作をonnxに変換する際にScatterNDならないようにするサンプル。 スライスしたtensorにそのまま代入してしまうとScatterNDになるため、計算結果をcatで新しいtensorにする。 python ver

2 Dec 01, 2022
Safe Policy Optimization with Local Features

Safe Policy Optimization with Local Feature (SPO-LF) This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization wi

Akifumi Wachi 6 Jun 05, 2022