Code for reproducible experiments presented in KSD Aggregated Goodness-of-fit Test.

Overview

Code for KSDAgg: a KSD aggregated goodness-of-fit test

This GitHub repository contains the code for the reproducible experiments presented in our paper KSD Aggregated Goodness-of-fit Test:

  • Gamma distribution experiment,
  • Gaussian-Bernoulli Restricted Boltzmann Machine experiment,
  • MNIST Normalizing Flow experiment.

We provide the code to run the experiments to generate Figures 1-4 and Table 1 from our paper, those can be found in figures.

Our aggregated test KSDAgg is implemented in ksdagg.py. We provide code for two quantile estimation methods: the wild bootstrap and the parametric bootstrap. Our implementation uses the IMQ (inverse multiquadric) kernel with a collection of bandwidths consisting of the median bandwidth scaled by powers of 2, and with one of the four types of weights proposed in MMD Aggregated Two-Sample Test. We also provide custom KSDAgg functions in ksdagg.py which allow for the use of any kernel collections and weights.

Requirements

  • python 3.9

Installation

In a chosen directory, clone the repository and change to its directory by executing

git clone [email protected]:antoninschrab/ksdagg-paper.git
cd ksdagg-paper

We then recommend creating and activating a virtual environment by either

  • using venv:
    python3 -m venv ksdagg-env
    source ksdagg-env/bin/activate
    # can be deactivated by running:
    # deactivate
    
  • or using conda:
    conda create --name ksdagg-env python=3.9
    conda activate ksdagg-env
    # can be deactivated by running:
    # conda deactivate
    

The required packages can then be installed in the virtual environment by running

python -m pip install -r requirements.txt

Generating or downloading the data

The data for the Gaussian-Bernoulli Restricted Boltzmann Machine experiment and for the MNIST Normalizing Flow experiment can

  • be obtained by executing
    python generate_data_rbm.py
    python generate_data_nf.py
    
  • or, as running the above scripts can be computationally expensive, we also provide the option to download their outputs directly
    python download_data.py
    

Those scripts generate samples and compute their associated scores under the model for the different settings considered in our experiments, the data is saved in the new directory data.

Reproducing the experiments of the paper

First, for our three experiments, we compute KSD values to be used for the parametric bootstrap and save them in the directory parametric. This can be done by running

python generate_parametric.py

For convenience, we directly provide the directory parametric obtained by running this script.

To run the three experiments, the following commands can be executed

python experiment_gamma.py 
python experiment_rbm.py 
python experiment_nf.py 

Those commands run all the tests necessary for our experiments, the results are saved in dedicated .csv and .pkl files in the directory results (which is already provided for ease of use). Note that our expeiments are comprised of 'embarrassingly parallel for loops', for which significant speed up can be obtained by using parallel computing libraries such as joblib or dask.

The actual figures of the paper can be obtained from the saved dataframes in results by using the command

python figures.py  

The figures are saved in the directory figures and correspond to the ones used in our paper.

References

Our KSDAgg code is based our MMDAgg implementation which can be found at https://github.com/antoninschrab/mmdagg-paper.

For the Gaussian-Bernoulli Restricted Boltzmann Machine experiment, we obtain the samples and scores in generate_data_rbm.py by relying on Wittawat Jitkrittum's implementation which can be found at https://github.com/wittawatj/kernel-gof under the MIT License. The relevant files we use are in the directory kgof.

For the MNIST Normalizing Flow experiment, we use in generate_data_nf.py a multiscale Normalizing Flow trained on the MNIST dataset as implemented by Phillip Lippe in Tutorial 11: Normalizing Flows for image modeling as part of the UvA Deep Learning Tutorials under the MIT License.

Author

Antonin Schrab

Centre for Artificial Intelligence, Department of Computer Science, University College London

Gatsby Computational Neuroscience Unit, University College London

Inria, Lille - Nord Europe research centre and Inria London Programme

Bibtex

@unpublished{schrab2022ksd,
    title={{KSD} Aggregated Goodness-of-fit Test},
    author={Antonin Schrab and Benjamin Guedj and Arthur Gretton},
    year={2022},
    note = "Submitted.",
    abstract = {We investigate properties of goodness-of-fit tests based on the Kernel Stein Discrepancy (KSD). We introduce a strategy to construct a test, called KSDAgg, which aggregates multiple tests with different kernels. KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels. We provide theoretical guarantees on the power of KSDAgg: we show it achieves the smallest uniform separation rate of the collection, up to a logarithmic term. KSDAgg can be computed exactly in practice as it relies either on a parametric bootstrap or on a wild bootstrap to estimate the quantiles and the level corrections. In particular, for the crucial choice of bandwidth of a fixed kernel, it avoids resorting to arbitrary heuristics (such as median or standard deviation) or to data splitting. We find on both synthetic and real-world data that KSDAgg outperforms other state-of-the-art adaptive KSD-based goodness-of-fit testing procedures.},
    url = {https://arxiv.org/abs/2202.00824},
    url_PDF = {https://arxiv.org/pdf/2202.00824.pdf},
    url_Code = {https://github.com/antoninschrab/ksdagg-paper},
    eprint={2202.00824},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}

License

MIT License (see LICENSE.md)

Owner
Antonin Schrab
Antonin Schrab
FAVD: Featherweight Assisted Vulnerability Discovery

FAVD: Featherweight Assisted Vulnerability Discovery This repository contains the replication package for the paper "Featherweight Assisted Vulnerabil

secureIT 4 Sep 16, 2022
Source code of our BMVC 2021 paper: AniFormer: Data-driven 3D Animation with Transformer

AniFormer This is the PyTorch implementation of our BMVC 2021 paper AniFormer: Data-driven 3D Animation with Transformer. Haoyu Chen, Hao Tang, Nicu S

24 Nov 02, 2022
Fuzzing the Kernel Using Unicornafl and AFL++

Unicorefuzz Fuzzing the Kernel using UnicornAFL and AFL++. For details, skim through the WOOT paper or watch this talk at CCCamp19. Is it any good? ye

Security in Telecommunications 283 Dec 26, 2022
Hierarchical Attentive Recurrent Tracking

Hierarchical Attentive Recurrent Tracking This is an official Tensorflow implementation of single object tracking in videos by using hierarchical atte

Adam Kosiorek 147 Aug 07, 2021
Official pytorch implementation of paper "Inception Convolution with Efficient Dilation Search" (CVPR 2021 Oral).

IC-Conv This repository is an official implementation of the paper Inception Convolution with Efficient Dilation Search. Getting Started Download Imag

Jie Liu 111 Dec 31, 2022
A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Intro PyTorch implementation of Learning to learn by gradient descent by gradient descent. Run python main.py TODO Initial implementation Toy data LST

Ilya Kostrikov 300 Dec 11, 2022
All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

Daniel Bourke 3.4k Jan 07, 2023
Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains.

This repository is no longer maintained. Please use our new Softlearning package instead. Soft Actor-Critic Soft actor-critic is a deep reinforcement

Tuomas Haarnoja 752 Jan 07, 2023
Autonomous Perception: 3D Object Detection with Complex-YOLO

Autonomous Perception: 3D Object Detection with Complex-YOLO LiDAR object detect

Thomas Dunlap 2 Feb 18, 2022
[NeurIPS 2019] Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma This is the offi

Kaidi Cao 528 Jan 01, 2023
SurfEmb (CVPR 2022) - SurfEmb: Dense and Continuous Correspondence Distributions

SurfEmb SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation with Learnt Surface Embeddings Rasmus Laurvig Haugard, A

Rasmus Haugaard 56 Nov 19, 2022
This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models"

GreaseLM: Graph REASoning Enhanced Language Models This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language

137 Jan 02, 2023
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

MoEBERT This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). Installation Create an

Simiao Zuo 34 Dec 24, 2022
Official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks".

GN-Transformer AST This is the official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks". Data Prep

Cheng Jun-Yan 10 Nov 26, 2022
Hydra: an Extensible Fuzzing Framework for Finding Semantic Bugs in File Systems

Hydra: An Extensible Fuzzing Framework for Finding Semantic Bugs in File Systems Paper Finding Semantic Bugs in File Systems with an Extensible Fuzzin

gts3.org (<a href=[email protected])"> 129 Dec 15, 2022
Resilient projection-based consensus actor-critic (RPBCAC) algorithm

Resilient projection-based consensus actor-critic (RPBCAC) algorithm We implement the RPBCAC algorithm with nonlinear approximation from [1] and focus

Martin Figura 5 Jul 12, 2022
A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.

A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution. Introduction This repo contains experimental code derived from

2 May 09, 2022
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

FENSE The metric, Fluency ENhanced Sentence-bert Evaluation (FENSE), for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evalua

Zhiling Zhang 13 Dec 23, 2022
Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

NeuralTextures This is repository with inference code for paper "StylePeople: A Generative Model of Fullbody Human Avatars" (CVPR21). This code is for

Visual Understanding Lab @ Samsung AI Center Moscow 18 Oct 06, 2022
A simple but complete full-attention transformer with a set of promising experimental features from various papers

x-transformers A concise but fully-featured transformer, complete with a set of promising experimental features from various papers. Install $ pip ins

Phil Wang 2.3k Jan 03, 2023