A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

Overview

OutliersSlidingWindows

A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

Dataset generation

The original datasets, namely Higgs and Cover, are provided (compressed) in the data folder. One can download and preprocess the datasets as follows:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz
cat HIGGS.csv.gz | gunzip | cut -d ',' -f 23,24,25,26,27,28,29 > higgs.dat

wget https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
gunzip covtype.data.gz

The script datasets.sh decompresses the zipped original datasets and generates the artificial datasets used in the paper. In particular, the program InjectOutliers takes a dataset and injects artificial outliers. It takes as an argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • p, the probability with which to inject an outlier after every point
  • r, the scaling factor for the norm of the outlier points
  • d, the dimension of the points

The program GenerateArtificial generates automatically a dataset with points in a unit ball with outliers on the suface of a ball of radius r. It takes as an argument:

  • out, the path to the output file
  • p, the probability with which to inject an outlier
  • r, the radius of the outer ball
  • d, the dimension of the points

Running the experiments

The script exec.sh runs a representative subset of the experiments presented in the paper.

The program Main runs the experiments on the comparison of our k-center algorithm with the sequential ones. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • samp, the number of candidate centers for sampled-charikar
  • doChar, if set to 1 executes charikar, else it is skipped

It outputs, in the folder out/k-cen/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage and the clustering radius.

The program MainLambda runs the experiments on the sensitivity on lambda. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method (lambda unused)
  • minDist, maxDist, parameters of our method
  • doSlow, if set to 1 executes the slowest test, else it is skipped

It outputs, in the folder out/lam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage due to histograms, the total memory usage and the clustering radius.

The program MainEffDiam runs the experiments on the effective diameter algorithms. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • alpha, fraction fo distances to discard
  • eta, lower bound on ratio between effective diameter and diameter
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • doSeq, if set to 1 executes the sequential method, else it is skipped

It outputs, in the folder out/diam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the two methods, the update times, the query times, the memory usage and the effective diameter estimate.
Owner
PaoloPellizzoni
PaoloPellizzoni
This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

Auto-Lambda This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationship

Shikun Liu 76 Dec 20, 2022
From the basics to slightly more interesting applications of Tensorflow

TensorFlow Tutorials You can find python source code under the python directory, and associated notebooks under notebooks. Source code Description 1 b

Parag K Mital 5.6k Jan 09, 2023
Score refinement for confidence-based 3D multi-object tracking

Score refinement for confidence-based 3D multi-object tracking Our video gives a brief explanation of our Method. This is the official code for the pa

Cognitive Systems Research Group 47 Dec 26, 2022
Repo for my Tensorflow/Keras CV experiments. Mostly revolving around the Danbooru20xx dataset

SW-CV-ModelZoo Repo for my Tensorflow/Keras CV experiments. Mostly revolving around the Danbooru20xx dataset Framework: TF/Keras 2.7 Training SQLite D

20 Dec 27, 2022
Library for implementing reservoir computing models (echo state networks) for multivariate time series classification and clustering.

Framework overview This library allows to quickly implement different architectures based on Reservoir Computing (the family of approaches popularized

Filippo Bianchi 249 Dec 21, 2022
Smart edu-autobooking - Johnson @ DMI-UNICT study room self-booking system

smart_edu-autobooking Sistema di autoprenotazione per l'aula studio [email protected]

Davide Carnemolla 17 Jun 20, 2022
An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

About This repository shows how Autonomous Learning Library can be used to build new reinforcement learning agents. In particular, it contains a model

Chris Nota 5 Aug 30, 2022
Implementation of "A MLP-like Architecture for Dense Prediction"

A MLP-like Architecture for Dense Prediction (arXiv) Updates (22/07/2021) Initial release. Model Zoo We provide CycleMLP models pretrained on ImageNet

Shoufa Chen 244 Dec 27, 2022
Official Pytorch implementation of Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference (ICLR 2022)

The Official Implementation of CLIB (Continual Learning for i-Blurry) Online Continual Learning on Class Incremental Blurry Task Configuration with An

NAVER AI 34 Oct 26, 2022
code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? Code for paper: Does Unsupervised Architecture Representation

39 Dec 17, 2022
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch

Neural Distance Embeddings for Biological Sequences Official implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTo

Gabriele Corso 56 Dec 23, 2022
Open CV - Convert a picture to look like a cartoon sketch in python

Use the video https://www.youtube.com/watch?v=k7cVPGpnels for initial learning.

Sammith S Bharadwaj 3 Jan 29, 2022
A PaddlePaddle implementation of STGCN with a few modifications in the model architecture in order to forecast traffic jam.

About This repository contains the code of a PaddlePaddle implementation of STGCN based on the paper Spatio-Temporal Graph Convolutional Networks: A D

Tianjian Li 1 Jan 11, 2022
A Broader Picture of Random-walk Based Graph Embedding

Random-walk Embedding Framework This repository is a reference implementation of the random-walk embedding framework as described in the paper: A Broa

Zexi Huang 23 Dec 13, 2022
PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Study-CSRNet-pytorch This is the PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

0 Mar 01, 2022
Open standard for machine learning interoperability

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides

Open Neural Network Exchange 13.9k Dec 30, 2022
Project for music generation system based on object tracking and CGAN

Project for music generation system based on object tracking and CGAN The project was inspired by MIDINet: A Convolutional Generative Adversarial Netw

1 Nov 21, 2021
Sentinel-1 vessel detection model used in the xView3 challenge

sar_vessel_detect Code for the AI2 Skylight team's submission in the xView3 competition (https://iuu.xview.us) for vessel detection in Sentinel-1 SAR

AI2 6 Sep 10, 2022
The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store development.

The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store dev

George Rocha 0 Feb 03, 2022
A3C LSTM Atari with Pytorch plus A3G design

NEWLY ADDED A3G A NEW GPU/CPU ARCHITECTURE OF A3C FOR SUBSTANTIALLY ACCELERATED TRAINING!! RL A3C Pytorch NEWLY ADDED A3G!! New implementation of A3C

David Griffis 532 Jan 02, 2023