An implementation of the FOLD-R++ algorithm

Overview

FOLD-R-PP

FOLD-R++ learns an answer set program for a classification task.

Installation

Prerequisites

FOLD-R++ is written in pure Python 3. NumPy is the only dependency:

python3 -m pip install numpy

Instructions

Data preparation

The FOLD-R++ algorithm takes tabular data as input. The first line of the table should contain the feature name of each column. FOLD-R++ does not need any encoding for training: it handles numeric, categorical, and even mixed-type features (a column that contains both categorical and numeric values) directly. However, the numeric features must be specified before loading the data; otherwise they are treated as categorical features, and only literals with = and != are generated for them.
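To make the distinction between numeric and categorical literals concrete, here is a minimal sketch of how a single literal could be evaluated against a row. The evaluate helper and the (column index, operator, value) tuple layout are hypothetical and are not taken from the FOLD-R++ code. The handling of mixed-type columns (a numeric comparison against a categorical value is false, and values of different types are never equal) is an assumption about the intended semantics.

```python
# Hypothetical helper for illustration; not part of the FOLD-R++ code base.
def evaluate(literal, row):
    """Return True if `row` satisfies `literal` = (column index, operator, value)."""
    idx, op, value = literal
    x = row[idx]
    x_is_num = isinstance(x, (int, float))
    v_is_num = isinstance(value, (int, float))
    if op in ('==', '!='):
        # Assumption: values of different types are never equal.
        equal = (x_is_num == v_is_num) and x == value
        return equal if op == '==' else not equal
    # Assumption: a numeric comparison that involves a categorical value is false.
    if not (x_is_num and v_is_num):
        return False
    return x <= value if op == '<=' else x > value


row = [5.0, '?', 'yes']                  # made-up row with a mixed-type column
print(evaluate((0, '<=', 6.0), row))     # numeric test on a numeric value
print(evaluate((1, '<=', 1.0), row))     # numeric test on a categorical value
print(evaluate((1, '==', '?'), row))     # equality test on a categorical value
```

Only the equality operators make sense for a column that was not declared numeric, which is why undeclared numeric features end up with = and != literals only.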

Many UCI datasets are available in the data directory. The data preparation code for a new dataset should be added to datasets.py.

For example, the UCI breast-w dataset can be loaded with the following code:

columns = ['clump_thickness', 'cell_size_uniformity', 'cell_shape_uniformity', 'marginal_adhesion',
'single_epi_cell_size', 'bare_nuclei', 'bland_chromatin', 'normal_nucleoli', 'mitoses']
nums = columns
data, num_idx, columns = load_data('data/breastw/breastw.csv', attrs=columns, label=['label'], numerics=nums, pos='benign')

Here columns lists all the features to use, nums lists the numeric features, label gives the name of the label column, and pos gives the label value that is treated as positive.
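For readers who want to see what these parameters mean in practice, the following sketch parses a small CSV table in the same spirit. The load_rows function is a hypothetical stand-in and not the real load_data: its return value, its use of a plain string for the label name, and the two sample rows are all assumptions made for illustration.

```python
import csv
import io


# Hypothetical stand-in for illustration; the real load_data may differ.
def load_rows(csv_text, attrs, label, numerics, pos):
    """Parse CSV text whose first line holds the column names."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        row = []
        for name in attrs:
            value = rec[name]
            if name in numerics:
                try:
                    value = float(value)   # numeric where possible
                except ValueError:
                    pass                   # keep values such as '?' as categories
            row.append(value)
        # Assumption: the label is stored last, 1 for the positive value.
        row.append(1 if rec[label] == pos else 0)
        rows.append(row)
    return rows


# Made-up sample rows, not taken from the breast-w data file.
sample = """clump_thickness,bare_nuclei,label
5,1,benign
8,?,malignant
"""
cols = ['clump_thickness', 'bare_nuclei']
print(load_rows(sample, cols, 'label', numerics=cols, pos='benign'))
```

The second sample row shows a mixed-type column: the '?' stays a categorical value even though bare_nuclei is declared numeric.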

Training

The FOLD-R++ algorithm generates an explainable model, represented as an answer set program, for classification tasks. Here is a training example for the breast-w dataset:

# data_train is the training portion of the loaded data
X_train, Y_train = split_xy(data_train)          # separate features from labels
X_pos, X_neg = split_X_by_Y(X_train, Y_train)    # partition examples by label
rules1 = foldrpp(X_pos, X_neg, [])               # learn the rule set

The resulting rule set rules1 is in a nested intermediate representation. Flatten and decode the nested rules to obtain the answer set program:

fr1 = flatten(rules1)
rule_set = decode_rules(fr1, attrs)
for r in rule_set:
    print(r)

The training process can be started with: python3 main.py

It generates an answer set program that is compatible with s(CASP), as shown below.

% breastw dataset (699, 10).
% the answer set program generated by foldr++:

label(X,'benign'):- bare_nuclei(X,'?').
label(X,'benign'):- bland_chromatin(X,N6), N6=<4.0,
                    clump_thickness(X,N0), N0=<6.0,
                    bare_nuclei(X,N5), N5=<1.0, not ab7(X).
label(X,'benign'):- cell_size_uniformity(X,N1), N1=<2.0,
                    not ab3(X), not ab5(X), not ab6(X).
label(X,'benign'):- cell_size_uniformity(X,N1), N1=<4.0,
                    bare_nuclei(X,N5), N5=<3.0,
                    clump_thickness(X,N0), N0=<3.0, not ab8(X).
ab2(X):- clump_thickness(X,N0), N0=<1.0.
ab3(X):- bare_nuclei(X,N5), N5>5.0, not ab2(X).
ab4(X):- cell_shape_uniformity(X,N2), N2=<1.0.
ab5(X):- clump_thickness(X,N0), N0>7.0, not ab4(X).
ab6(X):- bare_nuclei(X,N5), N5>4.0, single_epi_cell_size(X,N4), N4=<1.0.
ab7(X):- marginal_adhesion(X,N3), N3>4.0.
ab8(X):- marginal_adhesion(X,N3), N3>6.0.

% foldr++ costs:  0:00:00.027710  post: 0:00:00.000127
% acc 0.95 p 0.96 r 0.9697 f1 0.9648 
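The last comment line of the output reports four scores. Assuming that acc, p, r and f1 stand for accuracy, precision, recall and F1 with their usual definitions, they can be computed from true and predicted labels as sketched below. The scores function is hypothetical and is not the scoring code in the repository.

```python
# Hypothetical helper for illustration; assumes labels are 0/1 with 1 = positive.
def scores(y_true, y_pred):
    """Return (accuracy, precision, recall, F1) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1


# Consistency check on the reported figures: F1 is the harmonic mean of p and r.
p, r = 0.96, 0.9697
print(round(2 * p * r / (p + r), 4))   # 0.9648, matching the f1 value above
```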

Testing in Python

The predict function classifies a set of test examples X_test in Python.

Y_test_hat = predict(rules1, X_test)

The classify function classifies a single example.

y_test_hat = classify(rules1, x_test)

Justification by using s(CASP)

Classification and justification can also be performed with s(CASP), but the test data must first be converted into predicate format. The decode_test_data function generates these predicates:

data_pred = decode_test_data(data_test, attrs)
for p in data_pred:
    print(p)
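As a rough picture of what this conversion involves, the sketch below turns one row into s(CASP) facts: an id fact, then one fact per attribute, with categorical values quoted. The row_to_facts helper is hypothetical, and decode_test_data may format its output differently. The sample row reuses the values of the first record of the acute example in this section.

```python
# Hypothetical helper for illustration; decode_test_data may behave differently.
def row_to_facts(idx, row, attrs):
    """Return the facts describing one example as a list of strings."""
    facts = [f"id({idx})."]
    for name, value in zip(attrs, row):
        if isinstance(value, (int, float)):
            facts.append(f"{name}({idx},{value}).")      # numeric: unquoted
        else:
            facts.append(f"{name}({idx},'{value}').")    # categorical: quoted
    return facts


attrs = ['a1', 'a2', 'a3', 'a4', 'a5', 'a6']
for fact in row_to_facts(1, [37.2, 'no', 'yes', 'no', 'no', 'no'], attrs):
    print(fact)
```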

Here is an example of the generated test data predicates, along with the answer set program, for the acute dataset:

% acute dataset (120, 7) 
% the answer set program generated by foldr++:

ab2(X):- a5(X,'no'), a1(X,N0), N0>37.9.
label(X,'yes'):- not a4(X,'no'), not ab2(X).

% foldr++ costs:  0:00:00.001990  post: 0:00:00.000040
% acc 1.0 p 1.0 r 1.0 f1 1.0 

id(1).
a1(1,37.2).
a2(1,'no').
a3(1,'yes').
a4(1,'no').
a5(1,'no').
a6(1,'no').

id(2).
a1(2,38.1).
a2(2,'no').
a3(2,'yes').
a4(2,'yes').
a5(2,'no').
a6(2,'yes').

id(3).
a1(3,37.5).
a2(3,'no').
a3(3,'no').
a4(3,'yes').
a5(3,'yes').
a6(3,'yes').
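To see how the default rule and its exception interact, the two rules can be translated by hand into Python and applied to the three records above. This is only an illustration of the logic: s(CASP) evaluates the program differently and also produces justifications. Using Python's not for negation as failure is valid here only because every attribute of every record is given as a fact.

```python
# Hand translation of the two acute rules, for illustration only.
def ab2(x):
    # ab2(X):- a5(X,'no'), a1(X,N0), N0>37.9.
    return x['a5'] == 'no' and x['a1'] > 37.9


def label_yes(x):
    # label(X,'yes'):- not a4(X,'no'), not ab2(X).
    return not (x['a4'] == 'no') and not ab2(x)


# Only the attributes used by the rules are copied from the facts above.
records = {
    1: {'a1': 37.2, 'a4': 'no', 'a5': 'no'},
    2: {'a1': 38.1, 'a4': 'yes', 'a5': 'no'},
    3: {'a1': 37.5, 'a4': 'yes', 'a5': 'yes'},
}
for i, rec in records.items():
    print(i, label_yes(rec))
# 1 False  (a4 is 'no', so the default rule does not apply)
# 2 False  (the exception ab2 holds: a5 is 'no' and a1 > 37.9)
# 3 True   (the default applies and no exception holds)
```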

s(CASP)

All the resources of s(CASP) can be found at https://gitlab.software.imdea.org/ciao-lang/sCASP.

Citation

@misc{wang2021foldr,
      title={FOLD-R++: A Toolset for Automated Inductive Learning of Default Theories from Mixed Data}, 
      author={Huaduo Wang and Gopal Gupta},
      year={2021},
      eprint={2110.07843},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}