A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

Overview

MCILBoost

Project | CVPR Paper | MIA Paper
Contact: Jun-Yan Zhu (junyanz at cs dot cmu dot edu)

Overview

This is the authors' implementation of MCIL-Boost method described in:
[1] Multiple Clustered Instance Learning for Histopathology Cancer Image Segmentation, Clustering, and Classification.
Yan Xu*, Jun-Yan Zhu*, Eric Chang, and Zhuowen Tu (*equal contribution)
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[2] Weakly Supervised Histopathology Cancer Image Segmentation and Classification
Yan Xu, Jun-Yan Zhu, Eric I-Chao Chang, Maode Lai, and Zhuowen Tu
In Medical Image Analysis, 2014.

Please cite our papers if you use our code for your research.

This package consists of the following two multiple-instance learning (MIL) methods:

  • MIL-Boost [Viola et al. 2006]: set c = 1
  • MCIL-Boost [1] [2]: set c > 1

The core of this package is a command-line interface written in C++. Various Matlab helper functions are provided to help users easily train/test MCIL-Boost model, perform cross-validation, and evaluate the performance.

System Requirement

  • Linux and Windows.
  • For Linux, the code is compiled by gcc 4.8.2 under Ubuntu 14.04.

Installation

  • Download and unzip the code.
    • For Linux users, type "chmod +x MCILBoost".
  • Open Matlab and run "demoToy.m".
  • To use the command-line interface, see "Command Usage".
  • To use Matlab functions, see "Matlab helper functions"; You can modify "SetParamsToy.m" and "demoToy.m" to run your own experiments.

Quick Examples

(Windows: MCILBoost.exe; Linux: ./MCILBoost)
An example for training:
MCILBoost.exe -v 2 -t 0 -c 2 -n 150 -s 0 -r 20 toy.data toy.model
An example for testing:
MCILBoost.exe -v 2 -t 1 -c 2 toy.data toy.model toy.result

Command Usage ([ ]: options)

MCILBoost.exe [-v verbose] [-t mode] [-c #clusters] [-n #weakClfs] [-s softmax] data_file model_file [result_file] (No need to specifiy c, n, s, r for test as the program will copy these parameters from the model_file)

-v verbose: shows details about the runtime output (default = 1) 0 -- no output 1 -- some output 2 -- more output

-t mode: set the training mode (default=0) 0 -- train a model 1 -- test a model

-c #clusters: set the number of clusters in positive bags (default = 1) c = 1 -- train a MIL-Boost model c > 1 -- train a MCIL-Boost model with multiple clusters

-n #weakClfs: set the maximum number of weak classifiers (default = 150)

-s softmax: set the softmax type: (default s = 0) 0 -- GM 1 -- LSE

-r exponent: set the exponent used in GM and LSE (default r = 20)

data_file: set the path for input data.

model_file: set the path for the model file.

result_file: set the path for result file. If result_file is not specified, result_file = data_file + '.result'

Matlab helper functions

  • MCILBoost.m: main entry function: model training/testing, and cross-validation.
  • SetParams.m: Set parameters for MCILBoost.m. You need to modify this file to run your own experiment.
  • TrainModel.m: train a model, call MCIL-Boost command line.
  • TestModel.m: test a model, call MCIL-Boost command line.
  • CrossValidate.m: split the data into n-fold, perform n-fold cross-validation, and report performance.
  • ReadData.m: read Matlab data from a text file.
  • WriteData.m: write Matlab data to a text file.
  • ReadResult.m: read Matlab result data from a text file.
  • MeasureResult.m: evaluate performance in terms of accuracy and auc (area under the curve).
  • AUC: compute the area under ROC curve given prediction and ground truth labels.
  • demoToy.m: demo script for toy data.
  • SetParamsToy.m: set parameters for demoToy.
  • demo1.m: demo script for Fox, Tiger, Elephant experiment.
  • SetParamsDemo1.m: set parameters for demo1.
  • demo2.m: demo script for SIVAL experiment.
  • SetParamsDemo2.m: set parameters for demo2.

Summary of Benchmark Results

  • I provide two scripts for running experiments on publicly available MIL benchmarks.
    • "demo1.m": experiments on Fox, Tiger, Elephant dataset.
      The MIL-Boost achieved 0.61 (Fox), 0.81 (Tiger), 0.82 (Elephant) on 10-fold cross-validation over 10 runs.
    • "demo2.m": experiments on SIVAL dataset. There are 180 positive bags (3 clusters), and 180 negative bags. While multiple clusters appear in positive bags, MCIL-Boost works better than MIL-Boost does.
      MIL-Boost (c=1): mean_acc = 0.742, mean_auc = 0.824
      MCIL-Boost (c=3): mean_acc = 0.879, mean_auc = 0.944
  • Note: See "demo1.m" and "demo2.m" for details.

Input Format

  • Note: You can use Matlab function "ReadData.m" and "WriteData.m" to read/write Matlab data from/to the text file.
  • Description: the input format is similar to the format used in LIBSVM and MILL package. The software also supports a sparse format. In the first line, you first need to specify the number of all instances, and the number of feature dimensions. Each line represents one instance, which has an instance id, bag id, and the label id (>= 1 for positive bags, and 0 for negative bags). Each feature value is represented as a : pair where is the index of the feature (starting from 1)
  • Format:
    : : : : ...
    : : : : ...
  • Example: A toy example that contains two negative bags and two positive bags. (see "toy.data") The negative instance is always (0, 0, 0) while there are two clusters of positive instances (0, 1, 0) and (0, 0, 1)
    8 3
    0:0:0 1:0 2:0 3:0
    1:0:0 1:0 2:0 3:0
    2:1:0 1:0 2:0 3:0
    3:1:0 1:0 2:0 3:0
    4:2:1 1:0 2:1 3:0
    5:2:1 1:0 2:0 3:0
    6:3:1 1:0 2:0 3:1
    7:3:1 1:0 2:0 3:0

Output Format

  • Note: You can use Matlab function "ReadResult.m" to load the Matlab data from the result file.

  • Description: The software outputs four kinds of predictions (see more details in the paper):

    • overall bag-level prediction p_i (the probability of the bag x_i being positive bag)
    • cluster-wise bag-level prediction p_i^k (the probability of the bag x_i belonging to k-th cluster)
    • overall instance-level prediction p_{ij} (the probability of the instance x_{ij} being positive instance)
    • cluster-wise instance-level prediction p_{ij}^k (the probability of the instance x_{ij} belonging to the k-th cluster)
    • In the first line, the software outputs the number of bags, and the number of clusters. Then for each bag, the software outputs the bag-level information and prediction (bag id, number of instances, ground truth label, number of clusters, and p_i).The software also outputs the bag-level prediction for each cluster (cluster id and prediction p_i^k for each cluster). Then for each instance, the software outputs the instance-level prediction (instance id and prediction p_{ij}) and instance-level prediction for each cluster (cluster_id and prediction p_{ij}^k)
  • Format:
    #bag= #cluster=
    bag_id= #insts= label= #cluster= pred=
    cluster_id= pred= cluster_id= pred= ...
    inst_id= pred= cluster_id= pred= cluster_id= pred= inst_id= pred= cluster_id= pred= cluster_id= pred= ...
    ...

  • Example: The output of the toy example:
    #bags=4 #clusters=2
    bag_id=0 #insts=2 label=0 #clusters=2 pred=0
    cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=0 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    bag_id=1 #insts=2 label=0 #clusters=2 pred=0
    cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=0 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    bag_id=2 #insts=2 label=1 #clusters=2 pred=1
    cluster_id=0 pred=1 cluster_id=1 pred=0
    inst_id=0 pred=1 cluster_id=0 pred=1 cluster_id=1 pred=0
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    bag_id=3 #insts=2 label=1 #clusters=2 pred=1
    cluster_id=0 pred=0 cluster_id=1 pred=1
    inst_id=0 pred=1 cluster_id=0 pred=0 cluster_id=1 pred=1
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0

    Credit

    Part of this code is based on the work by Piotr Dollar and Boris Babenko.

Owner
Jun-Yan Zhu
Understanding and creating pixels.
Jun-Yan Zhu
The-Secret-Sharing-Schemes - This interactive script demonstrates the Secret Sharing Schemes algorithm

The-Secret-Sharing-Schemes This interactive script demonstrates the Secret Shari

Nishaant Goswamy 1 Jan 02, 2022
Lua-parser-lark - An out-of-box Lua parser written in Lark

An out-of-box Lua parser written in Lark Such parser handles a relaxed version o

Taine Zhao 2 Jul 19, 2022
Hcpy - Interface with Home Connect appliances in Python

Interface with Home Connect appliances in Python This is a very, very beta inter

Trammell Hudson 116 Dec 27, 2022
Simple Python project using Opencv and datetime package to recognise faces and log attendance data in a csv file.

Attendance-System-based-on-Facial-recognition-Attendance-data-stored-in-csv-file- Simple Python project using Opencv and datetime package to recognise

3 Aug 09, 2022
PyTorch implementation of the cross-modality generative model that synthesizes dance from music.

Dancing to Music PyTorch implementation of the cross-modality generative model that synthesizes dance from music. Paper Hsin-Ying Lee, Xiaodong Yang,

NVIDIA Research Projects 485 Dec 26, 2022
Mmdet benchmark with python

mmdet_benchmark 本项目是为了研究 mmdet 推断性能瓶颈,并且对其进行优化。 配置与环境 机器配置 CPU:Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz GPU:NVIDIA GeForce RTX 3080 10GB 内存:64G 硬盘:1T

杨培文 (Yang Peiwen) 24 May 21, 2022
This repo in the implementation of EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osokin

SPARQLing Database Queries from Intermediate Question Decompositions This repo is the implementation of the following paper: SPARQLing Database Querie

Yandex Research 20 Dec 19, 2022
Public scripts, services, and configuration for running a smart home K3S network cluster

makerhouse_network Public scripts, services, and configuration for running MakerHouse's home network. This network supports: TODO features here For mo

Scott Martin 1 Jan 15, 2022
LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

Hieu Duong 7 Jan 12, 2022
DeepProbLog is an extension of ProbLog that integrates Probabilistic Logic Programming with deep learning by introducing the neural predicate.

DeepProbLog DeepProbLog is an extension of ProbLog that integrates Probabilistic Logic Programming with deep learning by introducing the neural predic

KU Leuven Machine Learning Research Group 94 Dec 18, 2022
Automated detection of anomalous exoplanet transits in light curve data.

Automatically detecting anomalous exoplanet transits This repository contains the source code for the paper "Automatically detecting anomalous exoplan

1 Feb 01, 2022
Pytorch implementation of XRD spectral identification from COD database

XRDidentifier Pytorch implementation of XRD spectral identification from COD database. Details will be explained in the paper to be submitted to NeurI

Masaki Adachi 4 Jan 07, 2023
Official Repository of NeurIPS2021 paper: PTR

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning Figure 1. Dataset Overview. Introduction A critical aspect of human vis

Yining Hong 32 Jun 02, 2022
Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2 Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexan

Phan Nguyen 1 Dec 16, 2021
Collection of NLP model explanations and accompanying analysis tools

Thermostat is a large collection of NLP model explanations and accompanying analysis tools. Combines explainability methods from the captum library wi

126 Nov 22, 2022
Social Network Ads Prediction

Social network advertising, also social media targeting, is a group of terms that are used to describe forms of online advertising that focus on social networking services.

Khazar 2 Jan 28, 2022
Includes PyTorch -> Keras model porting code for ConvNeXt family of models with fine-tuning and inference notebooks.

ConvNeXt-TF This repository provides TensorFlow / Keras implementations of different ConvNeXt [1] variants. It also provides the TensorFlow / Keras mo

Sayak Paul 87 Dec 06, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 08, 2023
Temporal Knowledge Graph Reasoning Triggered by Memories

MTDM Temporal Knowledge Graph Reasoning Triggered by Memories To alleviate the time dependence, we propose a memory-triggered decision-making (MTDM) n

4 Sep 25, 2022
Deep Learning Theory

Deep Learning Theory 整理了一些深度学习的理论相关内容,持续更新。 Overview Recent advances in deep learning theory 总结了目前深度学习理论研究的六个方向的一些结果,概述型,没做深入探讨(2021)。 1.1 complexity

fq 103 Jan 04, 2023