Ludwig Benchmarking Toolkit

Overview

Ludwig Benchmarking Toolkit

The Ludwig Benchmarking Toolkit is a personalized benchmarking toolkit for running end-to-end benchmark studies across an extensible set of tasks, deep learning models, standard datasets and evaluation metrics.

Getting set-up

To get started, use the following commands to set-up your conda environment.

git clone https://github.com/HazyResearch/ludwig-benchmarking-toolkit.git
cd ludwig-benchmarking-toolkit
conda env create -f environments/{environment-osx.yaml, environment-linux.yaml}
conda activate lbt

Relevant files and directories

experiment-templates/task_template.yaml: Every task (i.e. text classification) will have its owns task template. The template specifies the model architecture (encoder and decoder structure), training parameters, and a hyperopt configuration for the task at hand. A large majority of the values of the template will be populated by the values in the hyperopt_config.yaml file and dataset_metadata.yaml at training time. The sample task template located in experiment-templates/task_template.yaml is for text classification. See sample-task-templates/ for other examples.

experiment-templates/hyperopt_config.yaml: provides a range of values for training parameters and hyperopt params that will populate the hyperopt configuration in the model template

experiment-templates/dataset_metadata.yaml: contains list of all available datasets (and associated metadata) that the hyperparameter optimization can be performed over.

model-configs/: contains all encoder specific yaml files. Each files specifies possible values for relevant encoder parameters that will be optimized over. Each file in this directory adheres to the naming convention {encoder_name}_hyperopt.yaml

hyperopt-experiment-configs/: houses all experiment configs built from the templates specified above (note: this folder will be populated at run-time) and will be used when the hyperopt experiment is called. At a high level, each config file specifies the training and hyperopt information for a (task, dataset, architecture) combination. An example might be (text classification, SST2, BERT)

elasticsearch_config.yaml : this is an optional file that is to be defined if an experiment data will be saved to an elastic database.

USAGE

Command-Line Usage

Running your first TOY experiment:

For testing/setup purposes we have included a toy dataset called toy_agnews. This dataset contains a small set of training, test and validation samples from the original agnews dataset.

Before running a full-scale experiment, we recommend running an experiment locally on the toy dataset:

python experiment_driver.py --run_environment local --datasets toy_agnews --custom_models_list rnn

Running your first REAL experiment:

Steps for configuring + running an experiment:

  1. Declare and configure the search space of all non-model specific training and preprocessing hyperparameters in the experiment-templates/hyperopt_config.yaml file. The parameters specified in this file will be used across all model experiments.

  2. Declare and configure the search space of model specific hyperparameters in the {encoder}_hyperopt.yaml files in ./model_configs

    NOTE:

    • for both (1) and (2) see the Ludwig Hyperparamter Optimization guide to see what parameters for training, preprocessing, and input/ouput features can be used in the hyperopt search
    • if the exectuor type is Ray the list of available search spaces and input format differs slightly than the built-in ludwig types. Please see the Ray Tune search space docs for more information.
  3. Run the following command specifying the datasets, encoders, path to elastic DB index config file, run environment and more:

        python experiment_driver.py \
            --experiment_output_dir  
         
          
            --run_environment {local, gcp}
            --elasticsearch_config 
          
           
            --dataset_cache_dir 
           
            
            --custom_model_list 
            
             
            --datasets 
             
               --resume_existing_exp bool 
             
            
           
          
         

NOTE: Please use python experiment_driver.py -h to see list of available datasets, encoders and args

API Usage

It is also possible to run, customize and experiments using LBTs APIs. In the following section, we describe the three flavors of APIs included in LBT.

experiment API

This API provides an alternative method for running experiments. Note that runnin experiments via the API still requires populating the aforemented configuration files

from lbt.experiments import experiment

experiment(
    models = ['rnn', 'bert'],
    datasets = ['agnews'],
    run_environment = "local",
    elastic_search_config = None,
    resume_existing_exp = False,
)

tools API

This API provides access to two tooling integrations (TextAttack and Robustness Gym (RG)). The TextAttack API can be used to generate adversarial attacks. Moreover, users can use the TextAttack interface to augment data files. The RG API which empowers users to inspect model performance on a set of generic, pre-built slices and to add more slices for their specific datasets and use cases.

from lbt.tools.robustnessgym import RG 
from lbt.tools.textattack import attack, augment

# Robustness Gym API Usage
RG( dataset_name="AGNews",
    models=["bert", "rnn"],
    path_to_dataset="agnews.csv", 
    subpopulations=[ "entities", "positive_words", "negative_words"]))

# TextAttack API Usage
attack(dataset_name="AGNews", path_to_model="agnews/model/rnn_model",
    path_to_dataset="agnews.csv", attack_recipe=["CharSwapAugmenter"])

augment(dataset_name="AGNews", transformations_per_example=1
   path_to_dataset="agnews.csv", augmenter=["WordNetAugmenter"])

visualizations API

This API provides out-of-the-box support for visualizations for learning behavior, model performance, and hyperparameter optimization using the training and evaluation statistics generated during model training

import lbt.visualizations

# compare model performance
compare_performance_viz(
    dataset_name="toy_agnews",
    model_name="rnn",
    output_feature_name="class_index",
)

# compare training and validation trajectory
learning_curves_viz(
    dataset_name="toy_agnews",
    model_name="rnn",
    output_feature_name="class_index",
)

# visualize hyperoptimzation search
hyperopt_viz(
    dataset_name="toy_agnews",
    model_name="rnn",
    output_dir="."
)

EXPERIMENT EXTENSIBILITY

Adding new custom datasets

Adding custom dataset requires creating a new LBTDataset class and adding it to the dataset registry. Creating an LBTDataset object requires implementing three class methods: download, process and load. Please see the the ToyAGNews dataset as an example.

Adding new metrics

Adding custom evaluation metrics requires creating a new LBTMetric class and adding it to the metrics registry. Creating an LBTMetric object requires implementing the run class method which takes as potential inputs a path to a model directory, path to a dataset, training batch size, and training statistics. Please see the pre-built LBT metrics for examples.

ELASTICSEARCH RESEARCH DATABASE

To get credentials to upload experiments to the shared Elasticsearch research database, please fill out this form.

Owner
HazyResearch
We are a CS research group led by Prof. Chris Ré.
HazyResearch
This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection', CVPR 2019.

Code-and-Dataset-for-CapSal This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detec

lu zhang 48 Aug 19, 2022
CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors

CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors   In order to facilitate the res

yujmo 11 Dec 12, 2022
A repository for benchmarking neural vocoders by their quality and speed.

License The majority of VocBench is licensed under CC-BY-NC, however portions of the project are available under separate license terms: Wavenet, Para

Meta Research 177 Dec 12, 2022
Prior-Guided Multi-View 3D Head Reconstruction

Prior-Guided Head MVS This repository includes some reconstruction results of our IEEE TMM 2021 paper, Prior-Guided Multi-View 3D Head Reconstruction.

11 Aug 17, 2022
Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021)

VITON-HD — Official PyTorch Implementation VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization Seunghwan Choi*1, Sunghyun Pa

Seunghwan Choi 250 Jan 06, 2023
CryptoFrog - My First Strategy for freqtrade

cryptofrog-strategies CryptoFrog - My First Strategy for freqtrade NB: (2021-04-20) You'll need the latest freqtrade develop branch otherwise you migh

Robert Davey 137 Jan 01, 2023
Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021

PLOME:Pre-training with Misspelled Knowledge for Chinese Spelling Correction (ACL2021) This repository provides the code and data of the work in ACL20

197 Nov 26, 2022
LibMTL: A PyTorch Library for Multi-Task Learning

LibMTL LibMTL is an open-source library built on PyTorch for Multi-Task Learning (MTL). See the latest documentation for detailed introductions and AP

765 Jan 06, 2023
An automated algorithm to extract the linear blend skinning (LBS) from a set of example poses

Dem Bones This repository contains an implementation of Smooth Skinning Decomposition with Rigid Bones, an automated algorithm to extract the Linear B

Electronic Arts 684 Dec 26, 2022
Real time Human Detection Counting

In this python project, we are going to build the Human Detection and Counting System through Webcam or you can give your own video or images. This is a deep learning project on computer vision, whic

Mir Nawaz Ahmad 2 Jun 17, 2022
DenseNet Implementation in Keras with ImageNet Pretrained Models

DenseNet-Keras with ImageNet Pretrained Models This is an Keras implementation of DenseNet with ImageNet pretrained weights. The weights are converted

Felix Yu 568 Oct 31, 2022
Fashion Landmark Estimation with HRNet

HRNet for Fashion Landmark Estimation (Modified from deep-high-resolution-net.pytorch) Introduction This code applies the HRNet (Deep High-Resolution

SVIP Lab 91 Dec 26, 2022
PyTorch implementation of Off-policy Learning in Two-stage Recommender Systems

Off-Policy-2-Stage This repo provides a PyTorch implementation of the MovieLens experiments for the following paper: Off-policy Learning in Two-stage

Jiaqi Ma 25 Dec 12, 2022
Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

A Simple Neural Network from scratch A Simple Neural Network from scratch in Pyt

Youssef Chafiqui 2 Jan 07, 2022
基于AlphaPose的TensorRT加速

1. Requirements CUDA 11.1 TensorRT 7.2.2 Python 3.8.5 Cython PyTorch 1.8.1 torchvision 0.9.1 numpy 1.17.4 (numpy版本过高会出报错 this issue ) python-package s

52 Dec 06, 2022
Authors implementation of LieTransformer: Equivariant Self-Attention for Lie Groups

LieTransformer This repository contains the implementation of the LieTransformer used for experiments in the paper LieTransformer: Equivariant self-at

35 Oct 18, 2022
Pure python PEMDAS expression solver without using built-in eval function

pypemdas Pure python PEMDAS expression solver without using built-in eval function. Supports nested parenthesis. Supported operators: + - * / ^ Exampl

1 Dec 22, 2021
Consensus score for tripadvisor

ContripScore ContripScore is essentially a score that combines an Internet platform rating and a consensus rating from sentiment analysis (For instanc

Pepe 1 Jan 13, 2022
The Generic Manipulation Driver Package - Implements a ROS Interface over the robotics toolbox for Python

Armer Driver Armer aims to provide an interface layer between the hardware drivers of a robotic arm giving the user control in several ways: Joint vel

QUT Centre for Robotics (QCR) 13 Nov 26, 2022
Face Recognition plus identification simply and fast | Python

PyFaceDetection Face Recognition plus identification simply and fast Ubuntu Setup sudo pip3 install numpy sudo pip3 install cmake sudo pip3 install dl

Peyman Majidi Moein 16 Sep 22, 2022