OSLO: Open Source framework for Large-scale transformer Optimization

Related tags

Deep Learningoslo
Overview


O S L O

Open Source framework for Large-scale transformer Optimization

GitHub release Apache 2.0 Docs Issues



What's New:

What is OSLO about?

OSLO is a framework that provides various GPU based optimization features for large-scale modeling. As of 2021, the Hugging Face Transformers is being considered de facto standard. However, it does not best fit the purposes of large-scale modeling yet. This is where OSLO comes in. OSLO is designed to make it easier to train large models with the Transformers. For example, you can fine-tune GPTJ on the Hugging Face Model Hub without many extra efforts using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, but we plan to support more soon.

Installation

OSLO can be easily installed using the pip package manager. All the dependencies such as torch, transformers, dacite, ninja and pybind11 should be installed automatically with the following command. Be careful that the 'core' in the PyPI project name.

pip install oslo-core

Some of features rely on the C++ language. So we provide an option, CPP_AVAILABLE, to decide whether or not you install them.

  • If the C++ is available:
CPP_AVAILABLE=1 pip install oslo-core
  • If the C++ is not available:
CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 in Windows and 1 in Linux.

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_paramters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

  • 3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
  • Kernel Fusion: A GPU optimization method to increase training and inference speed.
  • DeepSpeed Support: We support DeepSpeed which provides ZeRO data parallelism.
  • Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The Code of the OSLO project is licensed under the terms of the Apache License 2.0.

Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments
  • [WIP] Implement ZeRO Stage 3 (FSDP)

    [WIP] Implement ZeRO Stage 3 (FSDP)

    Title

    • Implement ZeRO Stage 3 (FullyShardedDataParallel)

    Description

    • [x] Add reduce_scatter_bucketer.py
      • [x] Add test_reduce_scatter_bucketer.py
    • [x] Add flatten_params_wrapper.py
      • [x] Add test_flatten_params_wrapper.py
    • [x] Add containers.py
      • [x] Add test_containers.py
    • [x] Add parallel.py
      • [x] Add test_parallel.py
    • [x] Add fsdp_optim_utils.py
    • [x] Update fsdp.py
    • [x] Add auto_wrap.py
      • [x] Add test_wrap.py
    opened by jinok2im 9
  • FusedAdam & CPUAdam

    FusedAdam & CPUAdam

    Title

    -FusedAdam & CPUAdam

    Description

    • Implement FusedAdam & CPUAdam

    Tasks

    • [x] Implement FusedAdam
    • [x] implement CPUAdam
    • [x] Test FusedAdam
    • [x] Test CPUAdam
    • [x] Test FusedSclaeMaskSoftmax (Name changed)
    opened by cozytk 6
  • [WIP] Add data processing modules referring to the lassl

    [WIP] Add data processing modules referring to the lassl

    Title

    • add data processing modules referring to the lassl

    Description

    • brought data processing functions that fit gpt2 with reference to lassl

    Linked Issues

    • None
    opened by gimmaru 6
  • Implementation of Sequential Parallelism

    Implementation of Sequential Parallelism

    SP with DP implementation

    • Implemented SP wrapper with DP

    Description

    • SequenceDataParallel works like native torch DDP with SP
    • you can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
    opened by ohwi 5
  • Update data collators and Add models

    Update data collators and Add models

    Title

    • Update data collators and Add models

    Description

    • Updated data collators to utilize sequence parallel in Oslo trainer
    • Add models by referring to the transformers library
    opened by gimmaru 3
  • Implement Expert Parallel and Test for Initialization and Forward Pass

    Implement Expert Parallel and Test for Initialization and Forward Pass

    Title

    • Implement Expert Parallel and Test for Initialization and Forward Pass

    Description

    • Implement Wrapper, Modules and Features for Expert Parallel
    • Implement mapping_utils._ParallelMappingForHuggingFace as super class of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace
    • Test initialization and forward pass for expert parallel
    opened by scsc0511 3
  • Integrate Sequence Parallelism branches

    Integrate Sequence Parallelism branches

    Title

    • Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

    Description

    • This PR is Integration of SP current version. But there is something wrong.
    • We will fix the bugs for the coming week and write test modules according to the SP design.
    • It did not include the contents of the branch that worked for the test.
    opened by l-yohai 3
  • implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers

    implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers

    • implement tp-3d wrapper
    • rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing ranking transpose function.
    • revise tp-3d layers for huggingface compatibility
    • implement tp-3d test codes
    • refactor all tp test codes
    • unify format across all tensor parallel modules.
    opened by bzantium 2
  • Refactoring MultiheadAttention with todo anchors

    Refactoring MultiheadAttention with todo anchors

    Title

    • Refactoring MultiheadAttention with todo anchors

    Description

    • Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.
    • Remove unnecessary or unintended code and clean up annotations.
    • Unify return format and the variable name with native torch.

    Additionally, I need to test attention_mask. However, it seems that it can proceed with this part after FusedScaleMaskSoftmax is integrated.

    cc. @hyunwoongko @ohwi

    opened by l-yohai 2
  • Add tp-1d layers testing

    Add tp-1d layers testing

    • Add testing for tp-1d layers: col_linear, row_linear, vocab_embedding_1d
    • modify number to integer variable like summa_dim, world_size cc: @hyunwoongko
    opened by bzantium 2
  • [WIP] add test code of sp training

    [WIP] add test code of sp training

    Title

    • SP Model Test Code

    Description

    Writing a test code to verify that the gradient and loss values of the model are the same when the sequence parallelism is applied.

    • WIP - merging @ohwi 's test code comparing SP of ColossalAI and simple learning model.
    opened by l-yohai 2
Releases(v2.0.2)
  • v2.0.2(Aug 25, 2022)

  • v2.0.1(Feb 20, 2022)

  • v2.0.0(Feb 14, 2022)

    Official release of OSLO 2.0.0 🎉🎉

    This version of OSLO provides the following features:

    • Tensor model parallelism
    • Efficient activation checkpointing
    • Kernel fusion

    We plan to add the pipeline model parallelism and the ZeRO optimization in the next versions.


    New feature: Kernel Fusion

    {
      "kernel_fusion": {
        "enable": "bool",
        "memory_efficient_fusion": "bool",
        "custom_cuda_kernels": "list"
      }
    }
    

    For more information, please check the kernel fusion tutorial

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0a2(Feb 2, 2022)

  • v2.0.0a1(Feb 2, 2022)

    Add activation checkpointing

    You can use efficient activation checkpointing using OSLO with the following configuration.

    model = oslo.initialize(
        model,
        config={
            "model_parallelism": {
                "enable": True,
                "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
            },
            "activation_checkpointing": {
                "enable": True,
                "cpu_checkpointing": True,
                "partitioned_checkpointing": True,
                "contiguous_checkpointing": True,
            },
        },
    )
    

    Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0a0(Jan 30, 2022)

    New API

    • We paid homage to DeepSpeed. Now it's easier and simpler to use.
    import oslo
    
    model = oslo.initialize(model, config="oslo-config.json")
    

    Add new models

    • Albert
    • Bert
    • Bart
    • T5
    • GPT2
    • GPTNeo
    • GPTJ
    • Electra
    • Roberta

    Add document

    • https://tunib-ai.github.io/oslo

    Remove old pipeline parallelism, kernel fusion code

    • We'll refurbish them using the latest methods
      • Kernel fusion: AOTAutograd
      • Pipeline parallelism: Sagemaker PP
    Source code(tar.gz)
    Source code(zip)
  • v.1.1.2(Jan 15, 2022)

    Updates

    [#7] Selective Kernel Fusion [#9] Fix argument bug

    New Feature: Selective Kernel Fusion

    Since version 1.1.2, you can fuse only partial kernels, not all kernels. Currently, only Attention class and MLP class are supported.

    from oslo import GPT2MLP, GPT2Attention
    
    # MLP only fusion
    model.fuse([GPT2MLP])
    
    # Attention only fusion
    model.fuse([GPT2Attention])
    
    # MLP + Attention fusion
    model.fuse([GPT2MLP, GPT2Attention])
    
    Source code(tar.gz)
    Source code(zip)
  • v1.1(Dec 29, 2021)

    [#3] Add deployment launcher of Parallelformers into OSLO.

    from oslo import GPTNeoForCausalLM
    
    model = GPTNeoForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-neo-2.7B",
        tensor_parallel_size=2,
        pipeline_parallel_size=2,
        deployment=True  # <-- new feature !
    )
    

    You can easily use deployment launcher by deployment=True. Please refer to USAGE.md for more details.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Dec 22, 2021)

  • v1.0(Dec 21, 2021)


    O S L O

    Open Source framework for Large-scale transformer Optimization

    GitHub release Apache 2.0 Docs Issues



    What's New:

    What is OSLO about?

    OSLO is a framework that provides various GPU based optimization features for large-scale modeling. As of 2021, the Hugging Face Transformers is being considered de facto standard. However, it does not best fit the purposes of large-scale modeling yet. This is where OSLO comes in. OSLO is designed to make it easier to train large models with the Transformers. For example, you can fine-tune GPTJ on the Hugging Face Model Hub without many extra efforts using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, but we plan to support more soon.

    Installation

    OSLO can be easily installed using the pip package manager. All the dependencies such as torch, transformers, dacite, ninja and pybind11 should be installed automatically with the following command. Be careful that the 'core' in the PyPI project name.

    pip install oslo-core
    

    Some of features rely on the C++ language. So we provide an option, CPP_AVAILABLE, to decide whether or not you install them.

    • If the C++ is available:
    CPP_AVAILABLE=1 pip install oslo-core
    
    • If the C++ is not available:
    CPP_AVAILABLE=0 pip install oslo-core
    

    Note that the default value of CPP_AVAILABLE is 0 in Windows and 1 in Linux.

    Key Features

    import deepspeed 
    from oslo import GPTJForCausalLM
    
    # 1. 3D Parallelism
    model = GPTJForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
    )
    
    # 2. Kernel Fusion
    model = model.fuse()
    
    # 3. DeepSpeed Support
    engines = deepspeed.initialize(
        model=model.gpu_modules(), model_parameters=model.gpu_paramters(), ...,
    )
    
    # 4. Data Processing
    from oslo import (
        DatasetPreprocessor, 
        DatasetBlender, 
        DatasetForCausalLM, 
        ...    
    )
    

    OSLO offers the following features.

    • 3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
    • Kernel Fusion: A GPU optimization method to increase training and inference speed.
    • DeepSpeed Support: We support DeepSpeed which provides ZeRO data parallelism.
    • Data Processing: Various utilities for efficient large-scale data processing.

    See USAGE.md to learn how to use them.

    Administrative Notes

    Citing OSLO

    If you find our work useful, please consider citing:

    @misc{oslo,
      author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
      title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
      howpublished = {\url{https://github.com/tunib-ai/oslo}},
      year         = {2021},
    }
    

    Licensing

    The Code of the OSLO project is licensed under the terms of the Apache License 2.0.

    Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

    Acknowledgements

    The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

    Source code(tar.gz)
    Source code(zip)
Owner
TUNiB
TUNiB Inc.
TUNiB
Data pipelines for both TensorFlow and PyTorch!

rapidnlp-datasets Data pipelines for both TensorFlow and PyTorch ! If you want to load public datasets, try: tensorflow/datasets huggingface/datasets

1 Dec 08, 2021
The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

RegSeg The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation" Paper: arxiv D block Decoder Setup Install the

Roland 61 Dec 27, 2022
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au

14 Nov 28, 2022
Shallow Convolutional Neural Networks for Human Activity Recognition using Wearable Sensors

-IEEE-TIM-2021-1-Shallow-CNN-for-HAR [IEEE TIM 2021-1] Shallow Convolutional Neural Networks for Human Activity Recognition using Wearable Sensors All

Wenbo Huang 1 May 17, 2022
This is the repository of the NeurIPS 2021 paper "Curriculum Disentangled Recommendation withNoisy Multi-feedback"

Curriculum_disentangled_recommendation This is the repository of the NeurIPS 2021 paper "Curriculum Disentangled Recommendation with Noisy Multi-feedb

14 Dec 20, 2022
Official Implementation for Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation We present a generic image-to-image translation framework, pixel2style2pixel (pSp

2.8k Dec 30, 2022
Analysis of Smiles through reservoir sampling & RDkit

Analysis of Smiles through reservoir sampling and machine learning (under development). This is a simple project that includes two Jupyter files for t

Aurimas A. NausÄ—das 6 Aug 30, 2022
PURE: End-to-End Relation Extraction

PURE: End-to-End Relation Extraction This repository contains (PyTorch) code and pre-trained models for PURE (the Princeton University Relation Extrac

Princeton Natural Language Processing 657 Jan 09, 2023
Commonsense Ability Tests

CATS Commonsense Ability Tests Dataset and script for paper Evaluating Commonsense in Pre-trained Language Models Use making_sense.py to run the exper

XUHUI ZHOU 28 Oct 19, 2022
(SIGIR2020) “Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback’’

Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback About This repository accompanies the real-world experiments conducted i

yuta-saito 19 Dec 01, 2022
Boostcamp CV Serving For Python

Boostcamp-CV-Serving Prerequisites MySQL GCP Cloud Storage GCP key file Sentry Streamlit Cloud Secrets: .streamlit/secrets.toml #DO NOT SHARE THIS I

Jungwon Seo 19 Feb 22, 2022
Public implementation of "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression" from CoRL'21

Self-Supervised Reward Regression (SSRR) Codebase for CoRL 2021 paper "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression "

19 Dec 12, 2022
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

safe-control-gym Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-ba

Dynamic Systems Lab 300 Dec 28, 2022
This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras)

Yogi-Optimizer_Keras This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras) The NeurIPS-Paper can be found here: http://papers.nips.c

14 Sep 13, 2022
Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Oral)

CMT Code for paper Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award) [Paper] [Site] Directory Struc

Zhaokai Wang 198 Dec 27, 2022
Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation"

1 Introduction Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation". The code s

Liang Zhang 10 Dec 10, 2022
A hybrid framework (neural mass model + ML) for SC-to-FC prediction

The current workflow simulates brain functional connectivity (FC) from structural connectivity (SC) with a neural mass model. Gradient descent is applied to optimize the parameters in the neural mass

Yilin Liu 1 Jan 26, 2022
Pytorch implementation of OCNet series and SegFix.

openseg.pytorch News 2021/09/14 MMSegmentation has supported our ISANet and refer to ISANet for more details. 2021/08/13 We have released the implemen

openseg-group 1.1k Dec 23, 2022
A large-image collection explorer and fast classification tool

IMAX: Interactive Multi-image Analysis eXplorer This is an interactive tool for visualize and classify multiple images at a time. It written in Python

Matias Carrasco Kind 23 Dec 16, 2022
Toolbox to analyze temporal context invariance of deep neural networks

PyTCI A toolbox that estimates the integration window of a sensory response using the "Temporal Context Invariance" paradigm (TCI). The TCI method Int

4 Oct 23, 2022