PyTorch Infer Utils

This package simplifies exporting PyTorch models to ONNX and TensorRT and also provides base interfaces for model inference.

To install

git clone https://github.com/gorodnitskiy/pytorch_infer_utils.git
pip install /path/to/pytorch_infer_utils/

Export PyTorch model to ONNX

Check the model for denormal weights, since they can slow down inference. Use the load_weights_rounded_model function to load a model with its weights rounded:

from pytorch_infer_utils import load_weights_rounded_model

model = ModelClass()
load_weights_rounded_model(
    model,
    "/path/to/model_state_dict",
    map_location=map_location
)
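
Denormal (subnormal) float32 values are nonzero numbers smaller in magnitude than the smallest normal float32, and arithmetic on them is much slower on many CPUs. The sketch below counts and flushes such values in a state dict converted to NumPy arrays. It is only an illustration, not the implementation of load_weights_rounded_model; count_denormals and flush_denormals are hypothetical helper names. PyTorch itself also offers torch.set_flush_denormal(True), which flushes denormals at the CPU level on supported hardware.

```python
import numpy as np

# Smallest positive *normal* float32; nonzero values below it are denormal.
TINY = np.finfo(np.float32).tiny


def count_denormals(state_dict):
    """Count denormal values per float32 tensor (other dtypes are skipped)."""
    counts = {}
    for name, arr in state_dict.items():
        a = np.asarray(arr)
        if a.dtype != np.float32:
            continue
        counts[name] = int(((a != 0) & (np.abs(a) < TINY)).sum())
    return counts


def flush_denormals(state_dict):
    """Return a copy where every denormal float32 value is replaced by 0."""
    out = {}
    for name, arr in state_dict.items():
        a = np.asarray(arr)
        if a.dtype == np.float32:
            a = np.where(np.abs(a) < TINY, np.float32(0), a).astype(np.float32)
        out[name] = a
    return out


# Hypothetical usage with a PyTorch model:
# weights = {k: v.detach().cpu().numpy() for k, v in model.state_dict().items()}
# print(count_denormals(weights))
```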

Use ONNXExporter.torch2onnx method to export pytorch model to ONNX:

import torch
from pytorch_infer_utils import ONNXExporter

model = ModelClass()
model.load_state_dict(
    torch.load("/path/to/model_state_dict", map_location=map_location)
)
model.eval()

exporter = ONNXExporter()
input_shapes = [-1, 3, 224, 224]  # -1 marks a dynamic dimension
exporter.torch2onnx(model, "/path/to/model.onnx", input_shapes)
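
A -1 in input_shapes corresponds to what torch.onnx.export expresses through its dynamic_axes argument, while the traced dummy input still needs concrete sizes (the role of the default_shapes parameter listed further down). The sketch shows one way to derive both from input_shapes. The helper split_dynamic_shape and the dim_{axis} names are hypothetical, and this is not necessarily how ONNXExporter works internally.

```python
def split_dynamic_shape(input_shapes, default_shapes, input_name="input"):
    """Return (dummy_shape, dynamic_axes) suitable for torch.onnx.export.

    Every -1 in input_shapes becomes a named dynamic axis, and the dummy
    input uses the matching entry of default_shapes for that axis.
    """
    if len(input_shapes) != len(default_shapes):
        raise ValueError("input_shapes and default_shapes must have the same rank")
    dummy_shape, axes = [], {}
    for axis, (dim, default) in enumerate(zip(input_shapes, default_shapes)):
        if dim == -1:
            dummy_shape.append(default)
            axes[axis] = f"dim_{axis}"
        else:
            dummy_shape.append(dim)
    return dummy_shape, {input_name: axes}


# Hypothetical usage (output axes may also need entries in dynamic_axes):
# dummy_shape, dynamic_axes = split_dynamic_shape([-1, 3, 224, 224], [1, 3, 224, 224])
# torch.onnx.export(model, torch.randn(*dummy_shape), "/path/to/model.onnx",
#                   input_names=["input"], output_names=["output"],
#                   opset_version=11, dynamic_axes=dynamic_axes)
```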

Use the ONNXExporter.optimize_onnx method to optimize an ONNX model with onnxoptimizer:

from pytorch_infer_utils import ONNXExporter

exporter = ONNXExporter()
exporter.optimize_onnx("/path/to/model.onnx", "/path/to/optimized_model.onnx")

Use the ONNXExporter.optimize_onnx_sim method to optimize an ONNX model with onnx-simplifier. Be careful: onnx-simplifier can turn dynamic shapes into static ones.

from pytorch_infer_utils import ONNXExporter

exporter = ONNXExporter()
exporter.optimize_onnx_sim("/path/to/model.onnx", "/path/to/optimized_model.onnx")
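
Because onnx-simplifier can freeze dynamic dimensions, it helps to compare the input shapes of the model before and after simplification. The hypothetical helpers below read graph.input of an onnx.ModelProto (as returned by onnx.load) and treat a dimension as dynamic when it has a symbolic dim_param or no positive dim_value. They only use attribute access, so they are a sketch rather than a full shape checker.

```python
def input_dims(model):
    """Return {input_name: [size or None]}; None marks a dynamic dimension."""
    shapes = {}
    for inp in model.graph.input:
        dims = []
        for d in inp.type.tensor_type.shape.dim:
            # A symbolic name or an unset/zero value means the size is not fixed.
            dims.append(None if (d.dim_param or d.dim_value <= 0) else d.dim_value)
        shapes[inp.name] = dims
    return shapes


def lost_dynamic_dims(before, after):
    """List (input_name, axis) pairs that were dynamic before but static after."""
    lost = []
    for name, dims in before.items():
        for axis, (old, new) in enumerate(zip(dims, after.get(name, []))):
            if old is None and new is not None:
                lost.append((name, axis))
    return lost


# Hypothetical usage:
# import onnx
# before = input_dims(onnx.load("/path/to/model.onnx"))
# after = input_dims(onnx.load("/path/to/optimized_model.onnx"))
# print(lost_dynamic_dims(before, after))  # an empty list means nothing was frozen
```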

The ONNXExporter.torch2optimized_onnx method combines the steps above:

import torch
from pytorch_infer_utils import ONNXExporter

model = ModelClass()
model.load_state_dict(
    torch.load("/path/to/model_state_dict", map_location=map_location)
)
model.eval()

exporter = ONNXExporter()
input_shapes = [-1, 3, -1, -1]  # -1 marks a dynamic dimension
exporter.torch2optimized_onnx(model, "/path/to/model.onnx", input_shapes)
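
After exporting, a common sanity check is to run the same input through PyTorch and onnxruntime and compare the outputs; using a spatial size different from default_shapes also exercises the dynamic dimensions. The helpers max_abs_diff and outputs_match are hypothetical, and the tolerances are illustrative assumptions that should be tuned per model.

```python
import numpy as np


def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference between two outputs."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.max(np.abs(ref - cand)))


def outputs_match(reference, candidate, rtol=1e-3, atol=1e-5):
    """True if candidate is close to reference within the given tolerances."""
    return bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))


# Hypothetical usage:
# import onnxruntime
# x = torch.randn(2, 3, 256, 256)
# with torch.no_grad():
#     ref = model(x).numpy()
# sess = onnxruntime.InferenceSession("/path/to/model.onnx",
#                                     providers=["CPUExecutionProvider"])
# out = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})[0]
# print(max_abs_diff(ref, out), outputs_match(ref, out))
```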

Other parameters accepted by the class constructor:
- default_shapes: concrete sizes substituted for dynamic (-1) dimensions, default = [1, 3, 224, 224]
- onnx_export_params:
  - export_params: store the trained parameter weights inside the model file, default = True
  - do_constant_folding: whether to execute constant folding for optimization, default = True
  - input_names: the model's input names, default = ["input"]
  - output_names: the model's output names, default = ["output"]
  - opset_version: the ONNX opset version to export the model to, default = 11
- onnx_optimize_params:
  - fixed_point: onnxoptimizer's fixed_point option, which repeats the passes until the graph stops changing, default = False
  - passes: optimization passes, default = [ "eliminate_deadend", "eliminate_duplicate_initializer", "eliminate_identity", "eliminate_if_with_const_cond", "eliminate_nop_cast", "eliminate_nop_dropout", "eliminate_nop_flatten", "eliminate_nop_monotone_argmax", "eliminate_nop_pad", "eliminate_nop_transpose", "eliminate_unused_initializer", "extract_constant_to_initializer", "fuse_add_bias_into_conv", "fuse_bn_into_conv", "fuse_consecutive_concats", "fuse_consecutive_log_softmax", "fuse_consecutive_reduce_unsqueeze", "fuse_consecutive_squeezes", "fuse_consecutive_transposes", "fuse_matmul_add_bias_into_gemm", "fuse_pad_into_conv", "fuse_transpose_into_gemm", "lift_lexical_references", "nop" ]

Export ONNX to TensorRT

Check TensorRT health with the check_tensorrt_health function.

Use the TRTEngineBuilder.build_engine method to convert an ONNX model into a TensorRT engine:

from pytorch_infer_utils import TRTEngineBuilder

exporter = TRTEngineBuilder()
# return the engine object
engine = exporter.build_engine("/path/to/model.onnx")
# or save the engine to /path/to/model.trt
exporter.build_engine("/path/to/model.onnx", engine_path="/path/to/model.trt")

FP16 mode is also available via fp16_mode:

from pytorch_infer_utils import TRTEngineBuilder

exporter = TRTEngineBuilder()
engine = exporter.build_engine("/path/to/model.onnx", fp16_mode=True)

INT8 mode is also available via int8_mode. It requires calibration_set, the calibration images as List[Any]; load_image_func, a function that correctly reads and preprocesses each image; and max_image_shape, the maximum image size as [C, H, W], used to allocate enough memory:

from pytorch_infer_utils import TRTEngineBuilder

exporter = TRTEngineBuilder()
engine = exporter.build_engine(
    "/path/to/model.onnx",
    int8_mode=True,
    calibration_set=calibration_set,
    max_image_shape=max_image_shape,
    load_image_func=load_image_func,
)
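
The calibration inputs must be preprocessed exactly as during inference. The sketch below assumes each calibration_set item is an HWC uint8 RGB NumPy array and that the model expects ImageNet-style normalization in CHW layout; both are assumptions, as is the helper max_image_shape_of. Whether load_image_func should also add a batch dimension depends on the library and is not shown here.

```python
import numpy as np

# Assumed ImageNet normalization; replace with your model's own preprocessing.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def load_image_func(image):
    """Convert an HWC uint8 RGB array into a normalized CHW float32 array."""
    x = np.asarray(image, dtype=np.float32) / 255.0
    x = (x - MEAN) / STD  # broadcasts over the last (channel) axis
    return np.ascontiguousarray(x.transpose(2, 0, 1))  # HWC -> CHW


def max_image_shape_of(images):
    """Largest [C, H, W] over the calibration images (hypothetical helper)."""
    channels = {img.shape[2] for img in images}
    if len(channels) != 1:
        raise ValueError("all calibration images must have the same channel count")
    return [channels.pop(), max(img.shape[0] for img in images),
            max(img.shape[1] for img in images)]


# Hypothetical usage:
# calibration_set = [...]  # list of HWC uint8 RGB arrays
# max_image_shape = max_image_shape_of(calibration_set)
```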

Additional parameters for the builder config (builder.create_builder_config) can be passed as kwargs.
Other parameters accepted by the class constructor:
- opt_shape_dict: shapes for the TensorRT optimization profile of each input, three shapes per input, default = {'input': [[1, 3, 224, 224], [1, 3, 224, 224], [1, 3, 224, 224]]}
- max_workspace_size: maximum builder workspace size, default = [1, 30]
- stream_batch_size: batch size used to feed calibration data through the network during INT8 calibration, default = 100
- cache_file: INT8 calibration cache filename, default = "model.trt.int8calibration"

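The three shapes per input in opt_shape_dict presumably follow TensorRT's optimization-profile convention of [min, opt, max], where every axis must satisfy min <= opt <= max, and max_workspace_size = [1, 30] presumably means 1 << 30 bytes (1 GiB). Both readings are assumptions, so the sketch below should be checked against the library source. make_profile_shapes and workspace_bytes are hypothetical helpers, and a dynamic batch range only works if the ONNX model was exported with a dynamic batch dimension.

```python
def make_profile_shapes(min_shape, opt_shape, max_shape):
    """Return [min, opt, max] for one input, checking min <= opt <= max per axis."""
    if not len(min_shape) == len(opt_shape) == len(max_shape):
        raise ValueError("min, opt and max shapes must have the same rank")
    for axis, (lo, mid, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not lo <= mid <= hi:
            raise ValueError(f"axis {axis}: expected {lo} <= {mid} <= {hi}")
    return [list(min_shape), list(opt_shape), list(max_shape)]


def workspace_bytes(spec):
    """Interpret a [base, shift] pair as base << shift bytes (assumed meaning)."""
    base, shift = spec
    return base << shift


# Batch from 1 to 16, tuned for 8, with fixed 3x224x224 images.
opt_shape_dict = {
    "input": make_profile_shapes([1, 3, 224, 224], [8, 3, 224, 224], [16, 3, 224, 224]),
}
# exporter = TRTEngineBuilder(opt_shape_dict=opt_shape_dict)  # hypothetical usage
```
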
Inference via onnxruntime on CPU and onnx_tensorrt on GPU

The __init__ of the base class ONNXWrapper is structured as follows:

def __init__(
    self,
    onnx_path: str,
    gpu_device_id: Optional[int] = None,
    intra_op_num_threads: Optional[int] = 0,
    inter_op_num_threads: Optional[int] = 0,
) -> None:
    """
    :param onnx_path: onnx-file path, required
    :param gpu_device_id: gpu device id to use; None runs on CPU
        via onnxruntime, default = None
    :param intra_op_num_threads: ort_session_options.intra_op_num_threads,
        set to 0 to let onnxruntime choose, default = 0
    :param inter_op_num_threads: ort_session_options.inter_op_num_threads,
        set to 0 to let onnxruntime choose, default = 0
    :type onnx_path: str
    :type gpu_device_id: int
    :type intra_op_num_threads: int
    :type inter_op_num_threads: int
    """
    if gpu_device_id is None:
        import onnxruntime

        self.is_using_tensorrt = False
        ort_session_options = onnxruntime.SessionOptions()
        ort_session_options.intra_op_num_threads = intra_op_num_threads
        ort_session_options.inter_op_num_threads = inter_op_num_threads
        self.ort_session = onnxruntime.InferenceSession(
            onnx_path, ort_session_options
        )

    else:
        import onnx
        import onnx_tensorrt.backend as backend

        self.is_using_tensorrt = True
        model_proto = onnx.load(onnx_path)
        # fix the batch dimension (dim 0) of every graph input to 1
        for gr_input in model_proto.graph.input:
            gr_input.type.tensor_type.shape.dim[0].dim_value = 1

        self.engine = backend.prepare(
            model_proto, device=f"CUDA:{gpu_device_id}"
        )

The ONNXWrapper.run method follows this structure:

img = self._process_img_(img)
if self.is_using_tensorrt:
    preds = self.engine.run(img)
else:
    ort_inputs = {self.ort_session.get_inputs()[0].name: img}
    preds = self.ort_session.run(None, ort_inputs)

preds = self._process_preds_(preds)
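
A subclass only needs to supply the two hooks used by run, and TRTWrapper.run follows the same structure. The sketch below is a hypothetical mixin for an image classifier; it assumes _process_img_ receives an HWC uint8 image and _process_preds_ receives the list of model outputs, which matches how run uses them above but is not confirmed by the library. The image is given a batch of 1, consistent with ONNXWrapper fixing the batch dimension to 1 on GPU.

```python
import numpy as np


class ClassifierHooksMixin:
    """Hypothetical pre/post-processing hooks for an image classifier."""

    def _process_img_(self, img):
        # HWC uint8 -> NCHW float32 in [0, 1] with a batch of one.
        x = np.asarray(img, dtype=np.float32) / 255.0
        return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis])

    def _process_preds_(self, preds):
        # First model output, first sample in the batch, then a stable softmax.
        logits = np.asarray(preds[0], dtype=np.float64)[0]
        exp = np.exp(logits - logits.max())
        probs = exp / exp.sum()
        return int(probs.argmax()), probs


# Hypothetical usage: list the mixin first so its hooks take precedence.
# class MyClassifier(ClassifierHooksMixin, ONNXWrapper):
#     pass
```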

Inference via onnxruntime on CPU and TensorRT on GPU

The __init__ of the base class TRTWrapper is structured as follows:

def __init__(
    self,
    onnx_path: Optional[str] = None,
    trt_path: Optional[str] = None,
    gpu_device_id: Optional[int] = None,
    intra_op_num_threads: Optional[int] = 0,
    inter_op_num_threads: Optional[int] = 0,
    fp16_mode: bool = False,
) -> None:
    """
    :param onnx_path: onnx-file path, required on CPU, default = None
    :param trt_path: TensorRT engine file path, default = None
    :param gpu_device_id: gpu device id to use; None runs on CPU
        via onnxruntime, default = None
    :param intra_op_num_threads: ort_session_options.intra_op_num_threads,
        set to 0 to let onnxruntime choose, default = 0
    :param inter_op_num_threads: ort_session_options.inter_op_num_threads,
        set to 0 to let onnxruntime choose, default = 0
    :param fp16_mode: build the engine in FP16 mode when the class is
        initialized on GPU with onnx_path only, default = False
    :type onnx_path: str
    :type trt_path: str
    :type gpu_device_id: int
    :type intra_op_num_threads: int
    :type inter_op_num_threads: int
    :type fp16_mode: bool
    """
    if gpu_device_id is None:
        import onnxruntime

        self.is_using_tensorrt = False
        ort_session_options = onnxruntime.SessionOptions()
        ort_session_options.intra_op_num_threads = intra_op_num_threads
        ort_session_options.inter_op_num_threads = inter_op_num_threads
        self.ort_session = onnxruntime.InferenceSession(
            onnx_path, ort_session_options
        )

    else:
        self.is_using_tensorrt = True
        if trt_path is None:
            builder = TRTEngineBuilder()
            trt_path = builder.build_engine(onnx_path, fp16_mode=fp16_mode)

        self.trt_session = TRTRunWrapper(trt_path)

The TRTWrapper.run method follows this structure:

img = self._process_img_(img)
if self.is_using_tensorrt:
    preds = self.trt_session.run(img)
else:
    ort_inputs = {self.ort_session.get_inputs()[0].name: img}
    preds = self.ort_session.run(None, ort_inputs)

preds = self._process_preds_(preds)

Environment

TensorRT

Follow the official TensorRT installation guide.
TensorRT requires the CUDA runtime and the CUDA Toolkit.
The following additional Python packages are also required; they are not listed in setup.cfg because the right versions depend on your CUDA environment:
- pycuda
- nvidia-tensorrt
- nvidia-pyindex

onnx_tensorrt

onnx_tensorrt requires the CUDA runtime and TensorRT.

To install:

git clone --depth 1 --branch 21.02 https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt
# the destination below assumes Python 3.8; adjust it to your site-packages directory
cp -r onnx_tensorrt /usr/local/lib/python3.8/dist-packages
cd ..
rm -rf onnx-tensorrt
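
Before running the GPU paths, a quick check that the extra modules are importable can save a confusing traceback. The sketch uses importlib.util.find_spec, which locates a top-level module without importing it; it assumes nvidia-tensorrt installs the tensorrt module. It only confirms the modules can be found, not that the CUDA runtime works, which is what check_tensorrt_health is for. missing_modules and REQUIRED are hypothetical names.

```python
import importlib.util

# Import name -> how it is installed (see the lists above).
REQUIRED = {
    "pycuda": "pycuda",
    "tensorrt": "nvidia-tensorrt",
    "onnx_tensorrt": "onnx_tensorrt (copied from the onnx-tensorrt repository)",
}


def missing_modules(modules=REQUIRED):
    """Return the package names whose top-level module cannot be found."""
    return [pkg for mod, pkg in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```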

Owner

Alex Gorodnitskiy
