Multi-Modal Machine Learning toolkit based on PyTorch.

Last update: Jan 05, 2022

TorchMM

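Introduction

TorchMM is a multi-modal learning toolkit that provides a model library of joint multi-modal and cross-modal learning algorithms. It aims to offer efficient solutions for processing multi-modal data such as images and text, and to help bring multi-modal learning applications into production.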

Recent updates

2022.1.5: Released the initial version of TorchMM, v1.0

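Features

Rich task coverage: the toolkit provides model libraries for a range of multi-modal learning tasks, including multi-modal fusion, cross-modal retrieval and image-text generation, and supports training on user-defined data.
Proven in deployment: applications built on the toolkit's algorithms include sneaker authentication, sneaker style transfer, automatic captioning of furniture images and public-opinion monitoring.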

Application showcase

Sneaker authentication

For more information, please visit our website, Ysneaker!

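Framework

TorchMM consists of the following modules:

Data processing: provides a unified data interface and supports multiple data formats
Model library: includes multi-modal fusion, cross-modal retrieval, image-text generation and multi-task algorithms
Trainer: provides a unified training pipeline and the corresponding metric computation for each task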
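The fusion results below compare against two early-fusion baselines, Early(add) and Early(concat). The README does not define them; this sketch assumes the usual meaning: per-modality feature vectors (for example, from an image CNN and a text RNN) are merged before a shared classifier, either by element-wise addition or by concatenation. The function names are hypothetical and this is not TorchMM's implementation.

```python
from typing import List, Sequence


def early_fuse_add(image_feat: Sequence[float], text_feat: Sequence[float]) -> List[float]:
    """Hypothetical early fusion by element-wise addition: output keeps the shared size."""
    if len(image_feat) != len(text_feat):
        # Addition only makes sense once both modalities live in the same space,
        # e.g. after projecting each feature vector to a common dimension.
        raise ValueError("add-fusion needs features of equal length")
    return [a + b for a, b in zip(image_feat, text_feat)]


def early_fuse_concat(image_feat: Sequence[float], text_feat: Sequence[float]) -> List[float]:
    """Hypothetical early fusion by concatenation: output size is the sum of both sizes."""
    return list(image_feat) + list(text_feat)


if __name__ == "__main__":
    img = [0.25, 1.0, -0.5]  # toy image feature (e.g. pooled CNN output)
    txt = [0.75, 0.0, 0.5]   # toy text feature (e.g. final RNN state)
    print(early_fuse_add(img, txt))     # → [1.0, 1.0, 0.0]
    print(early_fuse_concat(img, txt))  # 6 values: img followed by txt
```

In PyTorch the two operations would typically be `img + txt` and `torch.cat([img, txt], dim=-1)`. Addition keeps the classifier input small but requires both modalities to share a dimension; concatenation drops that requirement at the cost of a wider classifier input.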

Usage

Download the toolkit

git clone https://github.com/njustkmg/TorchMM.git

Example usage:

from torchmm import TorchMM

# config: Model running parameters, see configs/
# data_root: Path to dataset
# image_root: Path to images
# cuda: Which GPU to use

runner = TorchMM(config='configs/cmml.yml',
                 data_root='data/COCO',
                 image_root='data/COCO/images',
                 cuda=0)

Or

python run.py --config configs/cmml.yml --data_root data/COCO --image_root data/COCO/images --cuda 0
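The command-line flags mirror the keyword arguments of the Python API. As a hedged illustration of that mapping, here is a minimal `argparse` parser declaring the same four flags. It is a hypothetical sketch, not the repository's run.py; the default of 0 for `--cuda` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the four flags used in the command above."""
    parser = argparse.ArgumentParser(description="Run a TorchMM experiment (sketch)")
    parser.add_argument("--config", required=True,
                        help="model running parameters, e.g. configs/cmml.yml")
    parser.add_argument("--data_root", required=True, help="path to the dataset")
    parser.add_argument("--image_root", required=True, help="path to the images")
    # Assumed default: GPU 0 when --cuda is omitted.
    parser.add_argument("--cuda", type=int, default=0, help="index of the GPU to use")
    return parser


if __name__ == "__main__":
    # Parse the example command line explicitly; a real script would call
    # parse_args() with no argument so that it reads sys.argv instead.
    args = build_parser().parse_args([
        "--config", "configs/cmml.yml",
        "--data_root", "data/COCO",
        "--image_root", "data/COCO/images",
        "--cuda", "0",
    ])
    # Each parsed value corresponds to one keyword argument of TorchMM(...).
    print(vars(args))
```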

Model library (continuously updated)

[1] Comprehensive Semi-Supervised Multi-Modal Learning

[2] Stacked Cross Attention for Image-Text Matching

[4] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

[5] Attention on Attention for Image Captioning

[6] VQA: Visual Question Answering

[7] ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Experimental results

Multi-modal fusion

Model	Average_Precision	Coverage	Example_AUC	Macro_AUC	Micro_AUC	Ranking_loss
CMML	0.682	18.827	0.948	0.927	0.950	0.052
Early(add)	-	-	-	-	-	-	ResNet+LSTM
Early(concat)	-	-	-	-	-	-	ResNet+GRU
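The columns are standard multi-label ranking metrics: higher is better for Average_Precision and the AUCs, lower is better for Coverage and Ranking_loss. For reference, here is a minimal stdlib-only sketch of three of them, computed per sample from a 0/1 label vector and a score vector and then averaged over samples. It is not TorchMM's trainer code and the function names are hypothetical. Assumptions: ties are counted against the relevant label, every sample has at least one relevant and one irrelevant label, and Coverage uses the 1-based convention of scikit-learn's `coverage_error` (some papers subtract 1).

```python
from typing import List, Sequence

Labels = Sequence[int]    # 0/1 indicator per label
Scores = Sequence[float]  # model score per label (higher = more confident)


def _rank(scores: Scores, j: int) -> int:
    """1-based rank of label j; labels tied with j are counted as ranked above it."""
    return sum(1 for s in scores if s >= scores[j])


def average_precision(y_true: List[Labels], y_score: List[Scores]) -> float:
    """Precision at the rank of each relevant label, averaged per sample, then over samples."""
    total = 0.0
    for labels, scores in zip(y_true, y_score):
        relevant = [j for j, flag in enumerate(labels) if flag]
        sample_ap = 0.0
        for j in relevant:
            # Relevant labels ranked at or above label j.
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            sample_ap += hits / _rank(scores, j)
        total += sample_ap / len(relevant)
    return total / len(y_true)


def coverage(y_true: List[Labels], y_score: List[Scores]) -> float:
    """How far down the ranking one must go to cover every relevant label (1-based)."""
    total = 0
    for labels, scores in zip(y_true, y_score):
        lowest = min(scores[j] for j, flag in enumerate(labels) if flag)
        total += sum(1 for s in scores if s >= lowest)
    return total / len(y_true)


def ranking_loss(y_true: List[Labels], y_score: List[Scores]) -> float:
    """Fraction of (relevant, irrelevant) pairs where the irrelevant label scores >= the relevant one."""
    total = 0.0
    for labels, scores in zip(y_true, y_score):
        pos = [s for s, flag in zip(scores, labels) if flag]
        neg = [s for s, flag in zip(scores, labels) if not flag]
        misordered = sum(1 for p in pos for n in neg if n >= p)
        total += misordered / (len(pos) * len(neg))
    return total / len(y_true)


if __name__ == "__main__":
    # Two samples, three labels each (toy data, not from the TorchMM experiments).
    y_true = [[1, 0, 0], [0, 0, 1]]
    y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
    print(average_precision(y_true, y_score))
    print(coverage(y_true, y_score))
    print(ranking_loss(y_true, y_score))
```

On this toy input the three functions give about 0.417, 2.5 and 0.75 respectively.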

License

This project is released under the Apache 2.0 license.

Owner

njustkmg