Mengzi Pretrained Models

Overview

中文 | English

Mengzi

尽管预训练语言模型在 NLP 的各个领域里得到了广泛的应用,但是其高昂的时间和算力成本依然是一个亟需解决的问题。这要求我们在一定的算力约束下,研发出各项指标更优的模型。

我们的目标不是追求更大的模型规模,而是轻量级但更强大,同时对部署和工业落地更友好的模型。

基于语言学信息融入和训练加速等方法,我们研发了 Mengzi 系列模型。由于与 BERT 保持一致的模型结构,Mengzi 模型可以快速替换现有的预训练模型。

详细的技术报告请参考:

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese

导航

快速上手

Mengzi-BERT

# 使用 Huggingface transformers 加载
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")

Mengzi-T5

# 使用 Huggingface transformers 加载
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")

Mengzi-Oscar

参考文档

依赖安装

pip install transformers

下游任务

CLUE 分数

Model AFQMC TNEWS IFLYTEK CMNLI WSC CSL CMRC2018 C3 CHID
RoBERTa-wwm-ext 74.30 57.51 60.80 80.70 67.20 80.67 77.59 67.06 83.78
Mengzi-BERT-base 74.58 57.97 60.68 82.12 87.50 85.40 78.54 71.70 84.16

RoBERTa-wwm-ext 的分数来自 CLUE baseline

对应超参

Task Learning rate Batch size Epochs
AFQMC 3e-5 32 10
TNEWS 3e-5 128 10
IFLYTEK 3e-5 64 10
CMNLI 3e-5 512 10
WSC 8e-6 64 50
CSL 5e-5 128 5
CMRC2018 5e-5 8 5
C3 1e-4 240 3
CHID 5e-5 256 5

下载链接

联系方式

微信讨论群

邮箱

wangyulong[at]chuangxin[dot]com

免责声明

该项目中的内容仅供技术研究参考,不作为任何结论性依据。使用者可以在许可证范围内任意使用该模型,但我们不对因使用该项目内容造成的直接或间接损失负责。技术报告中所呈现的实验结果仅表明在特定数据集和超参组合下的表现,并不能代表各个模型的本质。 实验结果可能因随机数种子,计算设备而发生改变。

使用者以各种方式使用本模型(包括但不限于修改使用、直接使用、通过第三方使用)的过程中,不得以任何方式利用本模型直接或间接从事违反所属法域的法律法规、以及社会公德的行为。使用者需对自身行为负责,因使用本模型引发的一切纠纷,由使用者自行承担全部法律及连带责任。我们不承担任何法律及连带责任。

我们拥有对本免责声明的解释、修改及更新权。

文献引用

@misc{zhang2021mengzi,
      title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese}, 
      author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou},
      year={2021},
      eprint={2110.06696},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Owner
Langboat
Langboat
Multi Agent Path Finding Algorithms

MATP-solver Simulator collision check path step random initial states or given states Traditional method Seperate A* algorithem Confict-based Search S

30 Dec 12, 2022
Yolov5 + Deep Sort with PyTorch

딥소트 수정중 Yolov5 + Deep Sort with PyTorch Introduction This repository contains a two-stage-tracker. The detections generated by YOLOv5, a family of obj

1 Nov 26, 2021
Predicting future trajectories of people in cameras of novel scenarios and views.

Pedestrian Trajectory Prediction Predicting future trajectories of pedestrians in cameras of novel scenarios and views. This repository contains the c

8 Sep 03, 2022
This project provides the proof of the uniqueness of the equilibrium and the global asymptotic stability.

Delayed-cellular-neural-network This project provides the proof of the uniqueness of the equilibrium and the global asymptotic stability. There is als

4 Apr 28, 2022
PyTorch implementation for paper Neural Marching Cubes.

NMC PyTorch implementation for paper Neural Marching Cubes, Zhiqin Chen, Hao Zhang. Paper | Supplementary Material (to be updated) Citation If you fin

Zhiqin Chen 109 Dec 27, 2022
A Python package for generating concise, high-quality summaries of a probability distribution

GoodPoints A Python package for generating concise, high-quality summaries of a probability distribution GoodPoints is a collection of tools for compr

Microsoft 28 Oct 10, 2022
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch

Neural Distance Embeddings for Biological Sequences Official implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTo

Gabriele Corso 56 Dec 23, 2022
RefineGNN - Iterative refinement graph neural network for antibody sequence-structure co-design (RefineGNN)

Iterative refinement graph neural network for antibody sequence-structure co-des

Wengong Jin 83 Dec 31, 2022
FEDn is an open-source, modular and ML-framework agnostic framework for Federated Machine Learning

FEDn is an open-source, modular and ML-framework agnostic framework for Federated Machine Learning (FedML) developed and maintained by Scaleout Systems. FEDn enables highly scalable cross-silo and cr

Scaleout 75 Nov 09, 2022
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”

This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” Usage To replicate our results in Secti

Albert Webson 64 Dec 11, 2022
OMLT: Optimization and Machine Learning Toolkit

OMLT is a Python package for representing machine learning models (neural networks and gradient-boosted trees) within the Pyomo optimization environment.

C⚙G - Imperial College London 179 Jan 02, 2023
Source code for "Interactive All-Hex Meshing via Cuboid Decomposition [SIGGRAPH Asia 2021]".

Interactive All-Hex Meshing via Cuboid Decomposition Video demonstration This repository contains an interactive software to the PolyCube-based hex-me

Lingxiao Li 131 Dec 05, 2022
A blender add-on that automatically re-aligns wrong axis objects.

Auto Align A blender add-on that automatically re-aligns wrong axis objects. Usage There are three options available in the 3D Viewport Sidebar It

29 Nov 25, 2022
Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

dblmahmc Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo" Requirements: https://github.com

1 Dec 17, 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frede

Edresson Casanova 92 Dec 09, 2022
Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi

Chang-Shu Chung 1.3k Jan 07, 2023
Evaluating Privacy-Preserving Machine Learning in Critical Infrastructures: A Case Study on Time-Series Classification

PPML-TSA This repository provides all code necessary to reproduce the results reported in our paper Evaluating Privacy-Preserving Machine Learning in

Dominik 1 Mar 08, 2022
Toward Spatially Unbiased Generative Models (ICCV 2021)

Toward Spatially Unbiased Generative Models Implementation of Toward Spatially Unbiased Generative Models (ICCV 2021) Overview Recent image generation

Jooyoung Choi 88 Dec 01, 2022
Harmonic Memory Networks for Graph Completion

HMemNetworks Code and documentation for Harmonic Memory Networks, a series of models for compositionally assembling representations of graph elements

mlalisse 0 Oct 27, 2021
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Video Corpus Moment Retrieval with Contrastive Learning PyTorch implementation for the paper "Video Corpus Moment Retrieval with Contrastive Learning"

ZHANG HAO 42 Dec 29, 2022