A baseline for short-text semantic matching in Xiaobu Assistant (小布助手) dialogues

Last update: Dec 14, 2022


oppo-text-match


Model

Reference: https://kexue.fm/archives/8213

The base version scores about 0.952 offline (local validation) and 0.866 online (leaderboard), as a single model without K-fold ensembling.
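The scores above come from a single model. As a hedged illustration of the K-fold ensembling that was not done (this code is not part of the repository), the sketch below shuffles the training indices into K folds, trains one model per fold, and averages the per-fold test scores. `kfold_indices`, `kfold_ensemble` and the `train_and_predict` callback are hypothetical names; the callback stands in for the baseline's own training and prediction code, which is not shown here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]  # (train_idx, valid_idx)


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Fold]:
    """Shuffle range(n) and split it into k (train_idx, valid_idx) folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds: List[Fold] = []
    start = 0
    for size in sizes:
        valid = order[start:start + size]
        train = order[:start] + order[start + size:]
        folds.append((train, valid))
        start += size
    return folds


def kfold_ensemble(
    n_train: int,
    k: int,
    train_and_predict: Callable[[List[int], List[int]], Sequence[float]],
) -> List[float]:
    """Train one model per fold and average their test-set scores.

    train_and_predict is a hypothetical callback: it trains on train_idx,
    may use valid_idx for model selection, and returns one score per test pair.
    """
    per_fold = [train_and_predict(tr, va) for tr, va in kfold_indices(n_train, k)]
    # zip(*per_fold) groups the k scores belonging to each test pair.
    return [sum(scores) / k for scores in zip(*per_fold)]
```

Averaging raw scores is the simplest combination rule; a label-stratified split may be preferable if the match/non-match classes are imbalanced.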

Training

Tested environment: tensorflow 1.15 + keras 2.3.1 + bert4keras 0.10.0
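A hypothetical helper (not part of the repository) for comparing the installed packages with the tested versions above. It reads each module's `__version__` attribute, which is an assumption for bert4keras; it avoids `importlib.metadata` because TensorFlow 1.15 needs Python 3.7 or earlier, where that module does not exist yet.

```python
import importlib
from typing import Dict, Optional

# Tested versions listed in this README (the constant name is hypothetical).
TESTED: Dict[str, str] = {"tensorflow": "1.15", "keras": "2.3.1", "bert4keras": "0.10.0"}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it is missing or unversioned."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_matches(found: str, wanted: str) -> bool:
    """True if found equals wanted or extends it, e.g. "1.15.5" matches "1.15"."""
    return found == wanted or found.startswith(wanted + ".")


def check_environment(tested: Dict[str, str] = TESTED) -> Dict[str, Optional[str]]:
    """Map each missing or mismatched package to its installed version (None = missing)."""
    problems: Dict[str, Optional[str]] = {}
    for name, wanted in tested.items():
        found = installed_version(name)
        if found is None or not version_matches(found, wanted):
            problems[name] = found
    return problems
```

Calling `check_environment()` inside the training environment should return an empty dict when everything matches the tested setup.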

Running all 100 epochs takes roughly 6 hours on an RTX 3090; running the full schedule is recommended.

Prediction

from baseline import *
predict_to_file('result.csv')

Then package the predictions with the command zip result.zip result.csv and submit result.zip.
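If the zip command is not available (for example on Windows), the packaging step can be done with Python's standard `zipfile` module. `zip_submission` is a hypothetical helper, not part of the repository; unlike `zip`, which updates an existing archive in place, it always writes a fresh result.zip.

```python
import zipfile
from pathlib import Path


def zip_submission(csv_path: str = "result.csv", zip_path: str = "result.zip") -> Path:
    """Pack csv_path into a fresh zip_path, roughly like `zip result.zip result.csv`."""
    csv_file = Path(csv_path)
    if not csv_file.is_file():
        raise FileNotFoundError(f"{csv_file} not found; run predict_to_file first")
    # Mode "w" overwrites any existing archive instead of updating it.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name so result.csv sits at the archive root.
        zf.write(csv_file, arcname=csv_file.name)
    return Path(zip_path)
```

After predict_to_file('result.csv') has run, calling zip_submission() produces result.zip in the current directory.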

Acknowledgements

Thanks to the organizers for recognizing this baseline!

Discussion

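Competition discussion group: QQ group 753413531
Scientific Spaces (科学空间) discussion: QQ group 808623966; to join the WeChat group, add the bot account spaces_ac_cn.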

Owner

苏剑林(Jianlin Su)

Science enthusiast
