Baseline for the 2021 Sohu Campus Text Matching Algorithm Competition

Last update: Sep 06, 2022

Related tags

Text Data & NLP sohu2021-baseline

Overview

sohu2021-baseline

Baseline for the 2021 Sohu Campus Text Matching Algorithm Competition

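Introduction

This repository shares a baseline for the Sohu text matching task. Its main idea is to use Conditional LayerNorm to increase the model's diversity, so that a single model can handle different types of data and produce different outputs for each.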
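The idea of Conditional LayerNorm can be illustrated with a minimal NumPy sketch. This is not the repository's code: the function name, the one-hot encoding of the data type, and the projection matrices `w_gamma` and `w_beta` are assumptions made for illustration. The normalization scale and shift are offset by a linear projection of a condition vector, so the same weights can yield different outputs for different data types.

```python
import numpy as np

def conditional_layer_norm(x, cond, gamma, beta, w_gamma, w_beta, eps=1e-12):
    """Layer normalization whose scale and shift depend on a condition vector.

    x:       (batch, seq_len, hidden) activations
    cond:    (batch, cond_dim) condition, e.g. a one-hot data-type id (assumed)
    gamma:   (hidden,) base scale
    beta:    (hidden,) base shift
    w_gamma: (cond_dim, hidden) hypothetical projection added to gamma
    w_beta:  (cond_dim, hidden) hypothetical projection added to beta
    """
    # Standard layer normalization over the hidden dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_norm = (x - mean) / np.sqrt(var + eps)
    # Condition-dependent scale and shift, one per sample.
    g = gamma + cond @ w_gamma  # (batch, hidden)
    b = beta + cond @ w_beta    # (batch, hidden)
    # Broadcast the per-sample parameters over the sequence axis.
    return x_norm * g[:, None, :] + b[:, None, :]

rng = np.random.default_rng(0)
hidden, num_types = 8, 2
x = rng.normal(size=(2, 3, hidden))
x[1] = x[0]                      # identical input for both samples
cond = np.eye(num_types)         # sample 0 is type 0, sample 1 is type 1
gamma, beta = np.ones(hidden), np.zeros(hidden)
w_gamma = rng.normal(scale=0.1, size=(num_types, hidden))
w_beta = rng.normal(scale=0.1, size=(num_types, hidden))

# Same input, different condition: the two outputs differ.
out = conditional_layer_norm(x, cond, gamma, beta, w_gamma, w_beta)

# With zero projections the layer reduces to plain LayerNorm.
zeros = np.zeros((num_types, hidden))
out_plain = conditional_layer_norm(x, cond, gamma, beta, zeros, zeros)
```

Starting the projections at zero, as in the last call, makes the layer behave exactly like ordinary LayerNorm at first, which is one way to keep pretrained weights usable before the condition has learned anything.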

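The F1 score is about 0.74 on the offline validation set and about 0.73 on the online test set. The pretrained model is RoFormer; comparisons with other pretrained models are welcome.

Tested environment: tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.5. If errors occur under other environment combinations, please adjust the code yourself according to the error messages.

For details, see: https://kexue.fm/archives/8337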

Discussion

QQ discussion group: 808623966. For the WeChat group, add the bot's WeChat ID spaces_ac_cn.

Owner

苏剑林(Jianlin Su)

Science enthusiast

