
SPACES

An end-to-end long-text summarization model for the CAIL 2020 judicial summarization track (法研杯2020司法摘要赛道).

Blog post: https://kexue.fm/archives/8046

Meaning

We call our model SPACES, which happens to be one of the domain names of 科学空间 (https://spaces.ac.cn). The letters stand for:

  • S:Sparse Softmax;
  • P:Pretrained Language Model;
  • A:Abstractive;
  • C:Copy Mechanism;
  • E:Extractive;
  • S:Special Words。

As the name suggests, this is a word-level, extract-then-generate ("extractive + abstractive") summarization model with a pretrained language model and a copy mechanism, and it incorporates some of our latest research results on text generation.
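As a concrete illustration of the "S" for Sparse Softmax, here is a minimal NumPy sketch of the top-k idea discussed in the blog post: only the k largest logits take part in the normalization, and every other probability is zeroed out. The function name and the value of k are illustrative, not the project's actual implementation or settings.

import numpy as np

def sparse_softmax(logits, k=10):
    """Top-k sparse softmax: normalize over the k largest logits, zero out the rest."""
    logits = np.asarray(logits, dtype=np.float64)
    topk = np.argpartition(logits, -k)[-k:]      # indices of the k largest logits
    probs = np.zeros_like(logits)
    shifted = logits[topk] - logits[topk].max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    probs[topk] = exp / exp.sum()
    return probs

For example, sparse_softmax([2.0, 1.0, 0.1, -3.0], k=2) puts all of the probability mass on the two largest logits and assigns exactly zero to the others.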

Usage

Environment: tensorflow 1.14 + keras 2.3.1 + bert4keras 0.9.7

(On Windows, use bert4keras>=0.9.8.)

First, edit the path configuration in snippets.py, then run the commands below.
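As a rough illustration of what that configuration looks like, the sketch below uses hypothetical variable names and placeholder paths for the pretrained weights, the vocabulary, and the competition data; the actual names are defined in snippets.py and may differ, so edit that file directly rather than copying this.

# Hypothetical sketch only: check snippets.py for the real variable names.
config_path = '/path/to/pretrained/bert_config.json'     # pretrained model config (placeholder path)
checkpoint_path = '/path/to/pretrained/bert_model.ckpt'  # pretrained model weights (placeholder path)
dict_path = '/path/to/pretrained/vocab.txt'              # vocabulary file (placeholder path)
data_json = '/path/to/cail2020/train.json'               # competition training data (placeholder path)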

Training:

#! /bin/bash

# Stage 1: extractive model
python extract_convert.py      # prepare the data for the extractive stage
python extract_vectorize.py    # vectorize the prepared data for the extractive stage

for ((i=0; i<15; i++));
    do
        python extract_model.py $i    # train the extractive model once per index 0..14
    done

# Stage 2: abstractive (seq2seq) model
python seq2seq_convert.py      # prepare the training data for the seq2seq stage
python seq2seq_model.py        # train the seq2seq generation model

Prediction:

from final import *

text = '...'  # the long document to be summarized (e.g. a judicial case text)
summary = predict(text, topk=3)
print(summary)
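If you need to summarize several documents, the same predict interface shown above can simply be called in a loop; the sample texts below are placeholders only.

from final import *

# Placeholder documents; replace them with real case texts.
docs = [
    '原告与被告因合同纠纷诉至法院……',
    '上诉人不服一审判决,提起上诉……',
]
for doc in docs:
    print(predict(doc, topk=3))    # same call as in the single-document example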

Contact

QQ group: 808623966. For the WeChat group, add the bot WeChat account spaces_ac_cn.

