AMUSE

AMUSE - financial summarization

Unzip data.zip
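If no command-line unzip tool is at hand, the archive can also be extracted from Python. The following is a minimal sketch using the standard library's `zipfile` module. The helper name `extract_archive` and the default destination (the current directory) are assumptions made for illustration and are not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(archive="data.zip", dest="."):
    """Extract every member of `archive` into `dest` and return the member names.

    Hypothetical helper: the name and defaults are not defined by AMUSE.
    """
    dest_path = Path(dest)
    # Create the destination directory if it does not exist yet.
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_path)
        return zf.namelist()


# Example call, run from the directory that contains data.zip:
# for name in extract_archive():
#     print(name)
```

Extracting into the current directory is only a guess at where the commands in this README expect to find the `data/` folder.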

Train a new model:

python FinAnalyze.py --task train --start 0 --count --modelpath data/models/new_model.h5 --train data/train --gold data/gold

data/train = directory containing the text files

data/gold = directory containing the gold summaries

This trains a new AMUSE prediction model on the given files and stores it in an .h5 file.

Generate summaries with an existing model:

python FinAnalyze.py --task generate-summaries --start 0 --count --modelpath data/models/new_model.h5 --test data/test/ --summarydir data/summaries
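The two commands above share most of their arguments, so they can be assembled from a script. The sketch below builds the argument list for either task. It assumes that `--count` expects a value, which the commands above do not show, so the caller has to supply it. The helper names `build_command` and `run_command` are hypothetical and not part of `FinAnalyze.py`.

```python
import subprocess
import sys


def build_command(task, modelpath, count, start=0, **dirs):
    """Return the argument list for one FinAnalyze.py invocation.

    `count` has no default: this README does not show its value, so the
    caller must choose one. Extra keyword arguments become directory flags,
    e.g. train="data/train" turns into `--train data/train`.
    """
    cmd = [
        sys.executable, "FinAnalyze.py",
        "--task", task,
        "--start", str(start),
        "--count", str(count),
        "--modelpath", modelpath,
    ]
    for flag, path in dirs.items():
        cmd += [f"--{flag}", path]
    return cmd


def run_command(cmd):
    """Run the command and raise CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


# Example calls, with n standing for a --count value chosen by the user:
# run_command(build_command("train", "data/models/new_model.h5", n,
#                           train="data/train", gold="data/gold"))
# run_command(build_command("generate-summaries", "data/models/new_model.h5", n,
#                           test="data/test/", summarydir="data/summaries"))
```

Passing the arguments as a list avoids shell quoting problems when a directory path contains spaces.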

Also included:

a model trained on 3000 files, named model.training.muse.3000.all.h5

If you use this code, please cite:

Litvak M, Vanetik N. Summarization of financial reports with AMUSE. In Proceedings of the 3rd Financial Narrative Processing Workshop 2021 (pp. 31-36).

@inproceedings{litvak2021summarization,
  title={Summarization of financial reports with AMUSE},
  author={Litvak, Marina and Vanetik, Natalia},
  booktitle={Proceedings of the 3rd Financial Narrative Processing Workshop},
  pages={31--36},
  year={2021}
}
