(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Overview

Towards Abstractive Grounded Summarization of Podcast Transcripts

We provide the source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts" accepted at ACL'22. If you find the code useful, please cite the following paper.

@inproceedings{song-etal-2022-grounded,
    title = "Towards Abstractive Grounded Summarization of Podcast Transcripts",
    author = "Song, Kaiqiang and
              Li, Chen and
              Wang, Xiaoyang and
              Yu, Dong and
              Liu, Fei",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022"
}

Goal

We propose a grounded summarization system that links each summary sentence to a chunk of the original transcript and the corresponding segment of the audio/video recording. This allows a human evaluator to quickly verify the summary content against the source clips.
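
For intuition, a grounded summary can be viewed as a list of summary sentences, each paired with the transcript chunk that supports it and that chunk's position in the recording. The record below is a hypothetical sketch of this structure; the field names are illustrative assumptions, not the repository's actual output format.

# Hypothetical structure of a grounded summary; field names are illustrative
# assumptions, not the repository's actual output schema.
grounded_summary = [
    {
        "summary_sentence": "The hosts explain how the podcast got started.",
        "linked_chunk": "So we actually met in college and ...",
        "chunk_span_sec": (12.4, 48.9),  # where the chunk occurs in the audio/video
    },
    # ... one entry per summary sentence
]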

News

  • 03/04/2022 Trained model and processed test data released.
  • 03/03/2022 Code released. The paper link, trained model, and processed test data will be released soon.
  • 02/23/2022 Paper accepted at ACL 2022.

Experiments

You can follow the four steps below to generate grounded podcast summaries, or directly download the generated summaries from this link.

Step 1: Download Code, Model & Data

Download the code

git clone https://github.com/tencent-ailab/GrndPodcastSum.git
cd GrndPodcastSum

Download the trained models to the GrndPodcastSum directory and unzip:

unzip model.zip

Download the processed test set (1,027 episodes) to the GrndPodcastSum directory and unzip:

unzip data.zip
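
After unzipping both archives, the model and data should sit next to the scripts in the GrndPodcastSum directory. A quick sanity check is sketched below; the folder names model and data are assumptions based on the archive names.

# Sanity check that the unzipped archives are in place.
# The folder names "model" and "data" are assumptions based on the zip names.
import os

for folder in ("model", "data"):
    print(folder, "->", "found" if os.path.isdir(folder) else "missing")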

Step 2: Setup Environment

Create the conda environment from the provided env.yml file:

conda env create -f env.yml
conda activate GrndPodcastSum
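
The pipeline is GPU-oriented; assuming env.yml installs PyTorch (typical for this kind of summarization code, though not stated here), a quick check before running the later steps is:

# Quick environment check. Assumes the conda environment provides PyTorch,
# which is an assumption rather than something stated in this README.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())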

Step 3: Offline Computation of Chunk Embeddings

Compute the chunk embeddings offline:

sh offline.sh
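
Conceptually, this step precomputes an embedding for every transcript chunk so that grounding at generation time only requires a lookup. The sketch below illustrates the general idea with a generic Hugging Face encoder; the chunk size, encoder, and mean pooling are assumptions for illustration, not the exact configuration used by offline.sh.

# Minimal sketch of offline chunk-embedding computation.
# The encoder, chunk length, and pooling are illustrative assumptions,
# not the configuration used by offline.sh.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def chunk_transcript(words, chunk_size=128):
    # Split a transcript (a list of words) into fixed-size chunks.
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

@torch.no_grad()
def embed_chunks(chunks):
    # Encode each chunk and mean-pool the token embeddings.
    inputs = tokenizer(chunks, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state        # (num_chunks, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)        # (num_chunks, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (num_chunks, dim)

chunks = chunk_transcript("the full podcast transcript ...".split())
chunk_embeddings = embed_chunks(chunks)  # the real script would save these to disk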

Step 4: Generating Grounded Summaries

Use the Grnd-token-nonoverlap model to generate summaries:

sh test.sh
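
Each generated summary sentence is tied to the transcript chunk it was grounded in, which is what makes quick verification possible. The snippet below is a minimal sketch of how such output could be inspected; the file name and JSON layout are assumptions, not the actual format written by test.sh.

# Hypothetical inspection of grounded output. The file name and JSON layout
# are assumptions for illustration, not the format produced by test.sh.
import json

with open("output/grounded_summaries.json") as f:
    episodes = json.load(f)

for episode in episodes[:1]:
    for sentence in episode["summary"]:
        print(sentence["text"])
        print("  grounded in chunk:", sentence["chunk_id"])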

License

Copyright 2022 Tencent

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Disclaimer

This repo is for research purposes only. It is not an officially supported Tencent product.
