Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

Last update: Dec 06, 2022

anlp21

Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley).

Syllabus: http://people.ischool.berkeley.edu/~dbamman/info256.html

Notebook	Description
1.words/EvaluateTokenizationForSentiment	The impact of tokenization choices on sentiment classification.
1.words/ExploreTokenization	Different methods for tokenizing texts (whitespace, NLTK, spaCy, regex)
1.words/TokenizePrintedBooks	Design a better tokenizer for printed books
1.words/Text_Complexity	Implement type-token ratio and Flesch-Kincaid Grade Level scores for text
2.compare/ChiSquare, Mann-Whitney Tests	Explore two tests for finding distinctive terms
2.compare/Log-odds ratio with priors	Implement the log-odds ratio with an informative (and uninformative) Dirichlet prior
3.dictionaries/DictionaryTimeSeries	Plot sentiment over time using human-defined dictionaries
3.dictionaries/Empath	Explore using Empath dictionaries to characterize texts
4.embeddings/DistributionalSimilarity	Explore the distributional hypothesis to build high-dimensional, sparse representations for words
4.embeddings/WordEmbeddings	Explore word embeddings using Gensim
4.embeddings/Semaxis	Implement SemAxis for scoring terms along a user-defined axis (e.g., positive-negative, concrete-abstract, hot-cold)
4.embeddings/BERT	Explore the basics of token representations in BERT and use it to find token nearest neighbors
4.embeddings/SequenceEmbeddings	Use sequence embeddings to find TV episode summaries most similar to a short description
5.eda/WordSenseClustering	Infer distinct word senses using KMeans clustering over BERT representations
5.eda/Haiku KMeans	Explore text representation in clustering by trying to group haiku and non-haiku poems into two distinct clusters
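
As a rough illustration of the type-token ratio mentioned for 1.words/Text_Complexity, here is a minimal sketch. It is not taken from the course notebook: the function name `type_token_ratio` and the simple regex tokenizer are assumptions made for this example, and the result depends on how the text is tokenized.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of distinct word types divided by the number of tokens."""
    # Assumed tokenizer for illustration only: lowercase the text and keep
    # runs of letters and apostrophes. The notebook may tokenize differently.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# 10 tokens, 7 distinct types ("the" appears three times, "sat" twice).
ttr = type_token_ratio("The cat sat on the mat. The dog sat too.")
print(ttr)
```

Because the ratio tends to fall as a text gets longer, it is usually compared across samples of similar length.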

Owner

David Bamman
