Deep learning for NLP crash course at ABBYY.

Last update: Dec 18, 2022

Deep NLP Course at ABBYY

I'm gradually updating and translating the notebooks right now. Stay tuned.

Materials

Week 1: Introduction

Sentiment analysis on the IMDB movie review dataset: a short overview of classical machine learning for NLP + an indecently brief intro to Keras.

Russian version:

Updated English version:

Week 2: Word Embeddings: Part 1

Meet the Word Embeddings: an unsupervised method to capture some fun relationships between words.
Phrase similarity with a word embedding model + word-based machine translation without parallel data (with MUSE word embeddings).
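
As a rough illustration of the phrase-similarity idea (not the notebook's code): a phrase can be embedded by averaging its word vectors, and two phrases compared by cosine similarity. The 3-dimensional vectors and the helper names `phrase_vector` and `cosine` are made up for this sketch; real pretrained embeddings such as MUSE have hundreds of dimensions.

```python
import math

# Toy word vectors, invented for illustration only.
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "movie": [0.1, 0.9, 0.2],
    "film": [0.2, 0.8, 0.3],
    "terrible": [-0.8, 0.1, 0.1],
}

def phrase_vector(phrase, vectors=TOY_VECTORS):
    """Average the vectors of the known words in a phrase."""
    known = [vectors[w] for w in phrase.lower().split() if w in vectors]
    if not known:
        raise ValueError("no known words in phrase")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

similar = cosine(phrase_vector("good movie"), phrase_vector("great film"))
different = cosine(phrase_vector("good movie"), phrase_vector("terrible film"))
print(similar > different)
```

Averaging ignores word order, which is a known limitation of this kind of baseline.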

Russian version:

Updated English version:

Week 3: Word Embeddings: Part 2

Introduction to PyTorch. Implementation of a pet linear regression in pure NumPy and in PyTorch. Implementations of CBoW, skip-gram, negative sampling and structured Word2vec models.
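
To make the difference between skip-gram and CBoW concrete, here is a small sketch of how training examples could be extracted from a token list. It only covers data preparation, not the models themselves, and the function names `skipgram_pairs` and `cbow_examples` are hypothetical, not taken from the notebooks.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each (center, context) pair is one training example."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBoW: predict the center word from all of its context words at once."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, center))
    return examples

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
examples = cbow_examples(tokens, window=1)
print(pairs[:3])
print(examples[1])
```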

Russian version:

Updated English version:

Week 4: Convolutional Neural Networks

Introduction to convolutional networks. Relations between convolutions and n-grams. Simple surname detector on character-level convolutions + fun visualizations.
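
One way to see the relation between convolutions and n-grams: a filter of width n, slid over a sequence of character vectors, produces one score per character n-gram. The sketch below uses one-hot character vectors and a hand-written width-2 filter that fires on the bigram "ov"; in a trained network the embeddings and filter weights are learned. All names here are illustrative assumptions.

```python
def char_ngrams(word, n):
    """All character n-grams of a word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def one_hot(char, alphabet):
    return [1 if char == c else 0 for c in alphabet]

def conv1d_scores(embedded, kernel):
    """Slide one filter over a sequence of vectors (no padding, stride 1)."""
    width = len(kernel)
    scores = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        scores.append(sum(w * x
                          for k_vec, x_vec in zip(kernel, window)
                          for w, x in zip(k_vec, x_vec)))
    return scores

word = "ivanov"
alphabet = sorted(set(word))
embedded = [one_hot(c, alphabet) for c in word]
# Hand-written filter: position 0 matches "o", position 1 matches "v".
kernel = [one_hot("o", alphabet), one_hot("v", alphabet)]

scores = conv1d_scores(embedded, kernel)   # one score per bigram
best = max(scores)                         # max-over-time pooling
best_ngram = char_ngrams(word, 2)[scores.index(best)]
print(best_ngram)
```

Max-over-time pooling then keeps only the strongest match, so the filter behaves like a soft detector for one n-gram pattern anywhere in the word.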

Russian version:

Updated English version:

Week 5: RNNs: Part 1

RNNs for text classification. Simple RNN implementation + memorization test. Surname detector in a multilingual setup: a character-level LSTM classifier.

Russian version:

Updated English version:

Week 6: RNNs: Part 2

RNNs for sequence labelling. Part-of-speech tagger implementations based on word embeddings and character-level word embeddings.

Russian version:

Week 7: Language Models: Part 1

Character-level language model for generating Russian troll tweets: a fixed-window model via convolutions and an RNN model.
Simple conditional language model: surname generation given the source language.
Plus the Toxic Comment Classification Challenge, to apply your skills to a real-world problem.

Russian version:

Week 8: Language Models: Part 2

Word-level language model for poetry generation. Pet examples of transfer learning and multi-task learning applied to language models.

Russian version:

Week 9: Seq2seq

Seq2seq for machine translation and image captioning. Byte-pair encoding, beam search and other useful stuff for machine translation.
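
For readers new to byte-pair encoding, here is a minimal sketch of the merge-learning loop: start from characters, repeatedly merge the most frequent adjacent symbol pair. The toy word counts and the function names are assumptions for this example, ties are broken by first occurrence, and production tokenizers differ in many details.

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges; "</w>" marks the end of a word."""
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab

# Toy corpus statistics, invented for illustration.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)
```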

Russian version:

Week 10: Seq2seq with Attention

Seq2seq with attention for machine translation and image captioning.
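
The core of attention fits in a few lines: score each encoder state against the current decoder state, normalize the scores with a softmax, and take the weighted sum as the context vector. The sketch below assumes plain dot-product scoring; the notebooks may use a different scoring function, and the names `softmax` and `dot_attention` are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(query, encoder_states):
    """Return attention weights and the context vector for one decoder step."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy encoder states and decoder query, invented for illustration.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 3.0]
weights, context = dot_attention(query, encoder_states)
print(weights.index(max(weights)))
```

The second encoder state points in the same direction as the query, so it receives the largest weight and dominates the context vector.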

Russian version:

Week 11: Transformers & Text Summarization

Implementation of the Transformer model for text summarization. Discussion of Pointer-Generator Networks for text summarization.

Russian version:

Week 12: Dialogue Systems: Part 1

Goal-oriented dialogue systems. Implementation of a multi-task model: intent classifier and token tagger for a dialogue manager.

Russian version:

Week 13: Dialogue Systems: Part 2

General conversation dialogue systems and DSSMs. Implementation of a question answering model on the SQuAD dataset and a chit-chat model on the OpenSubtitles dataset.

Russian version:

Week 14: Pretrained Models

Pretrained models for various tasks: Universal Sentence Encoder for sentence similarity, ELMo for sequence tagging (with a bit of CRF), BERT for SWAG (reasoning about possible continuations).

Russian version:

Final Presentation

NLP Summary: a summary of the cool stuff that did and didn't appear in the course.

Owner

Dan Anastasyev
