In this workshop we explore state-of-the-art NLP transformers, including models such as T5 and BERT, then build a model using the Hugging Face Transformers framework.

Last update: Apr 13, 2022

Overview

Transformers are all you need

Table of Contents

The workshop is divided into four parts:

Introduction to Transformers and the hype around them
Sneak peek at the theory behind Transformers
Quick tour (Hugging Face framework)
Lab
- fine-tune a translation model

Note that you can always open the notebooks on Google Colab (no need to install anything); you just need a stable internet connection:

- fine-tune a translation model

How to get started

Fork this repository
Create a branch named after you
Go through the notebook and complete all tasks
Submit a pull request

Homework exercise

Your task is to fine-tune a classification model:

Use Hugging Face transformers and datasets.
Fine-tune the model on one of the classification tasks of the GLUE benchmark (CoLA, to be specific).
Use a checkpoint from the Hub ("distilbert-base-uncased", for example).
Once finished, submit a pull request to this repo, and make sure to place your .ipynb file in the submissions folder (YOUR_NAME.ipynb).
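
As a starting point, the homework could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the workshop's reference solution: the function names (matthews_corrcoef, compute_metrics, fine_tune_cola), the output directory, and the hyperparameters are illustrative choices, and the dataset identifier ("glue", "cola") and its "sentence" column are assumed to match the version available on the Hub. CoLA is scored in GLUE with the Matthews correlation coefficient, so the sketch computes that metric by hand.

```python
import math


def matthews_corrcoef(labels, predictions):
    """Matthews correlation coefficient for binary (0/1) labels."""
    tp = tn = fp = fn = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1
        elif y == 0 and p == 0:
            tn += 1
        elif y == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def compute_metrics(eval_pred):
    """Turn (logits, labels) into a metrics dict, in the shape Trainer expects."""
    logits, labels = eval_pred
    # Argmax over the class dimension, one row of logits per example.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    return {"matthews_correlation": matthews_corrcoef(list(labels), predictions)}


def fine_tune_cola(checkpoint="distilbert-base-uncased", output_dir="cola-distilbert"):
    """Hypothetical helper: fine-tune `checkpoint` on CoLA and return eval metrics."""
    # Heavy imports are kept local so the metric helpers above work without them.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    raw = load_dataset("glue", "cola")  # assumed dataset id and config name
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    tokenized = raw.map(
        lambda batch: tokenizer(batch["sentence"], truncation=True),
        batched=True,
    )
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,              # illustrative value, tune as needed
        learning_rate=2e-5,              # illustrative value, tune as needed
        per_device_train_batch_size=16,  # illustrative value, tune as needed
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch dynamically
        compute_metrics=compute_metrics,
    )
    trainer.train()
    return trainer.evaluate()
```

Calling fine_tune_cola() in a Colab cell with a GPU runtime should download the data and checkpoint, train for one epoch, and return the validation metrics. The metric helpers have no third-party dependencies, so they can be checked on their own before starting a training run.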

Useful resources: text_classification

Owner

Aymen Berriche
