Course materials for a 3-day seminar "Machine Learning and NLP: Advances and Applications" at New College of Florida

Overview

Machine Learning and NLP: Advances and Applications

This repository hosts the course materials used for a 3-day seminar "Machine Learning and NLP: Advances and Applications" as part of Independent Study Period 2020 at New College of Florida.

Note that the seminar was held in Jan 2020, and the content may be a little bit oudated (as of Feb 2022). Please also refer to a Fall 2021 full semester course "CIS6930 Topics in Computing for Data Science", which covers much wider (and a little bit newer) Deep Learning topics.

Syllabus

Course Description

This 3-day course provides students with an opportunity to learn Machine Learning and Natural Language Processing (NLP) from basics to applications. The course covers some state-of-the-art NLP techniques including Deep Learning. Each day consists of a lecture and a hands-on session to help students learn how to apply those techniques to real-world applications. During the hands-on session, students will be given assignments to develop programming code in Python. Three days are too short to fully understand the concepts that are covered by the course and learn to apply those techniques to actual problems. Students are strongly encouraged to complete reading assignments before the lecture to be ready for the course assignments, and bring a lot of questions to the course. :)

Learning Objectives

Students successfully completing the course will

  • demonstrate the ability to apply machine learning and natural language processing techniques to various types of problems.
  • demonstrate the ability to build their own machine learning models using Python libraries.
  • demonstrate the ability to read and understand research papers in ML and NLP.

Course Outline

  • Wed 1/22 Day 1: Machine Learning basics [Slides]

    • Machine learning examples
    • Problem formulation
    • Evaluation and hyper-parameter tuning
    • Data Processing basics with pandas
    • Machine Learning with scikit-learn
    • Hands-on material: [ipynb] Open In Colab
  • Thu 1/23 Day 2: NLP basics [Slides]

    • Unsupervised learning and visualization
    • Topic models
    • NLP basics with SpaCy and NLTK
    • Understanding NLP pipeline for feature extraction
    • Machine learning for NLP tasks (text classification, sequential tagging)
    • Hands-on material [ipynb] Open In Colab
    • Follow-up
      • Commonsense Reasoning (Winograd Schema Challenge)
  • Fri 1/24 Day 3: Advanced techniques and applications [Slides]

    • Basic Deep Learning techniques
    • Word embeddings
    • Advanced Deep Learning techniques for NLP
    • Problem formulation and applications to (non-)NLP tasks
    • Pre-training models: ELMo and BERT
    • Hands-on material: [ipynb] Open In Colab
    • Follow-up
      • The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time
      • Cross-lingual word/sentence embeddings

Reading Assignments & Recommendations:

The following online tutorials for students who are not familiar with the Python libraries used in the course. Each day will have a hands-on session that requires those libraries. Please do not expect to have enough time to learn how to use those libraries during the lecture.

The following list is a good starting point.

The course will cover the following papers as examples of (non-NLP) applications (probably in Day 3.) Students who'd like to learn how to apply Deep Learning techniques to your own problems are encouraged to read the following papers.

  • [1] A. Asai, S. Evensen, B. Golshan, A. Halevy, V. Li, A. Lopatenko, D. Stepanov, Y. Suhara, W.-C. Tan, Y. Xu, "HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments" Proc LREC 18, 2018. [Paper] [Dataset]
  • [2] S. Evensen, Y. Suhara, A. Halevy, V. Li, W.-C. Tan, S. Mumick, "Happiness Entailment: Automating Suggestions for Well-Being," Proc. ACII 2019, 2019. [Paper]
  • [3] Y. Suhara, Y. Xu, A. Pentland, "DeepMood: Forecasting Depressed Mood Based on Self-Reported Histories via Recurrent Neural Networks," Proc. WWW '17, 2017. [Paper]
  • [4] N. Bhutani, Y. Suhara, W.-C. Tan, A. Halevy, H. V. Jagadish, "Open Information Extraction from Question-Answer Pairs," Proc. NAACL-HLT 2019, 2019. [Paper]

Computing Resources:

The course requires students to write code:

  • Students are expected to have a personal computer at their disposal. Students should have a Python interpreter and the listed libraries installed on their machines.

The hands-on sessions will require the following Python libraries. Please install those libraries on your computer prior to the course. See also the reading assignment section for the recommended tutorials.

  • pandas
  • scikit-learn
  • gensim
  • spacy
  • nltk
  • torch (PyTorch)
Owner
Yoshi Suhara
Yoshi Suhara
A simple python project which control paint brush in microsoft paint app

Paint Buddy In Python A simple python project which control paint brush in micro

Ordinary Pythoneer 1 Dec 27, 2021
🍞 Create dynamic spreadsheets with arbitrary layouts using Python

🍞 tartine What this is Installation Usage example Fetching some data Getting started Adding a header Linking more cells Cell formatting API reference

Max Halford 11 Apr 16, 2022
A OBS service to package a published repository into a tar.gz file

OBS Source Service obs-service-publish_tar obs-service-publish_tar will create a archive.tar[.tar compression] archive containing the published repo

Erico Mendonca 1 Feb 16, 2022
Create N Share is a No Code solution which gives users the ability to create any type of feature rich survey forms with ease.

create n share Note : The Project Scaffold will be pushed soon. Create N Share is a No Code solution which gives users the ability to create any type

Chiraag Kakar 11 Dec 03, 2022
Datamol is a python library to work with molecules.

Datamol is a python library to work with molecules. It's a layer built on top of RDKit and aims to be as light as possible.

datamol 276 Dec 19, 2022
A python package for batch import of resume attachments to be parsed in HrFlow.

HrFlow Importer Description A python package for batch import of resume attachments to be parsed in HrFlow. hrflow-importer is an open-source project

HrFlow.ai (ex: Riminder.net) 3 Nov 15, 2022
A pomodoro app written in Python

Pomodoro It's a pomodoro app written in Python. You can minimize it while you're working if you want to, it'll pop up on your screen when the timer is

Yiğit 1 Dec 20, 2021
Python Freecell Solver

freecell Python Freecell Solver Very early version right now. You can pick a board by changing the file path in freecell.py If you want to play a game

Ben Kaufman 1 Nov 26, 2021
Lightweight and Modern kernel for VK Bots

This is the kernel for creating VK Bots written in Python 3.9

Yrvijo 4 Nov 21, 2021
Compile Binary Ninja's HLIL IR to LLVM, for purposes of compiling it back to a binary again.

Compiles BinaryNinja's HLIL to LLVM Approach Sweep binary for global variables, create them Sweep binary for (used?) external functions, declare those

Kyle Martin 31 Nov 10, 2022
Class and mathematical functions for quaternion numbers.

Quaternions Class and mathematical functions for quaternion numbers. Installation Python This is a Python 3 module. If you don't have Python installed

3 Nov 08, 2022
💉 🔍 VaxFinder - Backend The backend for the Vaccine Hunters Finder tool.

💉 🔍 VaxFinder - Backend The backend for the Vaccine Hunters Finder tool. Development Prerequisites Python 3.8 Poetry: A tool for dependency manageme

Vaccine Hunters Canada 32 Jan 19, 2022
A carrot-based color palette you didn't know you needed.

A package to produce a carrot-inspired color palette for python/matplotlib. Install: pip install carrotColors Update: pip install --upgrade carrotColo

10 Sep 28, 2021
A tool to nowcast quarterly data with monthly indicators: US consumption example

MIDAS_Nowcaster A tool to nowcast quarterly data with monthly indicators: US consumption example Pulls data directly from FRED from a list of codes -

Gene Kindberg-Hanlon 3 Oct 06, 2022
A python program to detect rickrolls with just the youtube link.

rickroll_detector A python program to detect rickrolls with just the youtube link. Usage: clone this repo or download zip run the main.py file with py

Tricky 4 Nov 06, 2022
A Kodi add-on for watching content hosted on PeerTube.

A Kodi add-on for watching content hosted on PeerTube. This add-on is under development so only basic features work, and you're welcome to improve it.

1 Dec 18, 2021
A python script based on OpenCV-Python, you can automatically hang up the Destiny 2 Throne to get the Dawning Essence.

A python script based on OpenCV-Python, you can automatically hang up the Destiny 2 Throne to get the Dawning Essence.

1 Dec 19, 2021
使用京东cookie一键生成所有退会链接

JDMemberCloseLinks 本项目旨在使用京东cookie一键生成所有退会链接

hyzaw 68 Jun 10, 2022
Bad Apple printed out on the console with Python!

bad-apple Bad Apple printed out on the console with Python! Preface A word of disclaimer, while the final code is somewhat original, this project is a

CalvinLoke 186 Dec 01, 2022
Spyware baseado em Python para Windows que registra como atividades da janela em primeiro plano, entradas do teclado.

Spyware baseado em Python para Windows que registra como atividades da janela em primeiro plano, entradas do teclado. Além disso, é capaz de fazer capturas de tela e executar comandos do shell em seg

Tavares 1 Oct 29, 2021