The aim of this task is to predict someone's English proficiency based on a text input.

Last update: Dec 13, 2021

Overview

English_proficiency_prediction_NLP


This project uses the NICT JLE Corpus, available here: https://alaginrc.nict.go.jp/nict_jle/index_E.html

The corpus data comes from transcripts of audio-recorded speech samples from 1,281 participants (1.2 million words, 300 hours in total) taking an English oral proficiency interview test. Each participant received an SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency) based on this test.

The goal is to build a machine learning algorithm for predicting the SST score of each participant based on their transcript.

Steps:

1 - Pre-process the dataset: extract the participant's transcript (all of the participant's tagged utterances). Inside the participant's transcript, you can remove all other tags and keep only the English words.

2 - Process the dataset: extract features with the Bag of Words (BoW) technique.

3 - Train a classifier to predict the SST score

4 - Compute the accuracy of your system (the proportion of participants classified correctly) and plot the confusion matrix.

5 - Try to improve your system (for example you can try to use GloVe instead of BoW).
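Steps 1 to 4 can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the project's actual implementation: it assumes participant utterances are wrapped in `<B>...</B>` tags (with `<A>` for the interviewer), it uses two-class toy transcripts invented for the example rather than corpus data, and it picks multinomial Naive Bayes as a stand-in because the steps do not name a classifier. It relies on the standard library only so that it runs anywhere; in practice scikit-learn's `CountVectorizer`, `MultinomialNB`, `accuracy_score` and `confusion_matrix` would be the more usual choice.

```python
import math
import re
from collections import Counter, defaultdict

PARTICIPANT_TAG = "B"  # assumption: <B>...</B> marks the participant's turns


def extract_participant_words(raw, tag=PARTICIPANT_TAG):
    """Step 1: keep the participant's turns, strip other tags, keep English words."""
    turns = re.findall(rf"<{tag}>(.*?)</{tag}>", raw, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", " ".join(turns))  # drop any remaining markup
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def bag_of_words(raw):
    """Step 2: represent a transcript as word counts."""
    return Counter(extract_participant_words(raw))


class NaiveBayes:
    """Step 3: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.vocab = {w for d in docs for w in d}
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.word_counts[y].update(d)
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)

        def score(y):
            s = math.log(self.class_counts[y] / n_docs)  # log prior
            for w, k in doc.items():
                if w in self.vocab:  # ignore words never seen in training
                    s += k * math.log((self.word_counts[y][w] + 1) / (self.totals[y] + v))
            return s

        return max(self.class_counts, key=score)


def accuracy(y_true, y_pred):
    """Step 4a: proportion of participants classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Step 4b: rows are true SST scores, columns are predicted SST scores."""
    index = {y: i for i, y in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy transcripts invented for illustration only (not taken from the corpus).
train = [
    ("<A>Hello.</A><B>I <F>er</F> like dog.</B>", 2),
    ("<A>Hi.</A><B>I like cat.</B>", 2),
    ("<A>Hello.</A><B>I would definitely recommend travelling abroad.</B>", 8),
    ("<A>Hi.</A><B>I would argue that travelling broadens perspective.</B>", 8),
]
test = [
    ("<B>I like dog and cat.</B>", 2),
    ("<B>I would recommend travelling.</B>", 8),
]

model = NaiveBayes().fit([bag_of_words(t) for t, _ in train], [y for _, y in train])
y_true = [y for _, y in test]
y_pred = [model.predict(bag_of_words(t)) for t, _ in test]
acc = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred, labels=[2, 8])
print(acc, matrix)
```

On the real corpus the label list would cover the SST scores 1 to 9, and the matrix could be passed to a plotting library to draw the confusion matrix that step 4 asks for.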
