Text processing, including tokenizing sentences and representing them as vectors, and applying RNN, LSTM, and GRU models to build a classifier that detects which of 17 languages a sentence is written in.

Last update: Dec 15, 2022

Language Identifier

What is this?

The goal of this project is to build a model that predicts the language of a given sentence. The pipeline covers text processing (tokenizing sentences and representing them as vectors) and applies RNN, LSTM, and GRU architectures to build a classifier that distinguishes among 17 languages.
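As an illustration of the "tokenizing and representing sentences as vectors" step, here is a minimal sketch in plain Python. It is not the project's actual preprocessing code: the word-level whitespace split, the helper names `build_vocab` and `encode`, the reserved ids 0 (padding) and 1 (out-of-vocabulary), and the choice to pad at the end are all assumptions made for this example.

```python
from collections import Counter


def build_vocab(sentences, max_words=None):
    """Map each word to an integer id (hypothetical helper).

    Id 0 is reserved for padding and id 1 for out-of-vocabulary words.
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    vocab = {"<pad>": 0, "<oov>": 1}
    # Most frequent words get the smallest ids.
    for word, _ in counts.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab, max_len):
    """Turn a sentence into a fixed-length list of integer ids."""
    ids = [vocab.get(word, 1) for word in sentence.lower().split()]
    ids = ids[:max_len]                       # truncate long sentences
    return ids + [0] * (max_len - len(ids))   # pad short ones with 0


sentences = ["Hello world", "Bonjour le monde"]
vocab = build_vocab(sentences)
vectors = [encode(s, vocab, max_len=4) for s in sentences]
```

Fixed-length integer vectors like these are the kind of input an embedding layer followed by an RNN, LSTM, or GRU layer would consume.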

Dataset

Language Detection: a small language detection dataset containing text samples in 17 different languages.

Results

All models achieved high accuracy, even when a single convolution layer was used instead of an LSTM or GRU layer. The GRU model achieved the highest accuracy: 99% training accuracy and 94% validation accuracy.
The convolution-layer model also achieved high accuracy, about 95% validation accuracy.
Using fewer embedding dimensions lets the model reach high accuracy faster, but in the Embedding Projector many words end up grouped with words from other languages.

32 Embedding dimensions examples

3 Embedding dimensions examples

GRU Accuracy and Loss

GRU Confusion matrix

Libraries

TensorFlow
Scikit-learn
NumPy
Pandas
Matplotlib

Owner

Hossam Asaad
