Text language identification using Wikipedia data

The aim of this project is to provide high-quality language detection for all of the web's languages, using Wikipedia as a proxy for them. Currently, 156 languages that have their own Wikipedia are supported.

Usage

The main function is text-langs, which returns two values:

an alist of language-probability pairs (languages are represented by their ISO 639-1 codes)
a vector of tokens with their inferred languages

WILD> (text-langs "це тест")
((:UK . 0.5000003) (:RU . 0.4999998))
#(<це - UK:1.00> <тест - RU:1.00>)
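The example above shows a near-tie between two languages, which is typical for very short inputs. As a minimal sketch of how a caller might consume the first return value, the Python snippet below mirrors the alist as a list of (code, probability) pairs. The helper names `top_language` and `is_ambiguous` and the `margin` threshold are hypothetical and not part of this project.

```python
def top_language(lang_probs):
    """Return the code with the highest probability, or None for empty input."""
    if not lang_probs:
        return None
    return max(lang_probs, key=lambda pair: pair[1])[0]


def is_ambiguous(lang_probs, margin=0.1):
    """Return True when the two best candidates are closer than `margin`.

    `margin` is an arbitrary illustrative threshold, not a project setting.
    """
    ranked = sorted((prob for _, prob in lang_probs), reverse=True)
    return len(ranked) >= 2 and (ranked[0] - ranked[1]) < margin


# Values copied from the REPL example above, with codes written in lower case.
result = [("uk", 0.5000003), ("ru", 0.4999998)]
print(top_language(result))   # uk
print(is_ambiguous(result))   # True
```

With a result this close, a caller may prefer to treat the text as undetermined rather than trust the top entry.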

Running as a service

Installation

Install SBCL
Get Quicklisp
Clone the project with Git
$ cd wiki-lang-detect; sbcl --load run.lisp

Running in Docker

docker build -t wiki-lang-detect:latest .
docker run -it -p 5000:5000 wiki-lang-detect:latest

curl -X POST -H "Content-Type: application/json" -d '{"text": "Несе Галя"}' http://localhost:5000/detect | jq '.'

Alternatively, you can use a prebuilt Docker image that is maintained outside of this repository.

docker run -it -p 5000:5000 chaliy/wiki-lang-detect:latest
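The curl request shown earlier can also be issued from a script. The sketch below uses only the Python standard library and assumes the same endpoint and payload as the curl example (POST to /detect on port 5000 with a JSON object containing a "text" field). The function names are hypothetical, and because the response schema is not described here, the decoded JSON is returned unmodified.

```python
import json
import urllib.request

# Assumed from the curl example: the service listens on port 5000 at /detect.
DETECT_URL = "http://localhost:5000/detect"


def build_detect_request(text, url=DETECT_URL):
    """Build a POST request carrying {"text": ...} as UTF-8 encoded JSON."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(text, url=DETECT_URL, timeout=10):
    """Send the request and return the decoded JSON response as-is."""
    request = build_detect_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the service to be running locally):
# print(detect("Несе Галя"))
```

Building the request separately from sending it makes it possible to inspect the payload without a running container.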

API

See the Swagger definition.

Owner

Vsevolod Dyomkin
