Chinese segmentation library

Last update: Jun 28, 2022

What is loso?

loso is a Chinese segmentation system written in Python. It was developed by Victor Lin ([email protected]) for Plurk Inc.

Copyright & License

Setup loso

To install loso, clone the repository and run the following commands:

cd loso
python setup.py develop

You also need a running Redis server, which stores the lexicon database, and a configuration file of your own. Copy the configuration template and edit it:

cp default.yaml myconf.yaml
vim myconf.yaml
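
Before feeding any text, it can help to confirm that Redis is actually reachable. The following is a minimal sketch, not part of loso: the function name redis_is_reachable is hypothetical, and it assumes Redis listens on localhost at its default port 6379 without authentication. Adjust host and port to whatever your configuration file uses.

```python
import socket


def redis_is_reachable(host="localhost", port=6379, timeout=1.0):
    """Return True if a server at host:port answers PING with +PONG.

    Hypothetical helper, not part of loso. A Redis server that requires
    authentication replies with an error instead, so this returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts inline commands terminated by CRLF.
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, unknown host, and so on.
        return False
    return reply.startswith(b"+PONG")
```

Calling redis_is_reachable() returns True or False, so it can be used as a quick guard at the top of a setup script.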

To use your configuration, set the LOSO_CONFIG_FILE environment variable to the path of your configuration file. For example:

LOSO_CONFIG_FILE=myconf.yaml python setup.py serve
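
If you start loso from a script rather than from a shell, the same environment variable can be set programmatically. This is only a sketch: loso_env and run_loso_command are hypothetical helper names, and it assumes that loso accepts an absolute path in LOSO_CONFIG_FILE and that the interpreter is available as "python".

```python
import os
import subprocess


def loso_env(config_path, base_env=None):
    """Return a copy of the environment with LOSO_CONFIG_FILE set.

    The path is made absolute (relative to the current working directory
    of this script) so it still resolves if the command runs elsewhere.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LOSO_CONFIG_FILE"] = os.path.abspath(config_path)
    return env


def run_loso_command(command, config_path, cwd=".", python="python"):
    """Run `python setup.py <command>` in the loso checkout at cwd.

    Returns the exit code of the child process.
    """
    return subprocess.call(
        [python, "setup.py", command],
        cwd=cwd,
        env=loso_env(config_path),
    )
```

For instance, run_loso_command("reset", "myconf.yaml") would run the reset command described below with your configuration applied.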

Use loso

loso determines segmentation from the lexicon database, using an algorithm based on the Hidden Markov Model. The service therefore cannot be used until a lexicon database has been built.
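
To give a feel for why training data must come first, here is a toy illustration of one common way to do Hidden Markov Model segmentation: each character is tagged B, M, E, or S (begin, middle, end of a word, or single-character word) and the best tag sequence is found with the Viterbi algorithm. This is not loso's code and loso's own algorithm may differ considerably. Every name here is hypothetical, the in-memory counters stand in for the Redis lexicon, and the sketch targets Python 3.

```python
import math
from collections import Counter, defaultdict

STATES = "BMES"  # Begin, Middle, End of a word, or Single-character word
ALLOWED = {"B": "ME", "M": "ME", "E": "BS", "S": "BS"}  # legal next tags
FLOOR = 1e-6  # probability assigned to events never seen in training


def tag_word(word):
    """Map one word to its tag string, e.g. a 3-character word -> 'BME'."""
    return "S" if len(word) == 1 else "B" + "M" * (len(word) - 2) + "E"


def train(sentences):
    """Count start, transition, and emission events from segmented text."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for words in sentences:
        words = [w for w in words if w]  # ignore empty strings
        chars = "".join(words)
        tags = "".join(tag_word(w) for w in words)
        if not tags:
            continue
        start[tags[0]] += 1
        for prev, cur in zip(tags, tags[1:]):
            trans[prev][cur] += 1
        for tag, ch in zip(tags, chars):
            emit[tag][ch] += 1
    return start, trans, emit


def log_prob(counter, key):
    """Relative frequency of key as a log probability, floored if unseen."""
    total = sum(counter.values())
    if total == 0 or counter[key] == 0:
        return math.log(FLOOR)
    return math.log(counter[key] / total)


def viterbi(text, model):
    """Return the most probable tag string for text under the model."""
    start, trans, emit = model
    if not text:
        return ""
    # A sentence can only start with B or S.
    score = {s: log_prob(start, s) + log_prob(emit[s], text[0]) for s in "BS"}
    back = []  # back[i][state at i+1] = best previous state at i
    for ch in text[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            cands = [(score[p] + log_prob(trans[p], cur), p)
                     for p in score if cur in ALLOWED[p]]
            if cands:
                best, prev = max(cands)
                new_score[cur] = best + log_prob(emit[cur], ch)
                pointers[cur] = prev
        score = new_score
        back.append(pointers)
    # A sentence can only end with E or S.
    tags = [max((s for s in score if s in "ES"), key=score.get)]
    for pointers in reversed(back):
        tags.append(pointers[tags[-1]])
    return "".join(reversed(tags))


def segment(text, model):
    """Split text into words by cutting after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, viterbi(text, model)):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    return words


model = train([["留下", "影片"], ["發射", "影片"], ["供", "世人", "回味"]])
print(segment("留下影片", model))  # ['留下', '影片']
```

With no training data every probability falls back to the same floor value, so the model has nothing to prefer one split over another. That is the same reason the service needs a lexicon database before it can be used.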

To feed a text file into the database, run

python setup.py feed -f /home/victorlin/plurk_src/realtime_search/word_segment/sample_data/sample_tr_ch

To clear the database, run

python setup.py reset

To test term splitting interactively, run

python setup.py interact

For example

Text: 留下鉅細靡遺的太空梭發射影片，供世人回味
....
留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味

To run the segmentation service as an XML-RPC service, run

python setup.py serve

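The following simple Python 2 program shows how to use it

# -*- coding: utf-8 -*-
# The encoding line above is required in Python 2 for the non-ASCII literal.
import xmlrpclib

# Connect to the service started with "python setup.py serve"
proxy = xmlrpclib.ServerProxy("http://localhost:5566/")

terms = proxy.splitTerms(u'留下鉅細靡遺的太空梭發射影片，供世人回味')
print ' '.join(terms)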

The output should be

留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味
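
The client program above uses Python 2 syntax (xmlrpclib and the print statement). Because XML-RPC is language neutral, a Python 3 client should be able to talk to the same service. The sketch below assumes the service is reachable at the same URL. The split_terms and format_terms helpers are hypothetical names, and the optional proxy argument exists only so the function can be exercised without a running server.

```python
import xmlrpc.client


def split_terms(text, url="http://localhost:5566/", proxy=None):
    """Ask a running loso XML-RPC service to segment text.

    splitTerms is the remote method used in the example above; any object
    with a compatible splitTerms method can be passed as proxy.
    """
    if proxy is None:
        proxy = xmlrpc.client.ServerProxy(url)
    return proxy.splitTerms(text)


def format_terms(terms):
    """Join the returned terms with single spaces for display."""
    return " ".join(terms)
```

With the service running, print(format_terms(split_terms('留下鉅細靡遺的太空梭發射影片，供世人回味'))) would be the Python 3 counterpart of the program above.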

Owner

Fang-Pen Lin
