Crosslingual Segmental Language Model

This repository contains the code for "Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages" (2021; C.M. Downey, Shannon Drizin, Levon Haroutunian, and Shivin Thukral). The code is a modified version of the repository from the original MSLM paper. The mslm package can be used to train and use Segmental Language Models.

In this repository, we additionally make available our preparation of the AmericasNLP 2021 multilingual dataset (see Data/AmericasNLP) and the target K'iche' data (Data/GlobalClassroom).

Paper Results

The results from the accompanying paper can be found in the Output directory. The *.csv files contain statistics from the training run, the *.out files contain the model output for the entire corpus, and the *.score files contain the segmentation scores of the model output.
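To get a quick overview of what the Output directory holds, the three file types described above can be grouped by extension. The following is a minimal sketch, not part of the mslm package: the helper name group_result_files is hypothetical, and it assumes only that the result files carry the .csv, .out, and .score suffixes named above.

```python
from pathlib import Path

# Suffixes named in this README; any other file is ignored.
RESULT_SUFFIXES = (".csv", ".out", ".score")


def group_result_files(output_dir):
    """Map each result suffix to the sorted list of matching files under output_dir.

    Hypothetical helper for browsing results; searches subdirectories too.
    """
    grouped = {suffix: [] for suffix in RESULT_SUFFIXES}
    for path in sorted(Path(output_dir).rglob("*")):
        if path.is_file() and path.suffix in grouped:
            grouped[path.suffix].append(path)
    return grouped


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for suffix, paths in group_result_files("Output").items():
        print(f"{suffix}: {len(paths)} file(s)")
```

Run from the repository root, this prints one count per file type, which is a convenient check that a training run wrote all three kinds of output.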

The results from the October 2021 pre-print (which we will refer to as Experiment Set A) are reproducible on commit 2b89575. We will consider this the official commit of the October 2021 pre-print.

Usage

The top-level scripts for training and experimentation can be found in RunScripts. Almost all functionality is run through the __main__.py script in the mslm package, which can either train or evaluate/use a model. The PyTorch modules for building SLMs can be found in mslm.segmental_lm, modules for the span-masking Transformer are in mslm.segmental_transformer, and modules for sequence lattice-based computations are in mslm.lattice. The main script takes in a configuration object to set most parameters for model training and use (see mslm.mslm_config). For information on the arguments to the main script:

python -m mslm --help

Environment setup

pip install -r requirements.txt

This code requires Python >= 3.6
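The version requirement can be checked before installing dependencies. This is a small sketch under the single assumption stated above (Python 3.6 or newer); the function name python_is_supported is illustrative and does not exist in the repository.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=None):
    """Return True if the given (or the running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("This code requires Python >= 3.6")
    print("Python version OK")
```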

Training

./RunScripts/run_mslm.sh

python -m mslm --input_file <input_file> \
    --model_path <model_path> \
    --mode train \
    --config_file <config_file> \
    --dev_file <dev_file> \
    [--preexisting]
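When launching several training runs from Python rather than from the shell script, the same invocation can be assembled as an argument list. The sketch below uses only the flags shown in the command above; the helper name build_train_command and all file paths in the usage example are hypothetical.

```python
import subprocess
import sys


def build_train_command(input_file, model_path, config_file, dev_file,
                        preexisting=False):
    """Assemble the `python -m mslm` training invocation as an argument list.

    Only flags that appear in this README are used; the paths are supplied
    by the caller.
    """
    cmd = [
        sys.executable, "-m", "mslm",
        "--input_file", str(input_file),
        "--model_path", str(model_path),
        "--mode", "train",
        "--config_file", str(config_file),
        "--dev_file", str(dev_file),
    ]
    if preexisting:
        # Optional flag, shown in brackets in the command above.
        cmd.append("--preexisting")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_train_command("train.txt", "Models/example",
                                  "config.json", "dev.txt")
    subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting issues when paths contain spaces, and check=True raises an error if the training process exits with a non-zero status.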

Evaluation

./RunScripts/eval_mslm.sh

The evaluation script also expects a text file containing all of the words from the training set.


