Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

Last update: Oct 08, 2022


Overview

CSCBLI

Code for our ACL Findings 2021 paper,
"Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction".

Requirements

python >= 3.6
numpy >= 1.9.0
pytorch >= 1.0

Supervised

How to train

CUDA_VISIBLE_DEVICES=0 python train.py --src_lang $lg --tgt_lang en \
        --static_src_emb_path $ssemb --static_tgt_emb_path $stemb \
        --context_src_emb_path $csemb --context_tgt_emb_path $ctemb \
        --train_data_path $data_path --save_path $save_path

--static_src_emb_path   path to the aligned source-language static embeddings
--static_tgt_emb_path   path to the aligned target-language static embeddings
--context_src_emb_path  path to the source-language contextual embeddings
--context_tgt_emb_path  path to the target-language contextual embeddings
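The README does not say which file format the embedding paths expect. Assuming they are plain-text files in the common word2vec/fastText layout (a "<count> <dim>" header line, then one word followed by its vector per line), a minimal loader might look like the sketch below. The function name load_text_embeddings is hypothetical and is not part of this repository; train.py may read its inputs differently.

```python
import numpy as np


def load_text_embeddings(path, max_vocab=None):
    """Hypothetical loader for word2vec/fastText-style text embeddings.

    Assumed layout: a header "<count> <dim>", then "<word> <v1> ... <vdim>" per line.
    Returns (word2id, emb) where emb rows are L2-normalised.
    """
    word2id, vectors = {}, []
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        _count, dim = map(int, f.readline().split())
        for line in f:
            if max_vocab is not None and len(vectors) >= max_vocab:
                break  # keep only the first max_vocab (usually most frequent) words
            parts = line.rstrip().split(" ")
            # skip malformed lines and repeated words
            if len(parts) != dim + 1 or parts[0] in word2id:
                continue
            word2id[parts[0]] = len(vectors)
            vectors.append([float(x) for x in parts[1:]])
    emb = np.array(vectors, dtype=np.float32).reshape(-1, dim)
    # normalise rows so that a dot product between rows is a cosine similarity
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.maximum(norms, 1e-8)
    return word2id, emb
```

Normalising at load time means later nearest-neighbour retrieval can use plain matrix products, which is the usual choice in bilingual lexicon induction.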

How to test

CUDA_VISIBLE_DEVICES=0 python test_on_all_word.py --src_lang $lg \
        --tgt_lang en --model_path $model_path \
        --dict_path $dict_path \
        --vecmap_context_src_emb_path $vcpath \
        --vecmap_context_tgt_emb_path $vspath \
        --vecmap

--vecmap_context_src_emb_path  path to the aligned source-language contextual embeddings
--vecmap_context_tgt_emb_path  path to the aligned target-language contextual embeddings
--vecmap                       use the interpolation method; if omitted, the unified method is used
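The options above distinguish a unified method from an interpolation method but do not spell out the interpolation itself. As an illustration only, the sketch below assumes it is a linear mix of two cosine-similarity matrices, one computed from the static embeddings and one from the aligned contextual embeddings, weighted by a single lambda (compare --lambda_w1 in the unsupervised test command). Which term lambda multiplies, and the names cosine_sim and interpolated_translate, are assumptions; check test_on_all_word.py and the paper for the actual formulation.

```python
import numpy as np


def cosine_sim(src, tgt):
    """Cosine-similarity matrix between the rows of src (n x d) and tgt (m x d)."""
    src = src / np.maximum(np.linalg.norm(src, axis=1, keepdims=True), 1e-8)
    tgt = tgt / np.maximum(np.linalg.norm(tgt, axis=1, keepdims=True), 1e-8)
    return src @ tgt.T


def interpolated_translate(static_src, static_tgt, ctx_src, ctx_tgt, lambda_w1, k=1):
    """Hypothetical interpolation: (1 - lambda_w1) * static sim + lambda_w1 * context sim.

    Row i of static_src and ctx_src must describe the same source word, and
    likewise for the target matrices. Returns the indices of the top-k target
    words for every source word, best first.
    """
    scores = (1.0 - lambda_w1) * cosine_sim(static_src, static_tgt) \
        + lambda_w1 * cosine_sim(ctx_src, ctx_tgt)
    # sort each row by descending score and keep the first k target indices
    return np.argsort(-scores, axis=1)[:, :k]
```

With a small lambda the static similarity dominates; raising it shifts the ranking towards the contextual space, which is why the weight is tuned per language pair.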

Unsupervised

How to train

lg=ar
CUDA_VISIBLE_DEVICES=0 python train.py --src_lang en --tgt_lang $lg \
        --static_src_emb_path $ssemb --static_tgt_emb_path $stemb \
        --context_src_emb_path $csemb --context_tgt_emb_path $ctemb \
        --save_path $save_path

--static_src_emb_path   path to the aligned source-language static embeddings
--static_tgt_emb_path   path to the aligned target-language static embeddings
--context_src_emb_path  path to the source-language contextual embeddings
--context_tgt_emb_path  path to the target-language contextual embeddings

How to test

src=ar
tgt=en
model_path=../checkpoints/$src-$tgt-add_orign_nw.pkl_last
CUDA_VISIBLE_DEVICES=0 python test.py --model_path $model_path \
        --dict_path ../$src-$tgt.5000-6500.txt --mode v2 \
        --src_lang $src --tgt_lang $tgt \
        --reload_src_ctx $path1 \
        --reload_tgt_ctx $path2 --lambda_w1 0.11

--mode            v1 for the unified method, v2 for the interpolation method
--lambda_w1       the interpolation weight
--reload_src_ctx  path to the aligned source-language contextual embeddings
--reload_tgt_ctx  path to the aligned target-language contextual embeddings
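The test dictionary ../$src-$tgt.5000-6500.txt follows the naming of the MUSE evaluation dictionaries, where each line holds a whitespace-separated source/target pair and a source word may have several gold translations. Assuming that format, precision@k, the usual BLI metric, can be sketched as follows. load_dictionary and precision_at_k are hypothetical helpers, not functions from this repository, and test.py may treat out-of-vocabulary source words differently.

```python
from collections import defaultdict


def load_dictionary(lines):
    """Parse MUSE-style 'src tgt' lines into a mapping src -> set of gold targets."""
    gold = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, tgt = parts
        gold[src].add(tgt)
    return dict(gold)


def precision_at_k(predictions, gold, k=1):
    """predictions maps a source word to its ranked list of target candidates.

    A source word counts as correct if any of its top-k candidates is a gold
    translation; only source words that have predictions are evaluated.
    """
    evaluated = [w for w in gold if w in predictions]
    if not evaluated:
        return 0.0
    hits = sum(1 for w in evaluated if set(predictions[w][:k]) & gold[w])
    return hits / len(evaluated)
```

Because load_dictionary accepts any iterable of lines, an open file handle for ../ar-en.5000-6500.txt (the src=ar, tgt=en case above) can be passed to it directly.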


Owner

Jinpeng Zhang
