Translate .sbv subtitle files

Last update: Oct 20, 2021

Related tags

Text Processing deepl4subtitle

Overview

deepl4subtitle

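Translates subtitle files (.sbv) with DeepL. The output keeps the timestamps, but during translation they are separated from the sentence text, so the result should be more accurate than feeding the raw .sbv file straight into the translator.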

Usage

As a preprocessing step, add a period (.) at the end of every sentence in the input .sbv file. This lets DeepL recognize the sentence boundaries correctly.

# install deepl 
# https://pypi.org/project/deepl/
pip3 install deepl
python3 deepl4subtitle.py -i sample.sbv -o output.sbv -k YOUR_DEEPL_API_KEY

Sample

sample video: https://www.youtube.com/watch?v=CL7HuMLIPO0

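sample.sbv: the subtitles auto-generated by YouTube, lightly corrected by hand
sample_deepl4subtitle.sbv: the translation produced with deepl4subtitle
sample_raw_deepl.sbv: the translation produced by pasting the contents of sample.sbv directly into DeepL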

In sample_raw_deepl, dubious translations appear in many places because the timestamps are treated as part of the text. In sample_deepl4subtitle these problems are largely resolved.

What happens internally

original

(The sentence-final periods have to be added by hand.)

0:00:01.340,0:00:04.780
クラウドコンピューティングという言葉を
知っているだろうか.

0:00:04.780,0:00:08.110
クラウドコンピューティングとは
インターネットの先にあるデータセンター

0:00:08.110,0:00:12.420
のサーバーに処理してもらうシステム形態
を指す言葉である.

↓ move timestamp within XML tag, remove newlines

<timestamp ts="0:00:01.340,0:00:04.780"/>クラウドコンピューティングという言葉を知っているだろうか.<timestamp ts="0:00:04.780,0:00:08.110"/>クラウドコンピューティングとはインターネットの先にあるデータセンター<timestamp ts="0:00:08.110,0:00:12.420"/>のサーバーに処理してもらうシステム形態を指す言葉である.
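
The step above can be sketched roughly as follows. This is a minimal illustration, not the code in deepl4subtitle.py: the function name `sbv_to_tagged` and the regular expression are assumptions made for the example, and caption text is assumed not to contain `<` or `&`.

```python
import re

# Matches an .sbv timestamp line such as "0:00:01.340,0:00:04.780".
TIMESTAMP_RE = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def sbv_to_tagged(sbv_text: str) -> str:
    """Fold each timestamp into a <timestamp/> tag and join the caption lines.

    Hypothetical helper for illustration only.
    """
    parts = []
    for line in sbv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate caption blocks
        if TIMESTAMP_RE.match(line):
            parts.append(f'<timestamp ts="{line}"/>')
        else:
            parts.append(line)
    # Japanese needs no separator between joined lines; a language written
    # with spaces would need " " here instead.
    return "".join(parts)
```

Joining with an empty string matches the Japanese sample above; for source languages that separate words with spaces, the joined lines would need a space between them.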

↓ translate with DeepL through the API, ignoring XML tags

<timestamp ts="0:00:01.340,0:00:04.780"/>Do you know the term "cloud computing"? <timestamp ts="0:00:04.780,0:00:08.110"/> Cloud computing is a term that refers to a form of system that is processed by servers in a data center <timestamp ts="0:00:08.110,0:00:12.420"/>located beyond the Internet.
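
A hedged sketch of the API call for this step, using the `deepl` package installed in the usage section. The helper names `translate_tagged` and `make_translator` and the default target language are assumptions for the example; whether the script passes exactly these options is not stated here. `tag_handling="xml"` asks DeepL to treat the `<timestamp/>` tags as markup rather than as text to translate.

```python
def translate_tagged(tagged_text: str, translator, target_lang: str = "EN-US") -> str:
    """Translate text while asking DeepL to keep XML tags as markup.

    `translator` is expected to behave like deepl.Translator.
    Hypothetical helper for illustration only.
    """
    result = translator.translate_text(
        tagged_text,
        target_lang=target_lang,
        tag_handling="xml",  # tags are kept as markup, not translated
    )
    return result.text


def make_translator(api_key: str):
    """Build a deepl.Translator; imported lazily so this module loads without deepl."""
    import deepl  # pip3 install deepl

    return deepl.Translator(api_key)
```

Usage would look like `translate_tagged(tagged, make_translator("YOUR_DEEPL_API_KEY"))`. Passing the translator in as an argument also makes the function easy to exercise with a stand-in object, without calling the API.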

↓ put back timestamp and newlines

0:00:01.340,0:00:04.780
Do you know the term "cloud computing"? 

0:00:04.780,0:00:08.110
 Cloud computing is a term that refers to a form of system that is processed by servers in a data center 

0:00:08.110,0:00:12.420
located beyond the Internet.
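
The final step can be sketched in the same spirit. Again, `tagged_to_sbv` is a hypothetical name and this is not necessarily how deepl4subtitle.py does it. Unlike the output shown above, this sketch trims the spaces DeepL leaves around each caption, and it drops any text that appears before the first tag.

```python
import re

# Captures the ts attribute of each self-closing <timestamp/> tag.
TAG_RE = re.compile(r'<timestamp ts="([^"]*)"\s*/>')


def tagged_to_sbv(tagged_text: str) -> str:
    """Turn '<timestamp ts="..."/>text...' back into .sbv blocks.

    Hypothetical helper for illustration only.
    """
    # With one capture group, re.split yields
    # [before_first_tag, ts1, text1, ts2, text2, ...].
    pieces = TAG_RE.split(tagged_text)
    blocks = []
    for ts, text in zip(pieces[1::2], pieces[2::2]):
        blocks.append(f"{ts}\n{text.strip()}")
    # .sbv blocks are separated by one blank line.
    return "\n\n".join(blocks) + "\n"
```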

Owner

Yasunori Toshimitsu
