The tutorial is a collection of many other resources and my own notes

Overview
# TOC

Before reading
the tutorial is a collection of many other resources and my own notes. Note that the ref if any in the tutorial means the whole passage. And part to be referred if any means the part has been summarized or detailed by me. Feel free to click the [the part to be referred] to read the original.

CTC_pytorch

1. Why we need CTC? ---> looking back on history

Feel free to skip it if you already know the purpose of CTC coming into being.

1.1. About CRNN

We need to learn CRNN because in the context we need an output to be a sequence.

ref: the overview from CRNN to CTC !! highly recommended !!

part to be referred

multi-digit sequence recognition

  • Characted-based
  • word-based
  • sequence-to-sequence
  • CRNN = CNN + RNN
    • CNN --> relationship between pixel
    • (the small fonts) Specifially, each feature vec of a feature seq is generated from left to right on the feature maps. That means the i-th feature vec is the concatenation of the columns of all the maps. So the shape of the tensor can be reshaped as e.g. (batch_size, 32, 256)

image1



1.2. from Cross Entropy Loss to CTC Loss

Usually, CE is applied to compute loss as the following way. And gt(also target) can be encoded as a stable matrix or vector.

image2

However, in OCR or audio recognition, each target input/gt has various forms. e.g. "I like to play piano" can be unpredictable in handwriting.

image3

Some stroke is longer than expected. Others are short.
Assume that the above example is encoded as number sequence [5, 3, 8, 3, 0].

image4

  • Tips: blank(the blue box symbol here) is introduced because we allow the model to predict a blank label due to unsureness or the end comes, which is similar with human when we are not pretty sure to make a good prediction. ref:lihongyi lecture starting from 3:45

Therefore, we see that this is an one-to-many question where e.g. "I like to play piano" has many target forms. But we not just have one sequence. We might also have other sequence e.g. "I love you", "Not only you but also I like apple" etc, none of which have a same sentence length. And this is what cross entropy cannot achieve in one batch. But now we can encode all sequences/sentences into a new sequence with a max length of all sequences.

e.g.
"I love you" --> len = 10
"How are you" --> len = 11
"what's your name" --> len = 16

In this context the input_length should be >= 16.

For dealing with the expanded targets, CTC is introduced by using the ideas of (1) HMM forward algorithm and (2) dynamic programing.

2. Details about CTC

2.1. intuition: forward algorithm

image5

image6

Tips: the reason we have - inserted between each two token is because, for each moment/horizontal(Note) position we allow the model to predict a blank representing unsureness.

Note that moment is for audio recognition analogue. horizontal position is for OCR analogue.



2.2. implementation: forward algorithm with dynamic programming

the complete code is CTC.py

given 3 samples, they are
"orange" :[15, 18, 1, 14, 7, 5]    len = 6
"apple" :[1, 16, 16, 12, 5]    len = 5
"watermelon" :[[23, 1, 20, 5, 18, 13, 5, 12, 15, 14]  len = 10

{0:blank, 1:A, 2:B, ... 26:Z}

2.2.1. dummy input ---> what the input looks like

# ------------ a dummy input ----------------
log_probs = torch.randn(15, 3, 27).log_softmax(2).detach().requires_grad_()# 15:input_length  3:batchsize  27:num of token(class)
# targets = torch.randint(0, 27, (3, 10), dtype=torch.long)
targets = torch.tensor([[15, 18, 1,  14, 7, 5,  0, 0,  0,  0],
                        [1,  16, 16, 12, 5, 0,  0, 0,  0,  0],
                        [23, 1,  20, 5, 18, 13, 5, 12, 15, 14]]
                        )

# assume that the prediction vary within 15 input_length.But the target length is still the true length.
""" 
e.g. [a,0,0,0,p,0,p,p,p, ...l,e] is one of the prediction
 """
input_lengths = torch.full((3,), 15, dtype=torch.long)
target_lengths = torch.tensor([6,5,10], dtype = torch.long)



2.2.2. expand the target ---> what the target matrix look like

Recall that one target can be encoded in many different forms. So we introduce a targets mat to represent it as follows.

"-d-o-g-" ">
target_prime = targets.new_full((2 * target_length + 1,), blank) # create a targets_prime full of zero

target_prime[1::2] = targets[i, :target_length] # equivalent to insert blanks in targets. e.g. targets = "dog" --> "-d-o-g-"

Now we got target_prime(also expanded target) for e.g. "apple"
target_prime is
tensor([ 0, 1, 0, 16, 0, 16, 0, 12, 0, 5, 0]) which is visualized as the red part(also t1)

image7

Note that the t8 is only for illustration. In the example, the width of target matrix should be 15(input_length).

probs = log_probs[:input_length, i].exp()

Then we convert original inputs from log-space like this, referring to "In practice, the above recursion ..." in original paper https://www.cs.toronto.edu/~graves/icml_2006.pdf

2.3. Alpha Matrix

image8

# alpha matrix init at t1 indicated by purple boxes.
alpha_col = log_probs.new_zeros((target_length * 2 + 1,))
alpha_col[0] = probs[0, blank] # refers to green box
alpha_col[1] = probs[0, target_prime[1]]
  • blank is the index of blank(here it's 0)
  • target_prime[1] refers to the 1-st index of the token. e.g. "apple": "a", "orange": "o"

2.4. Dynamic programming based on 3 conditions

refer to the details in CTC.py

reference:

Owner
手写AI
手写AI
JMESPath is a query language for JSON.

JMESPath JMESPath (pronounced "james path") allows you to declaratively specify how to extract elements from a JSON document. For example, given this

1.7k Dec 31, 2022
The sarge package provides a wrapper for subprocess which provides command pipeline functionality.

Overview The sarge package provides a wrapper for subprocess which provides command pipeline functionality. This package leverages subprocess to provi

Vinay Sajip 14 Dec 18, 2022
An interview engine for businesses, interview those who are actually qualified and are worth your time!

easyInterview V0.8B An interview engine for businesses, interview those who are actually qualified and are worth your time! Quick Overview You/the com

Vatsal Shukla 1 Nov 19, 2021
swagger-codegen contains a template-driven engine to generate documentation, API clients and server stubs in different languages by parsing your OpenAPI / Swagger definition.

Master (2.4.25-SNAPSHOT): 3.0.31-SNAPSHOT: Maven Central ⭐ ⭐ ⭐ If you would like to contribute, please refer to guidelines and a list of open tasks. ⭐

Swagger 15.2k Dec 31, 2022
Python Tool to Easily Generate Multiple Documents

Python Tool to Easily Generate Multiple Documents Running the script doesn't require internet Max Generation is set to 10k to avoid lagging/crashing R

2 Apr 27, 2022
An ongoing curated list of OS X best applications, libraries, frameworks and tools to help developers set up their macOS Laptop.

macOS Development Setup Welcome to MacOS Local Development & Setup. An ongoing curated list of OS X best applications, libraries, frameworks and tools

Paul Veillard 3 Apr 03, 2022
learn python in 100 days, a simple step could be follow from beginner to master of every aspect of python programming and project also include side project which you can use as demo project for your personal portfolio

learn python in 100 days, a simple step could be follow from beginner to master of every aspect of python programming and project also include side project which you can use as demo project for your

BDFD 6 Nov 05, 2022
Loudchecker - Python script to check files for earrape

loudchecker python script to check files for earrape automatically installs depe

1 Jan 22, 2022
Python-slp - Side Ledger Protocol With Python

Side Ledger Protocol Run python-slp node First install Mongo DB and run the mong

Solar 3 Mar 02, 2022
Documentation of the QR code found on new Austrian ID cards.

Austrian ID Card QR Code This document aims to be a complete documentation of the format used in the QR area on the back of new Austrian ID cards (Per

Gabriel Huber 9 Dec 12, 2022
A curated list of awesome tools for Sphinx Python Documentation Generator

Awesome Sphinx (Python Documentation Generator) A curated list of awesome extra libraries, software and resources for Sphinx (Python Documentation Gen

Hyunjun Kim 831 Dec 27, 2022
Valentine-with-Python - A Python program generates an animation of a heart with cool texts of your loved one

Valentine with Python Valentines with Python is a mini fun project I have coded.

Niraj Tiwari 4 Dec 31, 2022
Easy OpenAPI specs and Swagger UI for your Flask API

Flasgger Easy Swagger UI for your Flask API Flasgger is a Flask extension to extract OpenAPI-Specification from all Flask views registered in your API

Flasgger 3.1k Dec 24, 2022
The blazing-fast Discord bot.

Wavy Wavy is an open-source multipurpose Discord bot built with pycord. Wavy is still in development, so use it at your own risk. Tools and services u

Wavy 7 Dec 27, 2022
Proyecto - Desgaste y rendimiento de empleados de IBM HR Analytics

Acceder al código desde Google Colab para poder ver de manera adecuada todas las visualizaciones y poder interactuar con ellas. Links de acceso: Noteb

1 Jan 31, 2022
This program has been coded to allow the user to rename all the files in the entered folder.

Bulk_File_Renamer This program has been coded to allow the user to rename all the files in the entered folder. The only required package is "termcolor

1 Jan 06, 2022
Żmija is a simple universal code generation tool.

Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta

Adrian Samoticha 2 Nov 23, 2021
Fastest Git client for Emacs.

EAF Git Client EAF Git is git client application for the Emacs Application Framework. The advantages of EAF Git are: Large log browse: support 1 milli

Emacs Application Framework 31 Dec 02, 2022
Explorative Data Analysis Guidelines

Explorative Data Analysis Get data into a usable format! Find out if the following predictive modeling phase will be successful! Combine everything in

Florian Rohrer 18 Dec 26, 2022
Second version of SQL-PYTHON-Practicas

SQLite-Python Acerca de | Autor Sobre el repositorio Segunda version de SQL-PYTHON-Practicas 💻 Tecnologias Visual Studio Code Python SQLite3 📖 Requi

1 Jan 06, 2022