[ICME 2021 Oral] CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Last update: Aug 11, 2022

Overview

CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

This repository is the official PyTorch implementation of CORE-Text. It contains demo training and evaluation scripts.

Requirements

mmdetection == 2.13.0
mmcv == 1.3.5
pyclipper == 1.3.0
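
The pins above can be checked against the current environment before training. The following is a minimal sketch, not part of the repository: the helper names (`installed_version`, `check_requirements`) are hypothetical, and the distribution names are assumptions (mmdetection is published on PyPI as `mmdet`, and mmcv may be installed as `mmcv` or `mmcv-full`).

```python
from importlib import metadata

# Pins copied from the Requirements list. The distribution names are
# assumptions: adjust "mmcv" to "mmcv-full" if that is what you installed.
REQUIRED = {
    "mmdet": "2.13.0",
    "mmcv": "1.3.5",
    "pyclipper": "1.3.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(required, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in required.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_requirements(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed; in normal use the default is enough.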

Training Demo

Base (Mask R-CNN)

To train Base (Mask R-CNN) on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/base.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_base

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

VRM

To train VRM on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/vrm.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_vrm

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

CORE

To train CORE (ours) on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

# pre-training
CONFIG=configs/icdar2017mlt/core_pretrain.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core_pretrain

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

# training
CONFIG=configs/icdar2017mlt/core.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0
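
The training scripts above differ only in `CONFIG` and `WORK_DIR`. As a rough sketch under that observation, the shared command can be assembled in one place. The helper `build_train_command` and the `RUNS` table are hypothetical names introduced here for illustration; the flags and paths are copied from the scripts above.

```python
import shlex


def build_train_command(config, work_dir, gpus=4, port=29500, python="python"):
    """Return the argument list used by the training scripts above."""
    return [
        python, "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",
        "--nnodes=1", "--node_rank=0", "--master_addr=localhost",
        f"--master_port={port}",
        "tools/train.py", config,
        "--no-validate",
        "--launcher", "pytorch",
        "--work-dir", work_dir,
        "--seed", "0",
    ]


# Config and work-dir pairs copied from the scripts above, in run order
# (CORE pre-training comes before CORE training).
RUNS = {
    "base": ("configs/icdar2017mlt/base.py",
             "work_dirs/mask_rcnn_r50_fpn_train_base"),
    "vrm": ("configs/icdar2017mlt/vrm.py",
            "work_dirs/mask_rcnn_r50_fpn_train_vrm"),
    "core_pretrain": ("configs/icdar2017mlt/core_pretrain.py",
                      "work_dirs/mask_rcnn_r50_fpn_train_core_pretrain"),
    "core": ("configs/icdar2017mlt/core.py",
             "work_dirs/mask_rcnn_r50_fpn_train_core"),
}

if __name__ == "__main__":
    # Only prints the commands; pass a list to subprocess.run(cmd, check=True)
    # from the repository root to actually launch a run.
    for name, (config, work_dir) in RUNS.items():
        print(f"# {name}")
        print(shlex.join(build_train_command(config, work_dir)))
```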

Evaluation Demo

GPUS=4
PORT=${PORT:-29500}
CONFIG=path/to/config
CHECKPOINT=path/to/checkpoint

python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    ./tools/test.py $CONFIG $CHECKPOINT --launcher pytorch \
    --eval segm \
    --not-encode-mask \
    --eval-options "jsonfile_prefix=path/to/work_dir/results/eval" "gt_path=data/icdar2017mlt/icdar2017mlt_gt.zip"

Dataset Format

The dataset directory is structured as shown below. We provide the COCO-format labels (ICDAR2017_train.json and ICDAR2017_val.json) and the ground-truth zip file (icdar2017mlt_gt.zip) for training and evaluation.

data
└── icdar2017mlt
    ├── annotations
    │   ├── ICDAR2017_train.json
    │   └── ICDAR2017_val.json
    ├── icdar2017mlt_gt.zip
    └── image
        ├── train
        └── val
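
Before launching training or evaluation, it can help to confirm that the layout above is in place. The following is a minimal sketch; `missing_dataset_paths` is a hypothetical helper, not part of the repository, and it checks only that the paths exist, not their contents.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "icdar2017mlt/annotations/ICDAR2017_train.json",
    "icdar2017mlt/annotations/ICDAR2017_val.json",
    "icdar2017mlt/icdar2017mlt_gt.zip",
    "icdar2017mlt/image/train",
    "icdar2017mlt/image/val",
]


def missing_dataset_paths(data_root="data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print("Missing dataset paths:")
        for rel in missing:
            print(f"  data/{rel}")
    else:
        print("Dataset layout looks complete.")
```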

Results

Our models achieve the following performance on the ICDAR 2017 MLT validation set. Note that the results differ slightly (~0.1%) from those reported in the paper because we reimplemented the code on top of the open-source mmdetection.

Method	Backbone	Training set	Test set	Hmean	Precision	Recall	Download
Base (Mask R-CNN)	ResNet50	ICDAR 2017 MLT Train	ICDAR 2017 MLT Val	0.800	0.828	0.773	model | log
VRM	ResNet50	ICDAR 2017 MLT Train	ICDAR 2017 MLT Val	0.812	0.853	0.774	model | log
CORE (ours)	ResNet50	ICDAR 2017 MLT Train	ICDAR 2017 MLT Val	0.821	0.872	0.777	model | log
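
Hmean is the harmonic mean of precision and recall, so the Hmean column can be cross-checked against the other two. A small sketch follows, with values copied from the table; because the table reports precision and recall rounded to three decimals, the recomputed value may differ from the reported one in the last digit.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall, reported Hmean) copied from the table above.
ROWS = {
    "Base (Mask R-CNN)": (0.828, 0.773, 0.800),
    "VRM": (0.853, 0.774, 0.812),
    "CORE (ours)": (0.872, 0.777, 0.821),
}

if __name__ == "__main__":
    for method, (p, r, reported) in ROWS.items():
        print(f"{method}: recomputed {hmean(p, r):.4f}, reported {reported:.3f}")
```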

Citation

@inproceedings{9428457,
  author={Lin, Jingyang and Pan, Yingwei and Lai, Rongfeng and Yang, Xuehang and Chao, Hongyang and Yao, Ting},
  booktitle={2021 IEEE International Conference on Multimedia and Expo (ICME)},
  title={Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning},
  year={2021},
  pages={1-6},
  doi={10.1109/ICME51207.2021.9428457}
}

Owner

Jingyang Lin
