Text detection mainly based on the CTPN (connectionist text proposal network) model in TensorFlow; also applicable to ID card detection.

Overview

text-detection-ctpn

Scene text detection based on CTPN (connectionist text proposal network), implemented in TensorFlow. The original paper can be found here. The original Caffe repo can be found here. For more detail about the paper and code, see this blog. If you have any questions, check the existing issues first; if the problem persists, open a new issue.


NOTICE: Thanks to banjin-xjy, banjin and I have reconstructed this repo. The old repo was written based on Faster-RCNN and retained a lot of useless code and dependencies, making it hard to understand and maintain; hence we reconstructed it. The old code is saved in the master branch.


roadmap

  • reconstruct the repo
  • cython nms and bbox utils
  • loss function as referred in paper
  • oriented text connector
  • BLSTM

setup

The nms and bbox utils are written in Cython, so you have to build the library first.

cd utils/bbox
chmod +x make.sh
./make.sh

It will generate nms.so and bbox.so in the current folder.
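
For reference, a minimal sketch of the kind of Cython build that make.sh typically drives (python setup.py build_ext --inplace). This is an illustration only, not the repo's actual utils/bbox/setup.py; it assumes the sources are named bbox.pyx and nms.pyx and that Cython and numpy are installed.

import numpy as np
from Cython.Build import cythonize
from setuptools import Extension, setup

# Both extensions need the numpy headers because the .pyx files work on numpy arrays.
extensions = [
    Extension("bbox", ["bbox.pyx"], include_dirs=[np.get_include()]),
    Extension("nms", ["nms.pyx"], include_dirs=[np.get_include()]),
]

setup(name="ctpn_utils", ext_modules=cythonize(extensions))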


demo

  • follow setup to build the library
  • download the ckpt file from Google Drive or Baidu Yun
  • put checkpoints_mlt/ in text-detection-ctpn/
  • put your images in data/demo; the results will be saved in data/res. Then run the demo from the repo root:
python ./main/demo.py

training

prepare data

  • First, download the pre-trained VGG net model and put it at data/vgg_16.ckpt. You can download it from tensorflow/models.
  • Second, download the dataset we prepared from Google Drive or Baidu Yun. Put the downloaded data in data/dataset/mlt, then start the training.
  • Also, you can prepare your own dataset according to the following steps.
  • Modify DATA_FOLDER and OUTPUT in utils/prepare/split_label.py according to your dataset, then run split_label.py from the repo root:
python ./utils/prepare/split_label.py
  • It will generate the prepared data in data/dataset/.
  • A sample input file for split_label.py can be found in gt_img_859.txt, and the corresponding output file is img_859.txt. A demo image of the prepared data is shown below; a sketch of the splitting idea follows below.
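
For illustration only, a hedged sketch of the splitting idea behind this preparation (not the repo's split_label.py): CTPN is trained on fine-scale proposals of a fixed 16-pixel width, so each ground-truth text box is cut into a run of 16-px-wide boxes. The axis-aligned (xmin, ymin, xmax, ymax) input format used here is an assumption for the example.

def split_box(xmin, ymin, xmax, ymax, stride=16):
    """Cut one axis-aligned text box into slices at most `stride` px wide."""
    boxes = []
    x = xmin // stride * stride  # snap the start to the 16-px grid
    while x < xmax:
        x1 = max(x, xmin)
        x2 = min(x + stride - 1, xmax)
        boxes.append((x1, ymin, x2, ymax))
        x += stride
    return boxes

if __name__ == "__main__":
    # A 100-px-wide word becomes a run of short 16-px-wide training boxes.
    for b in split_box(30, 40, 130, 70):
        print(b)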


train

Simply run

python ./main/train.py
  • The model provided in checkpoints_mlt was trained on a GTX 1070 for 50k iterations. Each iteration takes about 0.25 s, so 50k iterations take roughly 3.5 hours.

some results

NOTICE: all the photos used below were collected from the internet. If this affects you, please contact me and I will delete them.


oriented text connector

  • The oriented text connector has been implemented; it's working, but still needs further improvement.
  • The left figure is the result for DETECT_MODE H, the right figure for DETECT_MODE O.

Comments
  • How to export model for Tensorflow Serving?

    How to export model for Tensorflow Serving?

    Following some tutorials on exporting a model for Tensorflow Serving, I've come up with the trial below:

    cfg_from_file('ctpn/text.yml')
    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
            net = get_network("VGGnet_test")
    
            saver = tf.train.Saver()
            try:
                ckpt = tf.train.get_checkpoint_state(cfg.TEST.checkpoints_path)
                saver.restore(sess, ckpt.model_checkpoint_path)
            except Exception:
                # ckpt may be None here, so report the configured path instead
                raise RuntimeError('Missing pre-trained model: {}'.format(cfg.TEST.checkpoints_path))

            # The main export trial is here
            #############################
            export_path = os.path.join(
                tf.compat.as_bytes('/tmp/ctpn'),
                tf.compat.as_bytes(str(1)))
            builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    
            freezing_graph = sess.graph
            prediction_signature = tf.saved_model.signature_def_utils.predict_signature_def(
                inputs={'input': freezing_graph.get_tensor_by_name('Placeholder:0')},
                outputs={'output': freezing_graph.get_tensor_by_name('Placeholder_1:0')}
            )
    
            builder.add_meta_graph_and_variables(
                sess,
                [tf.saved_model.tag_constants.SERVING],
                signature_def_map={
                    tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature
                },
                clear_devices=True)
    
            builder.save()
            print('[INFO] Export SavedModel into {}'.format(export_path))
            #############################
    

    The output of freezing_graph is:

    Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
    Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
    Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32)
    Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
    

    After I got the exported model at /tmp/ctpn/1, I tried to load it into a Tensorflow Serving server:

    tensorflow_model_server --port=9000 --model_name=ctpn --model_base_path=/tmp/ctpn
    

    But it came up with an error:

    Loading servable: {name: detector version: 1} failed: Not found: Op type not registered 'PyFunc' in binary running on [...]. Make sure the Op and Kernel are registered in the binary running in this process.
    

    So there are 2 questions:

    • Am I right about the inputs (Placeholder:0) and the outputs (Placeholder_1:0) of prediction_signature?
    • Where do I miss PyFunc? (A diagnostic sketch follows below.)
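
    For reference, a hedged diagnostic sketch for the second question: TensorFlow Serving only ships registered kernels, and PyFunc nodes created by tf.py_func wrap arbitrary Python code, so a graph containing them cannot be loaded by the serving binary. The snippet below lists any PyFunc ops in a graph; `sess` is assumed to be the session from the export snippet above.

    import tensorflow as tf

    def find_py_funcs(graph):
        # tf.py_func nodes show up with op type "PyFunc" in the graph definition.
        return [op.name for op in graph.get_operations() if op.type == "PyFunc"]

    # Usage, assuming `sess` from the export code above:
    # print(find_py_funcs(sess.graph))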
    opened by hiepph 20
  • Excuse me! Do you have any tricks for training the model? Why does training on the MLT dataset with default parameters detect nothing after 50,000 iterations?

    Excuse me! Do you have any tricks for training the model? Why does training on the MLT dataset with default parameters detect nothing after 50,000 iterations?

    Excuse me! Do you have any tricks for training the model? I trained on the MLT dataset with the default parameters, and after 50,000 iterations it detects nothing.

    opened by cjt222 15
  • BiLSTM and Training Time

    BiLSTM and Training Time

    Thanks for sharing your implementation with us. I have implemented CTPN with Caffe, which failed to converge when adding the LSTM. First, I want to ask whether you have added the BiLSTM in your code or not. I am new to TensorFlow; after looking at the code, I think you just implemented the LSTM, not the BiLSTM. Is that right? Second, I want to ask how long you trained your model. I have run your training script on a GPU device, and it seems it would take 5-6 days to finish the first 180,000 iterations.

    Thanks very much.

    opened by TaoDream 14
  • I try to change code from python2 to python3

    I try to change code from python2 to python3

    I tried to change the code from Python 2 to Python 3, but after fixing all the errors and running the code, it raised the error below. I don't know where it comes from or how to fix it. What should I do? Thank you so much.

    2017-10-26 14:07:03.163176: W tensorflow/core/framework/op_kernel.cc:1158] Unknown: KeyError: b'TEST' Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call return fn(*args) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn status, run_metadata) File "/usr/local/lib/python3.6/contextlib.py", line 88, in exit next(self.gen) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status pywrap_tensorflow.TF_GetCode(status)) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]] [[Node: rois/PyFunc/_79 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "ctpn/demo.py", line 95, in _, _ = test_ctpn(sess, net, im) File "/root/chengjuntao/text-detection-ctpn/lib/fast_rcnn/test.py", line 171, in test_ctpn rois = sess.run([net.get_output('rois')[0]],feed_dict=feed_dict) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run run_metadata_ptr) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run feed_dict_string, options, run_metadata) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run target_list, options, run_metadata) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]] [[Node: rois/PyFunc/_79 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    Caused by op 'rois/PyFunc', defined at: File "ctpn/demo.py", line 85, in net = get_network("VGGnet_test") File "/root/chengjuntao/text-detection-ctpn/lib/networks/factory.py", line 20, in get_network return VGGnet_test() File "/root/chengjuntao/text-detection-ctpn/lib/networks/VGGnet_test.py", line 14, in init self.setup() File "/root/chengjuntao/text-detection-ctpn/lib/networks/VGGnet_test.py", line 68, in setup .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois')) File "/root/chengjuntao/text-detection-ctpn/lib/networks/network.py", line 28, in layer_decorated layer_output = op(self, layer_input, *args, **kwargs) File "/root/chengjuntao/text-detection-ctpn/lib/networks/network.py", line 241, in proposal_layer [tf.float32,tf.float32]) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func input=inp, token=token, Tout=Tout, name=name) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func name=name) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op op_def=op_def) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op original_op=self._default_original_op, op_def=op_def) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in init self._traceback = _extract_stack()

    UnknownError (see above for traceback): KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]] [[Node: rois/PyFunc/_79 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    opened by cjt222 14
  • ModuleNotFoundError: No module named 'lib'

    ModuleNotFoundError: No module named 'lib'

    When I try to execute demo.py I'm receiving this error:

    File "demo.py", line 9, in from lib.networks.factory import get_network ModuleNotFoundError: No module named 'lib'

    Can someone please help me with this ?

    opened by SuryaprakashNSM 12
  • AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

    AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

    File "/home/hjq/PycharmProjects/Tensorflow-OCR/text-detection-ctpn-master/ctpn/demo.py", line 103, in raise 'Check your pretrained {:s}'.format(ckpt.model_checkpoint_path) AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

    opened by seawater668 11
  • After CTPN. What is your idea?

    After CTPN. What is your idea?

    Hello, eragonruan! I was very impressed with your code. Once again, thank you so much.

    I have a problem. When the image passes through the CTPN, a green box is created. If I put this image into the OCR engine, how should I separate it (the green box)? My idea is to use OpenCV, but the green box is drawn on the text, so it hides the text. Can I draw a thin line instead to solve the problem?

    opened by rudebono 10
  • win10 on cpu encounter KeyError: b'TEST' error

    win10 on cpu encounter KeyError: b'TEST' error

    @eragonruan My environment is win10 + CPU + tensorflow 1.3. I have spent a lot of time on this and I'm confused. Please help me in your spare time. Thanks!

    Here is the error: `2018-07-26 11:11:58.723047: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. 2018-07-26 11:11:58.728604: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32) Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32) Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32) Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32) Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32) Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32) Loading network VGGnet_test... Restoring from checkpoints/VGGnet_fast_rcnn_iter_50000.ckpt... done 2018-07-26 11:12:09.006369: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Unknown: KeyError: b'TEST' Traceback (most recent call last): File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call return fn(*args) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1306, in _run_fn status, run_metadata) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\contextlib.py", line 88, in exit next(self.gen) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status pywrap_tensorflow.TF_GetCode(status)) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "./ctpn/demo.py", line 97, in _, _ = test_ctpn(sess, net, im) File "D:\workspace\text-detection-ctpn-master\lib\fast_rcnn\test.py", line 51, in test_ctpn rois = sess.run([net.get_output('rois')[0]],feed_dict=feed_dict) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 895, in run run_metadata_ptr) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run feed_dict_tensor, options, run_metadata) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run options, run_metadata) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]

    Caused by op 'rois/PyFunc', defined at: File "./ctpn/demo.py", line 82, in net = get_network("VGGnet_test") File "D:\workspace\text-detection-ctpn-master\lib\networks\factory.py", line 8, in get_network return VGGnet_test() File "D:\workspace\text-detection-ctpn-master\lib\networks\VGGnet_test.py", line 15, in init self.setup() File "D:\workspace\text-detection-ctpn-master\lib\networks\VGGnet_test.py", line 56, in setup .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois')) File "D:\workspace\text-detection-ctpn-master\lib\networks\network.py", line 21, in layer_decorated layer_output = op(self, layer_input, *args, **kwargs) File "D:\workspace\text-detection-ctpn-master\lib\networks\network.py", line 215, in proposal_layer [tf.float32,tf.float32]) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\ops\script_ops.py", line 203, in py_func input=inp, token=token, Tout=Tout, name=name) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\ops\gen_script_ops.py", line 36, in _py_func name=name) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op op_def=op_def) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op original_op=self._default_original_op, op_def=op_def) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

    UnknownError (see above for traceback): KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]`

    opened by Crocodiles 9
  • UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

    UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

    When I try to train on my own dataset, I face this core dump:

    Computing bounding-box regression targets... bbox target means: [[ 0. 0. 0. 0.] [ 0. 0. 0. 0.]] [ 0. 0. 0. 0.] bbox target stdevs: [[ 0.1 0.1 0.2 0.2] [ 0.1 0.1 0.2 0.2]] [ 0.1 0.1 0.2 0.2] Normalizing targets done Solving... /data/resys/var/python2.7.3/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. "Converting sparse IndexedSlices to a dense Tensor of unknown shape. " Segmentation fault (core dumped)

    This fault is caused by the function train_model() in lib/fast_rcnn/train.py, when it runs train_op = opt.apply_gradients(list(zip(grads, tvars)), global_step=global_step).
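
    For reference, a minimal self-contained sketch of that standard TF1 training pattern, with the zip(...) list closed before global_step is passed. The names (opt, grads, tvars, global_step) mirror the issue text, not the repo's actual train.py, and the sketch is written against tf.compat.v1 so it runs on current TensorFlow.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    x = tf.get_variable("x", shape=[2], initializer=tf.zeros_initializer())
    loss = tf.reduce_sum(tf.square(x - 1.0))

    global_step = tf.train.get_or_create_global_step()
    opt = tf.train.MomentumOptimizer(learning_rate=1e-3, momentum=0.9)
    tvars = tf.trainable_variables()
    grads = tf.gradients(loss, tvars)

    # Note the parentheses: list(zip(grads, tvars)) is one argument, global_step another.
    train_op = opt.apply_gradients(list(zip(grads, tvars)), global_step=global_step)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op)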

    have you ever faced this error?

    opened by louisly 8
  • WHERE ARE gt_img_1001.txt ... gt_img_6000.txt FILES?!

    WHERE ARE gt_img_1001.txt ... gt_img_6000.txt FILES?!

    I cloned the project, downloaded the pretrained model and the training data, and unzipped them; I got a folder named TEXTVOC that contains other folders (Annotations, ImageSets and JPEGImages). I placed TEXTVOC in the data folder and edited the paths in split_label.py as follows: path = '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/JPEGImages' and gt_path = '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/Annotations'. But when I run python lib/prepare_training_data/split_label.py, I get this error:

    /home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/JPEGImages/img_1001.jpg
    Traceback (most recent call last):
      File "lib/prepare_training_data/split_label.py", line 34, in <module>
        with open(gt_file, 'r') as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/Annotations/gt_img_1001.txt'
    

    As shown, gt_img_1001.txt cannot be found, which means either the gt_path is wrong or those files do not exist. I looked into all the folders and didn't find any file starting with gt_*. PS: I also ran the command ln -s TEXTVOC VOCdevkit2007, so I didn't miss anything that should be done.

    opened by maky-hnou 7
  • how to run the train scripts

    how to run the train scripts

    In split_label.py there are two paths: path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/image' and gt_path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/label'. The code reads the files under each of these two paths. What kind of files should be placed in ....../label? Thanks.

    opened by jibadallz 7
  • Bump tensorflow-gpu from 1.4.0 to 2.9.3

    Bump tensorflow-gpu from 1.4.0 to 2.9.3

    Bumps tensorflow-gpu from 1.4.0 to 2.9.3.

    Release notes

    Sourced from tensorflow-gpu's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow-gpu's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.14.2 to 1.22.0

    Bump numpy from 1.14.2 to 1.22.0

    Bumps numpy from 1.14.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Model Quantization

    Model Quantization

    The pretrained model is favorable as it detects text with high accuracy. However, it takes a long time to run inference on a CPU, which is not acceptable for production deployment. Is there a way to quantize the model?

    What I've tried is to convert the checkpoint model to the saved_model format, then load the saved_model to perform quantization with the TFLite converter; code snippet as follows:

    # Load checkpoint and convert to saved_model
    import tensorflow as tf
    trained_checkpoint_prefix = "checkpoints_mlt/ctpn_50000.ckpt"
    export_dir = "exported_model"
    
    graph = tf.Graph()
    with tf.compat.v1.Session(graph=graph) as sess:
        # Restore from checkpoint
        loader = tf.compat.v1.train.import_meta_graph(trained_checkpoint_prefix + ".meta")
        loader.restore(sess, trained_checkpoint_prefix)
    
        # Export checkpoint to SavedModel (must run while the session is still open)
        builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
        builder.add_meta_graph_and_variables(sess,
                                             [tf.saved_model.TRAINING, tf.saved_model.SERVING],
                                             strip_default_attrs=True)
        builder.save()
    

    As a result, I got a .pb file and a variables folder with checkpoint and index files inside. Then errors popped up when I tried to perform the quantization:

    converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8  # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    tflite_quant_model = converter.convert()
    

    This is the error:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-4-03205673177f> in <module>
         11 converter.inference_input_type = tf.int8  # or tf.uint8
         12 converter.inference_output_type = tf.int8  # or tf.uint8
    ---> 13 tflite_quant_model = converter.convert()
    
    ~/virtualenvironment/tf2/lib/python3.6/site-packages/tensorflow/lite/python/lite.py in convert(self)
        450     # TODO(b/130297984): Add support for converting multiple function.
        451     if len(self._funcs) != 1:
    --> 452       raise ValueError("This converter can only convert a single "
        453                        "ConcreteFunction. Converting multiple functions is "
        454                        "under development.")
    
    ValueError: This converter can only convert a single ConcreteFunction. Converting multiple functions is under development.
    

    I understand that this error was raised due to the multiple inputs input_image and input_im_info required by the model. I'd appreciate it if anyone could help.
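
    For reference, a hedged diagnostic sketch (not a confirmed fix): this converter error usually means the SavedModel exposes zero or several signatures, so the first thing to check is which signatures the export actually contains. `export_dir` follows the snippet above, and the "serving_default" key is an assumption that only holds if a serving signature was added at export time.

    import tensorflow as tf

    export_dir = "exported_model"  # path used in the export snippet above
    loaded = tf.saved_model.load(export_dir)
    keys = list(loaded.signatures.keys())
    print("signatures:", keys)

    if "serving_default" in keys:
        # Convert only the single serving signature.
        converter = tf.lite.TFLiteConverter.from_saved_model(
            export_dir, signature_keys=["serving_default"])
        tflite_model = converter.convert()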

    opened by leeshien 0
  • Bump ipython from 5.1.0 to 7.16.3

    Bump ipython from 5.1.0 to 7.16.3

    Bumps ipython from 5.1.0 to 7.16.3.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0