A Python wrapper for the tesseract-ocr API

Overview

tesserocr

A simple, Pillow-friendly wrapper around the tesseract-ocr API for Optical Character Recognition (OCR).

tesserocr integrates directly with Tesseract's C++ API using Cython, which keeps the source code simple, Pythonic and easy to read. It enables real concurrent execution when used with Python's threading module by releasing the GIL while tesseract processes an image.

tesserocr is designed to be Pillow-friendly but can also be used with image files instead.

Requirements

Requires libtesseract (>=3.04) and libleptonica (>=1.71).

On Debian/Ubuntu:

$ apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config

You may need to manually compile tesseract for a more recent version. Note that you may need to update your LD_LIBRARY_PATH environment variable to point to the right library versions in case you have multiple tesseract/leptonica installations.
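
When several tesseract/leptonica installations coexist, it can help to check which shared libraries are discoverable before building. The following is a minimal sketch using only the Python standard library. The helper name locate_shared_libraries is hypothetical and not part of tesserocr, and the result of ctypes.util.find_library is only a hint: the dynamic loader may still resolve a different copy at import time.

```python
import ctypes.util


def locate_shared_libraries(names=("tesseract", "lept")):
    """Map each library name to what find_library reports (None if not found)."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    # Print one line per library so mismatched installations are easy to spot.
    for name, found in locate_shared_libraries().items():
        print("{}: {}".format(name, found or "not found"))
```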

Cython (>=0.23) is required for building. Pillow is optional and adds support for PIL.Image objects.

Installation

Linux and BSD/MacOS

$ pip install tesserocr

The setup script attempts to detect the include/library dirs (via pkg-config if available) but you can override them with your own parameters, e.g.:

$ CPPFLAGS=-I/usr/local/include pip install tesserocr

or

$ python setup.py build_ext -I/usr/local/include
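
To illustrate the kind of detection described above, here is a rough sketch of how pkg-config output could be turned into build settings. It is not the actual setup script: the function names parse_pkg_config_flags and query_pkg_config are hypothetical, and the dictionary keys are chosen only to resemble the "Configs from pkg-config" lines that appear in build logs.

```python
import shlex
import subprocess


def parse_pkg_config_flags(output):
    """Split pkg-config output into include dirs, library dirs and libraries."""
    config = {"include_dirs": [], "library_dirs": [], "libraries": []}
    for token in shlex.split(output):
        if token.startswith("-I"):
            config["include_dirs"].append(token[2:])
        elif token.startswith("-L"):
            config["library_dirs"].append(token[2:])
        elif token.startswith("-l"):
            config["libraries"].append(token[2:])
    return config


def query_pkg_config(package="tesseract"):
    """Return parsed flags for a package, or None if pkg-config cannot find it."""
    try:
        output = subprocess.check_output(
            ["pkg-config", "--cflags", "--libs", package],
            stderr=subprocess.STDOUT,
        )
    except (OSError, subprocess.CalledProcessError):
        # pkg-config is missing, or the package's .pc file is not on the search path.
        return None
    return parse_pkg_config_flags(output.decode("utf-8"))
```

If query_pkg_config returns None, falling back to explicit flags such as the CPPFLAGS override shown above is the usual next step.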

Tested on Linux and BSD/MacOS

Windows

The proposed downloads consist of stand-alone packages containing all the Windows libraries needed for execution. This means that no additional installation of tesseract is required on your system.

The recommended method of installation is via Conda as described below.

Conda

You can use the conda-forge channel to install from Conda:

> conda install -c conda-forge tesserocr

pip

Download the wheel file corresponding to your Windows platform and Python installation from simonflueckiger/tesserocr-windows_build/releases and install it via:

> pip install <package_name>.whl

Usage

Initialize and re-use the tesseract API instance to process multiple images:

from tesserocr import PyTessBaseAPI

images = ['sample.jpg', 'sample2.jpg', 'sample3.jpg']

with PyTessBaseAPI() as api:
    for img in images:
        api.SetImageFile(img)
        print(api.GetUTF8Text())
        print(api.AllWordConfidences())
# api is automatically finalized when used in a with-statement (context manager).
# otherwise api.End() should be explicitly called when it's no longer needed.

PyTessBaseAPI exposes several tesseract API methods. Make sure you read their docstrings for more info.
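
When a with-statement is not convenient, the explicit End() call mentioned above can be guaranteed with try/finally. This is a minimal sketch: the helper name ocr_file and its api_factory parameter are assumptions added for illustration (the parameter only exists so the helper can be exercised with a stand-in object), while SetImageFile, GetUTF8Text and End are the methods used elsewhere in this document.

```python
def ocr_file(path, api_factory=None):
    """Run OCR on one file and always release the API instance afterwards."""
    if api_factory is None:
        # Imported lazily so the helper can also be used with a stand-in factory.
        from tesserocr import PyTessBaseAPI
        api_factory = PyTessBaseAPI
    api = api_factory()
    try:
        api.SetImageFile(path)
        return api.GetUTF8Text()
    finally:
        # Explicit cleanup, needed because no context manager is used here.
        api.End()
```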

Basic example using available helper functions:

import tesserocr
from PIL import Image

print(tesserocr.tesseract_version())  # print tesseract-ocr version
print(tesserocr.get_languages())  # prints tessdata path and list of available languages

image = Image.open('sample.jpg')
print(tesserocr.image_to_text(image))  # print ocr text from image
# or
print(tesserocr.file_to_text('sample.jpg'))

image_to_text and file_to_text can be used with threading to concurrently process multiple images which is highly efficient.
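
A thread pool is one way to do this. The sketch below assumes the standard library's concurrent.futures module; the function name ocr_files_concurrently and its ocr_func and max_workers parameters are hypothetical and only serve the illustration. By default it calls tesserocr.file_to_text, the helper shown above.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_files_concurrently(paths, ocr_func=None, max_workers=4):
    """Return a dict mapping each path to the text recognized in that file."""
    if ocr_func is None:
        # Imported lazily; file_to_text is the helper documented above.
        import tesserocr
        ocr_func = tesserocr.file_to_text
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs paths with their results.
        texts = list(pool.map(ocr_func, paths))
    return dict(zip(paths, texts))
```

Threads are a reasonable fit here because, as noted earlier, the GIL is released while tesseract processes an image.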

Advanced API Examples

GetComponentImages example:

from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('/usr/src/tesseract/testing/phototest.tif')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print(u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
              "confidence: {1}, text: {2}".format(i, conf, ocrResult, **box))

Orientation and script detection (OSD):

from PIL import Image
from tesserocr import PyTessBaseAPI, PSM

with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
    image = Image.open("/usr/src/tesseract/testing/eurotext.tif")
    api.SetImage(image)
    api.Recognize()

    it = api.AnalyseLayout()
    orientation, direction, order, deskew_angle = it.Orientation()
    print("Orientation: {:d}".format(orientation))
    print("WritingDirection: {:d}".format(direction))
    print("TextlineOrder: {:d}".format(order))
    print("Deskew angle: {:.4f}".format(deskew_angle))
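
The example above assumes AnalyseLayout() always returns an iterator. A report further down this page shows it returning None for a rotated image, so a defensive variant may be worth considering. The sketch below is an assumption-laden illustration, not part of tesserocr: the helper name read_orientation and the dictionary keys are hypothetical.

```python
def read_orientation(api):
    """Return orientation info as a dict, or None if no layout was found."""
    it = api.AnalyseLayout()
    if it is None:
        # No page layout could be analysed; let the caller decide what to do.
        return None
    orientation, direction, order, deskew_angle = it.Orientation()
    return {
        "orientation": orientation,
        "writing_direction": direction,
        "textline_order": order,
        "deskew_angle": deskew_angle,
    }
```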

Orientation and script can also be detected more simply with the OSD_ONLY page segmentation mode:

from tesserocr import PyTessBaseAPI, PSM

with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:
    api.SetImageFile("/usr/src/tesseract/testing/eurotext.tif")

    # named osd rather than os to avoid shadowing the standard library module
    osd = api.DetectOS()
    print("Orientation: {orientation}\nOrientation confidence: {oconfidence}\n"
          "Script: {script}\nScript confidence: {sconfidence}".format(**osd))

More human-readable info with tesseract 4+ (demonstrates LSTM engine usage):

from tesserocr import PyTessBaseAPI, PSM, OEM

with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:
    api.SetImageFile("/usr/src/tesseract/testing/eurotext.tif")

    # named osd rather than os to avoid shadowing the standard library module
    osd = api.DetectOrientationScript()
    print("Orientation: {orient_deg}\nOrientation confidence: {orient_conf}\n"
          "Script: {script_name}\nScript confidence: {script_conf}".format(**osd))

Iterator over the classifier choices for a single symbol:

from __future__ import print_function

from tesserocr import PyTessBaseAPI, RIL, iterate_level

with PyTessBaseAPI() as api:
    api.SetImageFile('/usr/src/tesseract/testing/phototest.tif')
    api.SetVariable("save_blob_choices", "T")
    api.SetRectangle(37, 228, 548, 31)
    api.Recognize()

    ri = api.GetIterator()
    level = RIL.SYMBOL
    for r in iterate_level(ri, level):
        symbol = r.GetUTF8Text(level)  # r == ri
        conf = r.Confidence(level)
        if symbol:
            print(u'symbol {}, conf: {}'.format(symbol, conf), end='')
        indent = False
        ci = r.GetChoiceIterator()
        for c in ci:
            if indent:
                print('\t\t ', end='')
            print('\t- ', end='')
            choice = c.GetUTF8Text()  # c == ci
            print(u'{} conf: {}'.format(choice, c.Confidence()))
            indent = True
        print('---------------------------------------------')

Comments
  • !strcmp(locale,

    !strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209

    import tesserocr
    from PIL import Image

    image = Image.open('image.png')
    print(tesserocr.image_to_text(image))

    Mac 10.14.1 tesserocr 2.3.1
    Python 3.6.7

    opened by charhuang 20
  • Add recipe for conda-forge channel

    Hello @sirfz!

    I wonder if there are any plans to create a recipe for tesserocr on the community-based conda-forge channel for anaconda (https://conda-forge.org). With the tesseract library being available for Linux, OS X and Windows (see https://anaconda.org/conda-forge/tesseract), this would provide the perfect platform to distribute this fantastic package.

    Is there any interest in this? I already made a conda recipe a few months ago that integrates well with tesseract version 4.0.0 from conda-forge (see https://anaconda.org/chilipp/tesserocr; at that time tesseract was only available for Linux and OS X), and I would be willing to set up the basis for a tesserocr-feedstock on conda-forge (adding you as maintainer, of course). Once this initial setup is sorted out, maintaining the feedstock on conda-forge should be pretty straightforward. What do you think?

    opened by Chilipp 16
  • crash on rotated image

    $ convert eurotext.tif -rotate 3 +repage eurotext_ang.tif
    $ tesseract eurotext_ang.tif - -psm 0
    Orientation: 0
    Orientation in degrees: 0
    Orientation confidence: 20.66
    Script: 1
    Script confidence: 39.58
    
    image = Image.open('eurotext_ang.tif')
    
    with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
        api.SetImage(image)
        api.Recognize()
        it = api.AnalyseLayout()
        it.Orientation()
    

    output

    AttributeError: 'NoneType' object has no attribute 'Orientation'
    
    opened by reubano 13
  • Build on EC2 fails

    Trying to build a deployment package for Tesser with tesserocr for AWS Lambda on an EC2 instance. Works until I pip install tesserocr - fails to find libraries and tesseract.pc.

    Probably the issue you highlight in the read.me but not my core competence I am afraid.

    Brief excerpt of trace:

    Collecting tesserocr
      Using cached tesserocr-2.2.2.tar.gz
    pkg-config failed to find tesseract/lept libraries: Package tesseract was not found in the pkg-config search path.
    Perhaps you should add the directory containing `tesseract.pc'
    to the PKG_CONFIG_PATH environment variable
    No package 'tesseract' found
    Supporting tesseract v4.00.00
    Building with configs: {'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 262144}}
    Compiling tesserocr.pyx because it changed.
    [1/1] Cythonizing tesserocr.pyx

    opened by Chanonry 12
  • tesserocr fails to detect Tesseract 4

    I compiled Tesseract 4 from source and tried it from the command line (it works). Executable is in /usr/local/bin, library is in /usr/local/lib.

    export LD_LIBRARY_PATH=/usr/local/lib
    

    tesserocr will still load Tesseract 3 which shouldn't be on my system anymore according to apt-get.

    opened by Belval 12
  • AttributeError when calling SetImage() (python 3)

    In Python 3.5.2 (in an ipython console) I've copied the file eurotext.tif from this repository to my working directory. I get an error trying to work with that image:

    In [50]: from tesserocr import PyTessBaseAPI
    
    In [51]: with PyTessBaseAPI as api:
        ...:     api.SetImageFile('eurotext.tif')
        ...:
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-51-252118aec8ba> in <module>()
    ----> 1 with PyTessBaseAPI as api:
          2     api.SetImageFile('eurotext.tif')
          3
    
    AttributeError: __exit__
    
    

    Also trying to use it directly:

    In [52]: tesseract = PyTessBaseAPI()
    
    In [53]: tesseract.SetImage('eurotext.tif')
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-53-661b17ef8a1f> in <module>()
    ----> 1 tesseract.SetImage('eurotext.tif')
    
    tesserocr.pyx in tesserocr.PyTessBaseAPI.SetImage (tesserocr.cpp:13256)()
    
    tesserocr.pyx in tesserocr._image_buffer (tesserocr.cpp:2916)()
    
    tesserocr.pyx in tesserocr._image_buffer (tesserocr.cpp:2780)()
    
    AttributeError: 'str' object has no attribute 'save'
    

    I'm able to open the image in PIL so it's a valid image:

    In [54]: from PIL import Image
    
    In [55]: im = Image.open('eurotext.tif')
    
    In [56]: im
    Out[56]: <PIL.TiffImagePlugin.TiffImageFile image mode=1 size=1024x800 at 0x7F7E100BF5F8>
    
    

    What's going on here? Thanks in advance.

    Here's what I have installed if that's helpful.

    Cython==0.25.1
    dask==0.12.0
    decorator==4.0.10
    ipython==5.1.0
    ipython-genutils==0.1.0
    networkx==1.11
    numpy==1.11.2
    pexpect==4.2.1
    pickleshare==0.7.4
    Pillow==3.4.2
    prompt-toolkit==1.0.9
    ptyprocess==0.5.1
    Pygments==2.1.3
    scikit-image==0.12.3
    scipy==0.18.1
    simplegeneric==0.8.1
    six==1.10.0
    tesserocr==2.1.3
    toolz==0.8.1
    traitlets==4.3.1
    wcwidth==0.1.7
    

    Also I am able to run tesserocr's tests (python3 setup.py test) without any errors so I think tesserocr is installed ok.

    opened by dtenenba 12
  • incorrectly detects orientation

    I've noticed the orientation example doesn't distinguish between upside down/rightside up and clockwise/counter clockwise orientations.

    $ tesseract -psm 0 up.jpg -
    Orientation: 0
    Orientation in degrees: 0
    Orientation confidence: 0.23
    Script: 1
    Script confidence: 0.98
    
    $ tesseract -psm 0 down.jpg -
    Orientation: 2
    Orientation in degrees: 180
    Orientation confidence: 0.21
    Script: 1
    Script confidence: 0.61
    
    with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
        for path in ['up.jpg', 'down.jpg']:
            image = Image.open(path)
            api.SetImage(image)
            api.Recognize()
            it = api.AnalyseLayout()    
            print it.Orientation()
    
    (0, 0, 2, 0.0)
    (0, 0, 2, 0.0)
    
    opened by reubano 12
  • error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    I got tesseract installed correctly.

    But when I try to compile the project with python setup.py build_ext -I/usr/local/include, I get the following messages:

    Supporting tesseract v4.0.0-beta.1
    Configs from pkg-config: {'libraries': ['lept', 'tesseract'], 'cython_compile_time_env': {'TESSERACT_VERSION': 1024}, 'include_dirs': ['/usr/include']}
    running build_ext
    Compiling tesserocr.pyx because it changed.
    [1/1] Cythonizing tesserocr.pyx
    building 'tesserocr' extension
    creating build/temp.linux-x86_64-2.7
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include -I/usr/local/include -I/usr/include/python2.7 -c tesserocr.cpp -o build/temp.linux-x86_64-2.7/tesserocr.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    In file included from /usr/include/c++/5/cinttypes:35:0,
                     from /usr/include/tesseract/host.h:30,
                     from /usr/include/tesseract/tesscallback.h:22,
                     from /usr/include/tesseract/genericvector.h:27,
                     from tesserocr.cpp:282:
    /usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
     #error This file requires compiler and library support \
      ^
    In file included from tesserocr.cpp:280:0:
    /usr/include/tesseract/publictypes.h:33:1: error: ‘constexpr’ does not name a type
     constexpr int kPointsPerInch = 72;
     ^
    /usr/include/tesseract/publictypes.h:33:1: note: C++11 ‘constexpr’ only available with -std=c++11 or -std=gnu++11
    /usr/include/tesseract/publictypes.h:38:1: error: ‘constexpr’ does not name a type
     constexpr int kMinCredibleResolution = 70;
     ^
    /usr/include/tesseract/publictypes.h:38:1: note: C++11 ‘constexpr’ only available with -std=c++11 or -std=gnu++11
    /usr/include/tesseract/publictypes.h:40:1: error: ‘constexpr’ does not name a type
     constexpr int kMaxCredibleResolution = 2400;
     ^
    /usr/include/tesseract/publictypes.h:40:1: note: C++11 ‘constexpr’ only available with -std=c++11 or -std=gnu++11
    /usr/include/tesseract/publictypes.h:45:1: error: ‘constexpr’ does not name a type
     constexpr int kResolutionEstimationFactor = 10;
     ^
    /usr/include/tesseract/publictypes.h:45:1: note: C++11 ‘constexpr’ only available with -std=c++11 or -std=gnu++11
    In file included from /usr/include/tesseract/genericvector.h:29:0,
                     from tesserocr.cpp:282:
    /usr/include/tesseract/helpers.h: In member function ‘void tesseract::TRand::set_seed(const string&)’:
    /usr/include/tesseract/helpers.h:50:5: error: ‘hash’ is not a member of ‘std’
         std::hash<std::string> hasher;
         ^
    /usr/include/tesseract/helpers.h:50:26: error: expected primary-expression before ‘>’ token
         std::hash<std::string> hasher;
                              ^
    /usr/include/tesseract/helpers.h:50:28: error: ‘hasher’ was not declared in this scope
         std::hash<std::string> hasher;
                                ^
    In file included from tesserocr.cpp:282:0:
    /usr/include/tesseract/genericvector.h: In member function ‘size_t GenericVector<T>::unsigned_size() const’:
    /usr/include/tesseract/genericvector.h:78:60: error: there are no arguments to ‘static_assert’ that depend on a template parameter, so a declaration of ‘static_assert’ must be available [-fpermissive]
                       "Wow! sizeof(size_t) < sizeof(int32_t)!!");
                                                                ^
    /usr/include/tesseract/genericvector.h:78:60: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
    In file included from /usr/include/tesseract/osdetect.h:24:0,
                     from tesserocr.cpp:293:
    /usr/include/tesseract/unicharset.h: In member function ‘void UNICHARSET::unichar_insert(const char*)’:
    /usr/include/tesseract/unicharset.h:260:34: error: ‘OldUncleanUnichars’ is not a class or namespace
         unichar_insert(unichar_repr, OldUncleanUnichars::kFalse);
                                      ^
    /usr/include/tesseract/unicharset.h: In member function ‘void UNICHARSET::unichar_insert_backwards_compatible(const char*)’:
    /usr/include/tesseract/unicharset.h:267:36: error: ‘OldUncleanUnichars’ is not a class or namespace
           unichar_insert(unichar_repr, OldUncleanUnichars::kTrue);
                                        ^
    /usr/include/tesseract/unicharset.h:270:36: error: ‘OldUncleanUnichars’ is not a class or namespace
           unichar_insert(unichar_repr, OldUncleanUnichars::kFalse);
                                        ^
    /usr/include/tesseract/unicharset.h:272:38: error: ‘OldUncleanUnichars’ is not a class or namespace
             unichar_insert(unichar_repr, OldUncleanUnichars::kTrue);
                                          ^
    tesserocr.cpp: In function ‘tesseract::TessResultRenderer* __pyx_f_9tesserocr_13PyTessBaseAPI__get_renderer(__pyx_obj_9tesserocr_PyTessBaseAPI*, __pyx_t_9tesseract_cchar_t*)’:
    tesserocr.cpp:17157:106: error: no matching function for call to ‘tesseract::TessPDFRenderer::TessPDFRenderer(__pyx_t_9tesseract_cchar_t*&, const char*)’
           __pyx_t_3 = new tesseract::TessPDFRenderer(__pyx_v_outputbase, __pyx_v_self->_baseapi.GetDatapath());
                                                                                                              ^
    In file included from tesserocr.cpp:292:0:
    /usr/include/tesseract/renderer.h:190:3: note: candidate: tesseract::TessPDFRenderer::TessPDFRenderer(const char*, const char*, bool)
       TessPDFRenderer(const char* outputbase, const char* datadir, bool textonly);
       ^
    /usr/include/tesseract/renderer.h:190:3: note:   candidate expects 3 arguments, 2 provided
    /usr/include/tesseract/renderer.h:186:16: note: candidate: tesseract::TessPDFRenderer::TessPDFRenderer(const tesseract::TessPDFRenderer&)
     class TESS_API TessPDFRenderer : public TessResultRenderer {
                    ^
    /usr/include/tesseract/renderer.h:186:16: note:   candidate expects 1 argument, 2 provided
    tesserocr.cpp: In function ‘void inittesserocr()’:
    tesserocr.cpp:25339:69: error: ‘OEM_CUBE_ONLY’ is not a member of ‘tesseract’
       __pyx_t_1 = __Pyx_PyInt_From_enum__tesseract_3a__3a_OcrEngineMode(tesseract::OEM_CUBE_ONLY); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 85; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
                                                                         ^
    tesserocr.cpp:25352:69: error: ‘OEM_TESSERACT_CUBE_COMBINED’ is not a member of ‘tesseract’
       __pyx_t_1 = __Pyx_PyInt_From_enum__tesseract_3a__3a_OcrEngineMode(tesseract::OEM_TESSERACT_CUBE_COMBINED); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 86; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
                                                                         ^
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    

    I also tried to install with : pip install ., but when I import the project, I got ImportError: libtesseract.so.3: cannot open shared object file: No such file or directory

    opened by eugene123tw 11
  • Ubuntu 14.04 docker image error: Failed to init API, possible an invalid tessdata path: /usr/local/share/

    I'm using tesseract-ocr (version 4.00 beta) with tesserocr (2.2.2) and the location of tessdata folder is: /usr/local/share/

    But I'm still getting the invalid tessdata path error. I've tried the following to fix it:

    1. Assigned the environment variable as TESSDATA_PREFIX ='/usr/local/share/'
    2. Added path='/usr/local/share/' in PyTessBaseAPI()

    The location of the tessdata folder is correct but I'm still not able to use this. Note: I'm using docker with ubuntu 14.04 image and tesserocr version is 2.2.2 and tesseract version is 4.00 beta

    How do I resolve this issue?

    opened by vatsal28 10
  • No package 'tesseract' found

    I tried to install tesserocr on Ubuntu and got the following error. I have already installed tesseract, so I do not know why it cannot be found.

    Can someone help me out ?

    $tesseract -v
    tesseract 3.03
     leptonica-1.70
      libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : webp 0.4.0
    
    $ CPPFLAGS=-I/usr/lib pip install tesserocr
      Using cached tesserocr-2.1.2.tar.gz
        Complete output from command python setup.py egg_info:
        running egg_info
        creating pip-egg-info/tesserocr.egg-info
        writing pip-egg-info/tesserocr.egg-info/PKG-INFO
        writing top-level names to pip-egg-info/tesserocr.egg-info/top_level.txt
        writing dependency_links to pip-egg-info/tesserocr.egg-info/dependency_links.txt
        writing manifest file 'pip-egg-info/tesserocr.egg-info/SOURCES.txt'
        warning: manifest_maker: standard file '-c' not found
    
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-DEBtw3/tesserocr/setup.py", line 166, in <module>
            test_suite='tests'
          File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
            dist.run_commands()
          File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
            self.run_command(cmd)
          File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
            cmd_obj.run()
          File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 195, in run
            self.find_sources()
          File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 222, in find_sources
            mm.run()
          File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 306, in run
            self.add_defaults()
          File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 335, in add_defaults
            sdist.add_defaults(self)
          File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/sdist.py", line 160, in add_defaults
            build_ext = self.get_finalized_command('build_ext')
          File "/usr/lib/python2.7/distutils/cmd.py", line 311, in get_finalized_command
            cmd_obj = self.distribution.get_command_obj(command, create)
          File "/usr/lib/python2.7/distutils/dist.py", line 846, in get_command_obj
            cmd_obj = self.command_obj[command] = klass(self)
          File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/__init__.py", line 137, in __init__
            _Command.__init__(self, dist)
          File "/usr/lib/python2.7/distutils/cmd.py", line 64, in __init__
            self.initialize_options()
          File "/tmp/pip-build-DEBtw3/tesserocr/setup.py", line 120, in initialize_options
            build_args = package_config()
          File "/tmp/pip-build-DEBtw3/tesserocr/setup.py", line 59, in package_config
            raise Exception(error)
        Exception: Package tesseract was not found in the pkg-config search path.
        Perhaps you should add the directory containing `tesseract.pc'
        to the PKG_CONFIG_PATH environment variable
        No package 'tesseract' found
    
    opened by farscape2012 10
  • WordFontAttributes does not work

    As the last two comments in https://github.com/sirfz/tesserocr/issues/68 suggest, tesseract should be providing WordFontAttributes data, but tesserocr seems to not expose it.

    Is this a problem with tesserocr?

    The code I'm trying to use:

    import tesserocr
    from PIL import Image
    
    with tesserocr.PyTessBaseAPI() as api:
        image = Image.open("image.png")
        api.SetImage(image)
        api.Recognize()
        iterator = api.GetIterator()
        print(iterator.WordFontAttributes())
    
    
    opened by MinmoTech 9
  • No docstrings or autocomplete in vscode, windows install

    Sorry if this is already expected behavior, but after installing into a conda environment with conda install -c conda-forge tesserocr, the code runs successfully yet I don't see IntelliSense (Pylance) docs or type information. [screenshot]

    help wanted 
    opened by Jugbot 3
  • ImportError symbol not found in flat namespace '__ZN9tesseract11TessBaseAPID1Ev'

    Hi guys! I get an error when using tesseract. I installed it on a Mac with an M1, and the installation completed successfully (screenshot 1). After that, I built my project, which uses tesserocr version 2.5.2, and when I start the project I get the error in the title. Has anyone run into it or solved it? Thank you in advance.

    Tesseract version: [screenshot 1]

    Error: [screenshot 2]

    Tesserocr version (poetry): [screenshot 3]

    opened by lokkasl 3
  • Python3.11 support

    Defaulting to user installation because normal site-packages is not writeable
    Collecting tesserocr
      Using cached tesserocr-2.5.2.tar.gz (57 kB)
      Preparing metadata (setup.py): started
      Preparing metadata (setup.py): finished with status 'done'
    Building wheels for collected packages: tesserocr
      Building wheel for tesserocr (setup.py): started
      Building wheel for tesserocr (setup.py): finished with status 'error'
      error: subprocess-exited-with-error
      
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [126 lines of output]
          Supporting tesseract v5.1.0-17-g6814
          Tesseract major version 5
          Configs from pkg-config: {'library_dirs': [], 'include_dirs': ['/usr/include', '/usr/include'], 'libraries': ['tesseract', 'archive', 'curl', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 5, 'TESSERACT_VERSION': 83951616}}
          running bdist_wheel
          running build
          running build_ext
          Detected compiler: unix
          building 'tesserocr' extension
          creating build
          creating build/temp.linux-x86_64-3.11
          x86_64-linux-gnu-gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include -I/usr/include/python3.11 -I/usr/local/include/python3.11 -c tesserocr.cpp -o build/temp.linux-x86_64-3.11/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
          tesserocr.cpp: In function ‘PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_34GetAvailableLanguages(__pyx_obj_9tesserocr_PyTessBaseAPI*)’:
          tesserocr.cpp:17031:35: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<std::__cxx11::basic_string<char> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
          17031 |     for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
                |                         ~~~~~~~~~~^~~~~~~~~~~
          tesserocr.cpp: In function ‘PyObject* __pyx_pf_9tesserocr_12get_languages(PyObject*, PyObject*)’:
          tesserocr.cpp:28146:35: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<std::__cxx11::basic_string<char> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
          28146 |     for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_4; __pyx_t_5+=1) {
                |                         ~~~~~~~~~~^~~~~~~~~~~
          tesserocr.cpp: In function ‘int __Pyx_PyBytes_Equals(PyObject*, PyObject*, int)’:
          tesserocr.cpp:38610:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38610 |             hash1 = ((PyBytesObject*)s1)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38610:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38610 |             hash1 = ((PyBytesObject*)s1)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38610:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38610 |             hash1 = ((PyBytesObject*)s1)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38611:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38611 |             hash2 = ((PyBytesObject*)s2)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38611:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38611 |             hash2 = ((PyBytesObject*)s2)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38611:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38611 |             hash2 = ((PyBytesObject*)s2)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp: In function ‘void __Pyx_AddTraceback(const char*, int, int, const char*)’:
          tesserocr.cpp:487:62: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
            487 |   #define __Pyx_PyFrame_SetLineNumber(frame, lineno)  (frame)->f_lineno = (lineno)
                |                                                              ^~
          tesserocr.cpp:39234:5: note: in expansion of macro ‘__Pyx_PyFrame_SetLineNumber’
          39234 |     __Pyx_PyFrame_SetLineNumber(py_frame, py_line);
                |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          tesserocr.cpp: In function ‘PyObject* __Pyx_Coroutine_SendEx(__pyx_CoroutineObject*, PyObject*, int)’:
          tesserocr.cpp:41212:14: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
          41212 |             f->f_back = PyThreadState_GetFrame(tstate);
                |              ^~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          In file included from /usr/include/python3.11/Python.h:44,
                           from tesserocr.cpp:41:
          tesserocr.cpp: In function ‘void __Pyx_Coroutine_ResetFrameBackpointer(__Pyx_ExcInfoStruct*)’:
          tesserocr.cpp:41249:19: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
          41249 |         Py_CLEAR(f->f_back);
                |                   ^~
          /usr/include/python3.11/object.h:107:41: note: in definition of macro ‘_PyObject_CAST’
            107 | #define _PyObject_CAST(op) ((PyObject*)(op))
                |                                         ^~
          tesserocr.cpp:41249:9: note: in expansion of macro ‘Py_CLEAR’
          41249 |         Py_CLEAR(f->f_back);
                |         ^~~~~~~~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          In file included from /usr/include/python3.11/Python.h:44,
                           from tesserocr.cpp:41:
          tesserocr.cpp:41249:19: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
          41249 |         Py_CLEAR(f->f_back);
                |                   ^~
          /usr/include/python3.11/object.h:566:14: note: in definition of macro ‘Py_CLEAR’
            566 |             (op) = NULL;                        \
                |              ^~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for tesserocr
      Running setup.py clean for tesserocr
    Failed to build tesserocr
    Installing collected packages: tesserocr
      Running setup.py install for tesserocr: started
      Running setup.py install for tesserocr: finished with status 'error'
      error: subprocess-exited-with-error
      
      × Running setup.py install for tesserocr did not run successfully.
      │ exit code: 1
      ╰─> [126 lines of output]
          Supporting tesseract v5.1.0-17-g6814
          Tesseract major version 5
          Configs from pkg-config: {'library_dirs': [], 'include_dirs': ['/usr/include', '/usr/include'], 'libraries': ['tesseract', 'archive', 'curl', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 5, 'TESSERACT_VERSION': 83951616}}
          running install
          running build
          running build_ext
          Detected compiler: unix
          building 'tesserocr' extension
          creating build
          creating build/temp.linux-x86_64-3.11
          x86_64-linux-gnu-gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include -I/usr/include/python3.11 -I/usr/local/include/python3.11 -c tesserocr.cpp -o build/temp.linux-x86_64-3.11/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
          tesserocr.cpp: In function ‘PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_34GetAvailableLanguages(__pyx_obj_9tesserocr_PyTessBaseAPI*)’:
          tesserocr.cpp:17031:35: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<std::__cxx11::basic_string<char> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
          17031 |     for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
                |                         ~~~~~~~~~~^~~~~~~~~~~
          tesserocr.cpp: In function ‘PyObject* __pyx_pf_9tesserocr_12get_languages(PyObject*, PyObject*)’:
          tesserocr.cpp:28146:35: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<std::__cxx11::basic_string<char> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
          28146 |     for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_4; __pyx_t_5+=1) {
                |                         ~~~~~~~~~~^~~~~~~~~~~
          tesserocr.cpp: In function ‘int __Pyx_PyBytes_Equals(PyObject*, PyObject*, int)’:
          tesserocr.cpp:38610:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38610 |             hash1 = ((PyBytesObject*)s1)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38610:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38610 |             hash1 = ((PyBytesObject*)s1)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38610:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38610 |             hash1 = ((PyBytesObject*)s1)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38611:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38611 |             hash2 = ((PyBytesObject*)s2)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38611:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38611 |             hash2 = ((PyBytesObject*)s2)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp:38611:43: warning: ‘PyBytesObject::ob_shash’ is deprecated [-Wdeprecated-declarations]
          38611 |             hash2 = ((PyBytesObject*)s2)->ob_shash;
                |                                           ^~~~~~~~
          In file included from /usr/include/python3.11/bytesobject.h:62,
                           from /usr/include/python3.11/Python.h:50,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/cpython/bytesobject.h:7:35: note: declared here
              7 |     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
                |                                   ^~~~~~~~
          tesserocr.cpp: In function ‘void __Pyx_AddTraceback(const char*, int, int, const char*)’:
          tesserocr.cpp:487:62: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
            487 |   #define __Pyx_PyFrame_SetLineNumber(frame, lineno)  (frame)->f_lineno = (lineno)
                |                                                              ^~
          tesserocr.cpp:39234:5: note: in expansion of macro ‘__Pyx_PyFrame_SetLineNumber’
          39234 |     __Pyx_PyFrame_SetLineNumber(py_frame, py_line);
                |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          tesserocr.cpp: In function ‘PyObject* __Pyx_Coroutine_SendEx(__pyx_CoroutineObject*, PyObject*, int)’:
          tesserocr.cpp:41212:14: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
          41212 |             f->f_back = PyThreadState_GetFrame(tstate);
                |              ^~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          In file included from /usr/include/python3.11/Python.h:44,
                           from tesserocr.cpp:41:
          tesserocr.cpp: In function ‘void __Pyx_Coroutine_ResetFrameBackpointer(__Pyx_ExcInfoStruct*)’:
          tesserocr.cpp:41249:19: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
          41249 |         Py_CLEAR(f->f_back);
                |                   ^~
          /usr/include/python3.11/object.h:107:41: note: in definition of macro ‘_PyObject_CAST’
            107 | #define _PyObject_CAST(op) ((PyObject*)(op))
                |                                         ^~
          tesserocr.cpp:41249:9: note: in expansion of macro ‘Py_CLEAR’
          41249 |         Py_CLEAR(f->f_back);
                |         ^~~~~~~~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          In file included from /usr/include/python3.11/Python.h:44,
                           from tesserocr.cpp:41:
          tesserocr.cpp:41249:19: error: invalid use of incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
          41249 |         Py_CLEAR(f->f_back);
                |                   ^~
          /usr/include/python3.11/object.h:566:14: note: in definition of macro ‘Py_CLEAR’
            566 |             (op) = NULL;                        \
                |              ^~
          In file included from /usr/include/python3.11/Python.h:42,
                           from tesserocr.cpp:41:
          /usr/include/python3.11/pytypedefs.h:22:16: note: forward declaration of ‘PyFrameObject’ {aka ‘struct _frame’}
             22 | typedef struct _frame PyFrameObject;
                |                ^~~~~~
          error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> tesserocr
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
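For reference when reading these logs: the TESSERACT_VERSION integer in the pkg-config line appears to pack the version as (major << 24) | (minor << 16) | (patch << 8). This is inferred only from the values printed in the logs on this page (83951616 alongside v5.1.0, 67174656 alongside v4.1.1), not from the setup script itself. A minimal sketch to decode it, with a made-up function name:

```python
def decode_tesseract_version(packed: int) -> tuple:
    """Unpack an integer assumed to be (major << 24) | (minor << 16) | (patch << 8)."""
    major = (packed >> 24) & 0xFF
    minor = (packed >> 16) & 0xFF
    patch = (packed >> 8) & 0xFF
    return major, minor, patch


print(decode_tesseract_version(83951616))  # (5, 1, 0)
print(decode_tesseract_version(67174656))  # (4, 1, 1)
```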
    
    
    opened by SmartManoj 2
  • Installing in a virtual environment


    I'm using Windows 10 with WSL2 (Ubuntu 20.04). I've already installed the package successfully in my default environment, but when I run pip install tesserocr in a virtual environment I get the following error:

    Building wheels for collected packages: tesserocr
      Building wheel for tesserocr (setup.py) ... error
      error: subprocess-exited-with-error
      
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [16 lines of output]
          Supporting tesseract v4.1.1
          Tesseract major version 4
          Configs from pkg-config: {'library_dirs': [], 'include_dirs': ['/usr/include', '/usr/include'], 'libraries': ['tesseract', 'archive', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 4, 'TESSERACT_VERSION': 67174656}}
          running bdist_wheel
          running build
          running build_ext
          Detected compiler: unix
          building 'tesserocr' extension
          creating build
          creating build/temp.linux-x86_64-3.9
          x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include -I/<PATH>/env/include -I/usr/include/python3.9 -c tesserocr.cpp -o build/temp.linux-x86_64-3.9/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
          tesserocr.cpp:42:10: fatal error: Python.h: No such file or directory
             42 | #include "Python.h"
                |          ^~~~~~~~~~
          compilation terminated.
          error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for tesserocr
      Running setup.py clean for tesserocr
    Failed to build tesserocr
    Installing collected packages: tesserocr
      Running setup.py install for tesserocr ... error
      error: subprocess-exited-with-error
      
      × Running setup.py install for tesserocr did not run successfully.
      │ exit code: 1
      ╰─> [16 lines of output]
          Supporting tesseract v4.1.1
          Tesseract major version 4
          Configs from pkg-config: {'library_dirs': [], 'include_dirs': ['/usr/include', '/usr/include'], 'libraries': ['tesseract', 'archive', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 4, 'TESSERACT_VERSION': 67174656}}
          running install
          running build
          running build_ext
          Detected compiler: unix
          building 'tesserocr' extension
          creating build
          creating build/temp.linux-x86_64-3.9
          x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include -I/<PATH>/include -I/usr/include/python3.9 -c tesserocr.cpp -o build/temp.linux-x86_64-3.9/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
          tesserocr.cpp:42:10: fatal error: Python.h: No such file or directory
             42 | #include "Python.h"
                |          ^~~~~~~~~~
          compilation terminated.
          error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> tesserocr
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
    

    My default Python is 3.8, while the virtual environment uses 3.9.
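The compiler line above searches /usr/include/python3.9 for Python.h, so one plausible cause (an assumption, not something confirmed in this thread) is that the development headers for Python 3.9 are missing while those for the default 3.8 are installed. A quick check to run inside the virtual environment; the helper name is illustrative:

```python
import os
import sysconfig


def python_header_status():
    """Return (path, exists) for the Python.h the active interpreter expects."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header, os.path.isfile(header)


header, found = python_header_status()
print(f"{header}: {'found' if found else 'missing'}")
```

If the header is reported missing, installing the headers that match the environment's interpreter (on Debian/Ubuntu this is typically the python3.9-dev package) would be the first thing to try.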

    opened by bd-charu 1
  • Tesseract 5.0.1 test_LSTM_choices(...) fails


    I compiled tesserocr 2.5.2 with Tesseract 5.0.1 on Windows. When executing tesserocr\tests\test_api.py I get the following exception for test_LSTM_choices(...):

    FAIL: test_LSTM_choices (tests.test_api.TestTessBaseApi)
    Test GetBestLSTMSymbolChoices.
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "tesserocr\tests\test_api.py", line 201, in test_LSTM_choices
        self.assertLessEqual(alternative[1], 2.0)
    AssertionError: 3.621181011199951 not less than or equal to 2.0
    

    This is very similar to https://github.com/sirfz/tesserocr/pull/147#discussion_r342202823. The test passes when built with Tesseract 4.1.3. Does it also pass on Travis for Tesseract 5.x? I get a 404 when trying to access the build pipeline.
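To see every symbol that exceeds the bound, instead of stopping at the first failed assertion, a small helper could walk the whole result. This is only a sketch: it assumes GetBestLSTMSymbolChoices returns a nested list (word, then timestep, then (symbol, value) pairs), which is consistent with how the test indexes alternative[1]. The helper name and the sample data are made up.

```python
def out_of_range_choices(lstm_choices, low=0.0, high=2.0):
    """Collect (word_idx, timestep_idx, symbol, value) for values outside [low, high]."""
    offenders = []
    for w, word in enumerate(lstm_choices):
        for t, timestep in enumerate(word):
            for symbol, value in timestep:
                if not (low <= value <= high):
                    offenders.append((w, t, symbol, value))
    return offenders


# Hypothetical data shaped like the assumed return value; the out-of-range
# number is the one from the failure message above.
sample = [[[("T", 1.2), ("I", 0.4)], [("h", 3.621181011199951)]]]
print(out_of_range_choices(sample))
```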

    opened by simonflueckiger 11
Releases(v2.5.2)
  • v2.5.2(Jun 19, 2021)

    • Support new Tesseract 5 API (#242)
    • Support Windows build (#250)
    • GetBestLSTMSymbolChoices crash fix (#241)
    • Fallback to BMP instead of PNG
    • Create pix from a BMP image bytes (#156)
  • v2.5.1(Mar 17, 2020)

  • v2.5.0(Nov 8, 2019)

    New features and enhancements:

    • Support for RowAttributes method in LTRResultIterator (#192)
    • SetImage: use PNG instead of JPEG fallback (#194)
    • Replace STRING::string() by c_str() (#197)
    • Don't use assignment operator for TessBaseAPI (#200)
  • v2.4.1(Aug 23, 2019)

  • v2.4.0(Dec 5, 2018)

  • v2.3.1(Aug 13, 2018)

  • v2.3.0(Jun 26, 2018)

    • Support for Tesseract 4
      • New OCR engines LSTM_ONLY and TESSERACT_LSTM_COMBINED
      • New default tessdata path handling (#104)
    • Fixed compilation against Tesseract v3.05.02 which required c++11 (#120)
    • Fallback to 'eng' as default language when default language returned by the API is empty (#103)
    • Added notes about Windows installation in the README (#97)
  • v2.2.2(Jul 26, 2017)

    • Support timeout in Recognize API methods (#55)
      • You can now pass a timeout parameter (milliseconds) to the Recognize and RecognizeForChopTest API methods.
    • Fixed typo in _Enum initialization error message formatting (#56)
    • Display tessdata path in init exception message (#60)
    • Fixed version check in Python 3 when reading the version number from the tesseract executable (#60)
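As a rough illustration of the timeout parameter mentioned above, the sketch below passes a millisecond timeout to Recognize. The wrapper function, its default of 5000 ms, and the interpretation of a falsy return value as "failed or timed out" are assumptions made for this example, not documented behaviour.

```python
def recognize_with_timeout(image_path, timeout_ms=5000):
    """Return the recognized text, or None if Recognize reports failure."""
    # Imported lazily so this module loads even where tesserocr is not installed.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        # Assumption: a falsy return value means recognition failed or timed out.
        if not api.Recognize(timeout_ms):
            return None
        return api.GetUTF8Text()
```

Calling recognize_with_timeout("page.png", timeout_ms=2000) would then give up on slow pages instead of blocking; the file name is a placeholder.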
  • v2.2.1(May 31, 2017)

    Fixed a setup bug affecting gcc versions that do not support the -std=c++11 option (which should only be required by tesseract 4.0+, not by older versions). #53

  • v2.2.0(May 28, 2017)

    • Improved setup script
    • Tesseract 4.0 support:
      • Two new OEM enums: OEM.LSTM_ONLY and OEM.TESSERACT_LSTM_COMBINED (tesseract 4.0+)
      • Two new API methods: GetTSVText and DetectOrientationScript (tesseract 4.0+)
      • PyTessBaseAPI.__init__ now accepts a new argument oem (OCR engine mode: OEM.DEFAULT by default).
      • file_to_text and image_to_text functions now also accept the oem attribute as above.
    • Fixed segfault on API Init* failure
    • Fixed segfault when pixa_to_list returns NULL
    • Documentation fixes and other minor improvements
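A minimal sketch of selecting the engine mode through the oem argument described above, assuming Tesseract 4.0+ with LSTM traineddata available. The wrapper function and its lang default are illustrative only.

```python
def ocr_with_lstm(image_path, lang="eng"):
    """Run OCR on an image file using only the LSTM engine."""
    # Imported lazily so this module loads even where tesserocr is not installed.
    from tesserocr import OEM, PyTessBaseAPI

    # OEM.LSTM_ONLY is one of the two enums added in this release (tesseract 4.0+).
    with PyTessBaseAPI(lang=lang, oem=OEM.LSTM_ONLY) as api:
        api.SetImageFile(image_path)
        return api.GetUTF8Text()
```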
  • v2.1.3(Nov 12, 2016)

    Bug fix release:

    • Improved setup: attempt compile with default environment variables even if pkg-config fails
    • WordFontAttributes now returns None instead of segfaulting when NULL pointer is returned by API
  • v2.1.2(Jun 8, 2016)

  • v2.1.1(Jun 3, 2016)

    • Improved PIL image conversion to Pix: preserve original image format (#5)
    • Added DetectOS api method (#6)
    • Support TessOsdRenderer introduced in tesseract v3.04.01
    • Improved setup environment detection
    • Python 3 support
Owner
Fayez