
Stephen's Blog

Usage of PyTessBaseAPI in Tesserocr

 

Stephen Cheng

Intro

tesserocr is a simple, Pillow-friendly wrapper around the tesseract-ocr API for Optical Character Recognition (OCR). It integrates directly with Tesseract’s C++ API using Cython, which keeps the source code simple, Pythonic, and easy to read. It also enables real concurrent execution with Python’s threading module by releasing the GIL while Tesseract processes an image.
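
To make the concurrency point concrete, here is a minimal sketch that OCRs several files on a thread pool. The helper names `ocr_file` and `ocr_many` are made up for this post, and giving every task its own `PyTessBaseAPI` instance is my assumption (it is the simplest way to avoid sharing state between threads), so treat it as a starting point rather than the recommended pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_file(path):
    """OCR a single image file with a private API instance (hypothetical helper)."""
    # Imported lazily so ocr_many() can be tried with any other worker function.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI() as api:
        api.SetImageFile(path)
        return api.GetUTF8Text()


def ocr_many(paths, worker=ocr_file, max_workers=4):
    """Run `worker` over `paths` on a thread pool and return {path: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs each path correctly.
        return dict(zip(paths, pool.map(worker, paths)))
```

Calling `ocr_many(['sample.jpg', 'sample2.jpg'])` would then return a dict of recognized text keyed by file name. Because the GIL is released during recognition, the threads can make progress in parallel instead of taking turns.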

Installation

  • Linux and BSD/MacOS
$ pip install tesserocr
  • Windows

The proposed downloads consist of stand-alone packages containing all the Windows libraries needed for execution. The recommended method of installation is via Conda as described below.

1) Conda

> conda install -c conda-forge tesserocr

2) pip
Download the wheel file corresponding to your Windows platform and Python installation from tesserocr-windows_build and install it via:

> pip install <package_name>.whl
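
Whichever route you take, a quick way to confirm the install is to import the module and ask it which Tesseract build it is linked against. This is a small sketch; `tesserocr_status` is a name I made up, while `tesseract_version()` is the module-level function tesserocr provides.

```python
import importlib


def tesserocr_status():
    """Return the linked Tesseract version string, or a short error message."""
    try:
        module = importlib.import_module("tesserocr")
    except ImportError as exc:
        return "tesserocr is not importable: {}".format(exc)
    return module.tesseract_version()


print(tesserocr_status())
```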

Usage

  • Initialize and re-use the tesseract API instance to score multiple images:
from tesserocr import PyTessBaseAPI

images = ['sample.jpg', 'sample2.jpg', 'sample3.jpg']

with PyTessBaseAPI() as api:
    for img in images:
        api.SetImageFile(img)
        print(api.GetUTF8Text())
        print(api.AllWordConfidences())
# api is automatically finalized when used in a with-statement (context manager).
# otherwise api.End() should be explicitly called when it's no longer needed.
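
The list printed by `AllWordConfidences()` holds one score per word (0 to 100 in Tesseract), which is easier to act on once it is summarized. The sketch below is only an illustration: `summarize_confidences` is a hypothetical helper and the threshold of 60 is an arbitrary value I picked, not a tesserocr recommendation.

```python
def summarize_confidences(confidences, threshold=60):
    """Return the mean score and the indices of words scoring below `threshold`.

    `confidences` is a list of per-word scores, such as the one returned by
    api.AllWordConfidences(). The default threshold is an illustrative guess.
    """
    if not confidences:
        # Nothing was recognized, so there is no meaningful average.
        return {"mean": None, "low": []}
    mean = sum(confidences) / float(len(confidences))
    low = [i for i, score in enumerate(confidences) if score < threshold]
    return {"mean": mean, "low": low}
```

Inside the loop above you could call `summarize_confidences(api.AllWordConfidences())` and, for example, route images with many low-scoring words to manual review.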
  • Advanced API Examples

1) GetComponentImages example:

from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('/usr/src/tesseract/testing/phototest.tif')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print(u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
              "confidence: {1}, text: {2}".format(i, conf, ocrResult, **box))

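The box dicts use x, y, w and h keys, while Pillow's `Image.crop()` expects a (left, upper, right, lower) tuple. The two small helpers below bridge that gap and put boxes into a simple top-to-bottom order. Both function names are hypothetical and only meant as a sketch.

```python
def box_to_corners(box):
    """Convert an x/y/w/h box dict to a (left, upper, right, lower) tuple."""
    return (box['x'], box['y'], box['x'] + box['w'], box['y'] + box['h'])


def sort_top_to_bottom(boxes):
    """Order box dicts by their top edge, then by their left edge."""
    return sorted(boxes, key=lambda b: (b['y'], b['x']))
```

With these, `image.crop(box_to_corners(box))` cuts the same region out of the original image that `SetRectangle` restricts recognition to.
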
2) Orientation and script detection (OSD):

from PIL import Image
from tesserocr import PyTessBaseAPI, PSM

with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
    image = Image.open("/usr/src/tesseract/testing/eurotext.tif")
    api.SetImage(image)
    api.Recognize()

    it = api.AnalyseLayout()
    orientation, direction, order, deskew_angle = it.Orientation()
    print("Orientation: {:d}".format(orientation))
    print("WritingDirection: {:d}".format(direction))
    print("TextlineOrder: {:d}".format(order))
    print("Deskew angle: {:.4f}".format(deskew_angle))
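
The first three values come back as plain integers, so the output is not very readable on its own. One way to turn them into names is a small reverse lookup over a class of constants. The sketch below uses a stand-in class filled with Tesseract's page orientation values; I am assuming tesserocr's own `Orientation`, `WritingDirection` and `TextlineOrder` classes expose their constants as class attributes in the same way, so check that before relying on it. `constant_name` and `FakeOrientation` are names invented for this example.

```python
def constant_name(constants, value):
    """Return the public attribute name of `constants` whose value equals `value`."""
    for name in dir(constants):
        if not name.startswith('_') and getattr(constants, name) == value:
            return name
    # Fall back to the raw number when no constant matches.
    return str(value)


class FakeOrientation(object):
    """Stand-in holding Tesseract's page orientation values, for demonstration only."""
    PAGE_UP = 0
    PAGE_RIGHT = 1
    PAGE_DOWN = 2
    PAGE_LEFT = 3
```

For instance, `constant_name(FakeOrientation, 0)` gives `'PAGE_UP'`, and an unknown value such as 7 simply comes back as `'7'`.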

Or, more simply, with the OSD_ONLY page segmentation mode:

from tesserocr import PyTessBaseAPI, PSM

with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:
    api.SetImageFile("/usr/src/tesseract/testing/eurotext.tif")

    # named osd rather than os so the standard-library module is not shadowed
    osd = api.DetectOS()
    print("Orientation: {orientation}\nOrientation confidence: {oconfidence}\n"
          "Script: {script}\nScript confidence: {sconfidence}".format(**osd))

For more human-readable output with Tesseract 4+ (using the LSTM engine):

from tesserocr import PyTessBaseAPI, PSM, OEM

with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:
    api.SetImageFile("/usr/src/tesseract/testing/eurotext.tif")

    # named osd rather than os so the standard-library module is not shadowed
    osd = api.DetectOrientationScript()
    print("Orientation: {orient_deg}\nOrientation confidence: {orient_conf}\n"
          "Script: {script_name}\nScript confidence: {script_conf}".format(**osd))

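In practice the detection result usually feeds a decision, such as whether a page should be rotated before full recognition. The sketch below gates that decision on the reported confidence. `needs_rotation` is a hypothetical helper, the default `min_conf` of 1.0 is an arbitrary placeholder to tune on your own data, and treating a missing result as "do nothing" is my assumption.

```python
def needs_rotation(osd, min_conf=1.0):
    """Return True if OSD reports a non-upright page with enough confidence.

    `osd` is a dict with 'orient_deg' and 'orient_conf' keys, like the one
    printed in the example above. A missing result is treated as "no".
    """
    if not osd:
        return False
    return osd['orient_deg'] != 0 and osd['orient_conf'] >= min_conf
```

The helper only answers whether a rotation looks warranted. How to apply it (and in which direction) is left to the caller.
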
3) Iterator over the classifier choices for a single symbol:

from __future__ import print_function

from tesserocr import PyTessBaseAPI, RIL, iterate_level

with PyTessBaseAPI() as api:
    api.SetImageFile('/usr/src/tesseract/testing/phototest.tif')
    api.SetVariable("save_blob_choices", "T")
    api.SetRectangle(37, 228, 548, 31)
    api.Recognize()

    ri = api.GetIterator()
    level = RIL.SYMBOL
    for r in iterate_level(ri, level):
        symbol = r.GetUTF8Text(level)  # r == ri
        conf = r.Confidence(level)
        if symbol:
            print(u'symbol {}, conf: {}'.format(symbol, conf), end='')
        indent = False
        ci = r.GetChoiceIterator()
        for c in ci:
            if indent:
                print('\t\t ', end='')
            print('\t- ', end='')
            choice = c.GetUTF8Text()  # c == ci
            print(u'{} conf: {}'.format(choice, c.Confidence()))
            indent = True
        print('----------')
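
If you collect the choices into a list instead of printing them, you can flag symbols the classifier was unsure about. This is a rough sketch under my own assumptions: `ambiguous_symbols` is a hypothetical helper, the input is a list of (symbol, choices) pairs you would build yourself from the loop above, and the 5-point margin is an arbitrary value.

```python
def ambiguous_symbols(results, margin=5.0):
    """Return symbols whose two best choices are within `margin` of each other.

    `results` is a list of (symbol, choices) pairs, where `choices` is a list
    of (text, confidence) pairs gathered from the choice iterator.
    """
    flagged = []
    for symbol, choices in results:
        # Rank the alternatives from most to least confident.
        ranked = sorted(choices, key=lambda pair: pair[1], reverse=True)
        if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] <= margin:
            flagged.append((symbol, ranked[0], ranked[1]))
    return flagged
```

A symbol such as "l" with a close runner-up "1" would be flagged, while a symbol whose best choice clearly wins is left alone.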

Jan 18, 2020
