Advanced Deep Learning OCR Settings

Configure Advanced Deep Learning OCR settings such as character gap percentage, vertical misalignment percentage, minimum characters to create a line, flatten, and grammar rules.
Setting
Description
Maximum Character Gap Percentage
The maximum horizontal gap allowed between the boxes of characters that are joined together, expressed as a percentage of the character height.
Maximum Vertical Misalignment Percentage
The maximum vertical misalignment allowed between the boxes of characters that are joined together, expressed as a percentage of the character height.
Minimum Characters to Create a Line
The minimum number of characters required to create a line.
Default value: 1
In this tool, a line is commonly referred to as a Block or Word.
Flatten
If true, this feature concatenates the words on the line into a single result string. Otherwise, each word is a separate result string.
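The four settings above can be illustrated with a minimal sketch. This is not the tool's implementation. The names CharBox, can_join, group_into_words, and to_results are hypothetical, and several details are assumptions made only for illustration: the reference height is taken as the taller of the two boxes, misalignment is measured between vertical centers, the input is a single row of text swept left to right, and flattened words are joined with a single space.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    """Hypothetical character box: top-left corner plus size, in pixels."""
    char: str
    x: float
    y: float
    width: float
    height: float


def can_join(left: CharBox, right: CharBox,
             max_gap_pct: float, max_misalign_pct: float) -> bool:
    """Return True if two neighbouring boxes satisfy both percentage limits."""
    # Assumption: percentages are relative to the taller of the two boxes.
    ref_height = max(left.height, right.height)
    gap = right.x - (left.x + left.width)
    # Assumption: misalignment is the distance between vertical centers.
    misalign = abs((left.y + left.height / 2) - (right.y + right.height / 2))
    return (gap <= ref_height * max_gap_pct / 100
            and misalign <= ref_height * max_misalign_pct / 100)


def group_into_words(boxes, max_gap_pct, max_misalign_pct, min_chars=1):
    """Sweep boxes left to right, starting a new group whenever a limit is exceeded."""
    groups, current = [], []
    for box in sorted(boxes, key=lambda b: b.x):
        if current and not can_join(current[-1], box, max_gap_pct, max_misalign_pct):
            groups.append(current)
            current = []
        current.append(box)
    if current:
        groups.append(current)
    # Groups shorter than the minimum character count are discarded.
    return ["".join(b.char for b in g) for g in groups if len(g) >= min_chars]


def to_results(words, flatten, separator=" "):
    """Flatten=True yields one result string; otherwise one string per word."""
    # Assumption: the separator used when flattening is a single space.
    return [separator.join(words)] if flatten else list(words)


boxes = [
    CharBox("A", x=0, y=0, width=10, height=20),
    CharBox("B", x=12, y=0, width=10, height=20),  # gap 2 px = 10% of height
    CharBox("C", x=40, y=0, width=10, height=20),  # gap 18 px = 90% of height
]
words = group_into_words(boxes, max_gap_pct=50, max_misalign_pct=25)
print(words)                             # ['AB', 'C']
print(to_results(words, flatten=True))   # ['AB C']
print(to_results(words, flatten=False))  # ['AB', 'C']
```

With a minimum of 2 characters, the single-character group C would be dropped and only AB would remain.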
Grammar Rules
Use grammar rules to check that the recognized text follows an expected structure, including character and formatting constraints and fixed strings such as abbreviations or acronyms.
A grammar rule can be built from the following pattern elements:
  • Individual Characters - matched as written. To treat an operational character as a normal character, escape it with a backslash: \\, \*, \?, \., \+, \-, \], \[, \), \(.
  • Character Class - a set of characters enclosed within square brackets []. It allows you to match any one character from the specified set. For example:
    • List of characters: [abc] Matches any one of the characters a, b, or c.
    • Range: [a-z] Matches any one of the characters from a to z.
    • Mix of them: [a-zA-Z12] Matches any one of the characters from a to z, A to Z, and 1,2.
    • Predefined character classes:
      • \d is equivalent to [0-9]
      • \w corresponds to [a-zA-Z0-9_]
      • . (dot) matches any single character (\w plus special characters)
    Inside a character class, the following characters must be escaped with a backslash: \\, \-, \]. Other operational characters need no escaping there. For example, [a.*|] is a valid pattern that matches any one of the characters a, ., *, and |.
  • Chain - an extended string created by concatenating individual characters and character classes. For example:
    • abc - matches text abc
    • [Aa]bc - matches texts: abc and Abc
    • \dabc - matches texts: 0abc, 1abc, ..., 9abc
  • Alternative - used to match one pattern or another. It is a sequence of chains separated by the pipe symbol | and enclosed in round brackets (), such as (ABC|DEF).
  • Special Operators - can modify or repeat the preceding expression. For example:
    • * (star): matches zero or more occurrences of the preceding element and tries to match as many characters as possible. In particular, .* matches any sequence of characters.
    • + (plus): matches one or more occurrences of the preceding element and tries to match as many characters as possible.
    • ? (question mark): matches zero or one occurrence of the preceding element, with a preference for one.
    • *? (lazy star): matches zero or more occurrences of the preceding element but tries to match as few characters as possible.
    • +? (lazy plus): matches one or more occurrences of the preceding element but tries to match as few characters as possible.
    Special operators can be added after the pattern element; for example: [ABC]*, [0-9]?, (ABC|DEF)+. However, special operators cannot be used inside alternative pattern elements.
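
The pattern elements described above closely resemble common regular-expression notation, so their behavior can be explored with a short sketch that uses Python's re module as a stand-in. This is an assumption made for illustration only: the tool's grammar engine is not confirmed to be compatible with Python, and it may differ in details (for example, Python does not enforce the restriction on special operators inside alternatives). The helper name fully_matches is hypothetical. The re.ASCII flag is used so that \d and \w mean [0-9] and [a-zA-Z0-9_] as defined above.

```python
import re


def fully_matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


# Individual characters: an escaped operator is treated as a normal character.
print(fully_matches(r"1\+1", "1+1"))          # True
print(fully_matches(r"1+1", "1+1"))           # False: + repeats the preceding 1

# Character classes and chains.
print(fully_matches(r"[Aa]bc", "Abc"))        # True
print(fully_matches(r"\dabc", "7abc"))        # True
print(fully_matches(r"[a.*|]", "*"))          # True: * is literal inside []
print(fully_matches(r"[a.*|]", "b"))          # False

# Alternative followed by a special operator.
print(fully_matches(r"(ABC|DEF)+", "ABCDEF"))  # True
print(fully_matches(r"(ABC|DEF)+", "ABCD"))    # False

# Optional element: zero or one occurrence.
print(fully_matches(r"[0-9]?", ""))           # True
print(fully_matches(r"[0-9]?", "77"))         # False

# Greedy versus lazy operators on the same input.
greedy_star = re.match(r"a.*c", "abcabc").group()
lazy_star = re.match(r"a.*?c", "abcabc").group()
greedy_plus = re.match(r"\d+", "12345", flags=re.ASCII).group()
lazy_plus = re.match(r"\d+?", "12345", flags=re.ASCII).group()
print(greedy_star, lazy_star)                 # abcabc abc
print(greedy_plus, lazy_plus)                 # 12345 1
```

The last four lines show the practical difference between the greedy and lazy forms: the greedy operators consume as much of the text as the rest of the pattern allows, while the lazy forms stop at the first point where the pattern can succeed.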