Improving OCR accuracy involves optimizing scanned documents through pre-processing and cleanup. Pre-processing standards include using adequate white space, limiting lines and color, and choosing OCR-friendly fonts. During scanning, images should be cleaned up with techniques such as adaptive thresholding, despeckling, and blank page removal. Intelligent capture solutions like ImageRamp can enhance images and improve OCR accuracy through cleanup settings, field validation, and options for previewing and testing settings. Proper document handling and cleanup can be as important as the scanning technology for achieving the high OCR accuracy needed in fields like healthcare and legal.
6. Set pre-processing standards and procedures
Form Design
• adequate white space
• limited lines
Font Selection
• monospace fonts like Courier or sans serif fonts like Helvetica
• at least 10-13 points
Color Selection
• limited use of color
10. Adaptive Thresholding
Adaptive thresholding assists in cleaning “dirty” documents or documents with
a colored background that interferes with the foreground data.
Most scanner and capture software can apply basic thresholding technology.
12. Adaptive Thresholding
ImageRamp uses Adaptive Thresholding with advanced algorithms and
Sensitivity settings allowing you to optimize the thresholding for your
documents.
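To make the idea concrete, here is a minimal pure-Python sketch of mean-based adaptive thresholding. It is an illustration only, not ImageRamp's algorithm, which the slides do not describe. The function name `adaptive_threshold` and the `window` and `offset` parameters are assumptions chosen for this example, with `offset` playing roughly the role of a sensitivity setting.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel becomes black (0) when it is darker than the mean of its
    local window minus `offset`; otherwise it becomes white (255).
    """
    height, width = len(gray), len(gray[0])
    # Integral image: integral[y][x] holds the sum of gray[0:y][0:x].
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    result = []
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            local_mean = total / ((y1 - y0) * (x1 - x0))
            out_row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(out_row)
    return result


# Synthetic page: the background darkens from left to right,
# and three "ink" pixels are 80 levels darker than their surroundings.
page = [[220 - 5 * x for x in range(20)] for _ in range(9)]
for x in (3, 10, 17):
    page[4][x] -= 80
binary = adaptive_threshold(page, window=7, offset=10)
```

Because each pixel is compared with its own neighborhood, the three ink pixels come out black while the shaded background stays white. A single global cutoff of 128 would instead turn the rightmost background column (value 125) black.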
13. Smooth Text
This option smooths the edges of text. Smoothing fills small pits in the
edges of a character and removes small bumps on the edges. This improves
legibility and reduces storage needs.
14. Dither Form Fills
Black and white printed images may use dithering, often called dot shading, to
simulate shades of gray by varying the patterns of dots. The Dither Form Fills
feature removes areas of dot shading from an image. This function is used to
make a black and white TIFF image appear as black and white and not a
grayscale image.
15. Reset Margins
This option searches for the outermost raster data, or pixels, and resizes the
document based on their location.
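One plausible reading of the Reset Margins description is a crop to the bounding box of the outermost ink pixels. The sketch below shows that interpretation on a binary image stored as a list of rows. The function name `reset_margins` and the 0 = ink, 255 = paper convention are assumptions for this example; the slides do not say how ImageRamp implements the feature.

```python
def reset_margins(binary, ink=0):
    """Crop a binary image (list of rows) to the bounding box of its ink pixels."""
    rows = [y for y, row in enumerate(binary) if ink in row]
    if not rows:
        return [row[:] for row in binary]  # blank page: nothing to crop to
    cols = [x for x in range(len(binary[0]))
            if any(row[x] == ink for row in binary)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


page = [
    [255, 255, 255, 255, 255, 255],
    [255, 255,   0, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
    [255, 255, 255, 255,   0, 255],
    [255, 255, 255, 255, 255, 255],
]
cropped = reset_margins(page)  # 3 rows by 3 columns
```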
16. Deskew or Straighten Page
Using detected text as the basis for alignment, this tool is designed to work with
scanned office documents and eliminate rescans.
17. Remove Lines
This selection detects and removes lines that may interfere with OCR
interpretation.
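As a rough illustration of line removal, the sketch below erases long horizontal runs of ink from a binary image. It is a simplified stand-in, not ImageRamp's method: it handles only horizontal lines and would also erase text pixels that lie on a removed run. The function name `remove_horizontal_lines` and the `min_length` parameter are assumptions for this example.

```python
def remove_horizontal_lines(binary, min_length, ink=0, paper=255):
    """Replace horizontal ink runs of at least `min_length` pixels with paper."""
    cleaned = [row[:] for row in binary]
    for row in cleaned:
        x = 0
        while x < len(row):
            if row[x] != ink:
                x += 1
                continue
            start = x
            while x < len(row) and row[x] == ink:
                x += 1
            # Long runs are treated as form lines; short runs are kept as text.
            if x - start >= min_length:
                row[start:x] = [paper] * (x - start)
    return cleaned


form = [
    [255,   0,   0, 255,   0, 255, 255, 255],  # short, text-like runs
    [  0,   0,   0,   0,   0,   0,   0,   0],  # full-width rule line
    [255, 255,   0,   0,   0, 255, 255, 255],  # 3-pixel run, kept
]
cleaned = remove_horizontal_lines(form, min_length=6)
```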
18. Remove Noise or Despeckle
Whether the scan is contaminated or the original is poor, this option
removes extraneous black specks and fills in white holes in black areas of an
image.
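A very small despeckle filter can be sketched as follows: any pixel that disagrees with all of its neighbors is flipped, which removes isolated black specks and fills single-pixel white holes. This is a minimal sketch on a binary image (0 = ink, 255 = paper, both assumed conventions); real despeckle tools typically handle specks larger than one pixel, and the slides do not say which approach ImageRamp takes.

```python
def despeckle(binary, ink=0, paper=255):
    """Flip pixels that differ from every one of their (up to 8) neighbors."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            neighbors = [
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            # An isolated pixel is treated as noise and takes the other value.
            if neighbors and all(n != binary[y][x] for n in neighbors):
                cleaned[y][x] = paper if binary[y][x] == ink else ink
    return cleaned


speck = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
hole = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
stroke = [[255, 255, 255, 255], [255, 0, 0, 255], [255, 255, 255, 255]]

no_speck = despeckle(speck)    # the lone black pixel is removed
no_hole = despeckle(hole)      # the lone white pixel is filled
kept = despeckle(stroke)       # a two-pixel stroke is left alone
```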
19. Auto Rotate and Rotate Pages
Auto Rotate automatically evaluates orientation based on the text and rotates
misoriented pages. Optionally, select a degree of rotation for ImageRamp to
rotate all pages based on the selection.
20. Remove Blank Pages
This can be used to eliminate unnecessary blank pages in a document and make
the file size smaller. Blank page detection can also play a role in file splitting.
Many users divide documents in a scanning stack with blank pages, and
ImageRamp can be set to split the stack of documents into multiple files when
blanks are detected.
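The blank-page splitting described above can be sketched in a few lines. Here a page counts as blank when its share of ink pixels is at or below a small tolerance, and each run of non-blank pages becomes one output document. The names `is_blank` and `split_on_blanks` and the 0.5% tolerance are assumptions for illustration, not ImageRamp settings.

```python
def is_blank(page, ink=0, max_ink_ratio=0.005):
    """Treat a binary page (list of rows) as blank if it has almost no ink."""
    total = sum(len(row) for row in page)
    dark = sum(row.count(ink) for row in page)
    return total == 0 or dark / total <= max_ink_ratio


def split_on_blanks(pages):
    """Drop blank pages and start a new document after each blank separator."""
    documents, current = [], []
    for page in pages:
        if is_blank(page):
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


text = [[0, 255], [255, 255]]       # 25% ink: not blank
blank = [[255, 255], [255, 255]]    # no ink: separator sheet
stack = [text, text, blank, text, blank, blank, text]
documents = split_on_blanks(stack)  # three documents of 2, 1, and 1 pages
```

The tolerance exists because scanned separator sheets are rarely perfectly white; a few stray specks should not stop a page from counting as blank.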
21. Besides cleaning and enhancing the image, ImageRamp has other
ways to improve OCR accuracy.
22. Field Validation Improves Accuracy
OCR with validation during processing is a very powerful way to
flag entries that do not meet a specific format rule.
For instance, if an inventory item number should contain three alpha
characters followed by five numbers, all documents whose item
numbers are not identified in the OCR process with that pattern
may be tagged for manual inspection before further processing is
done.
PEN21096
CAP36581
INV98453
PA568793
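The format rule in this example (three alpha characters followed by five numbers) maps naturally onto a regular expression. The sketch below applies it to the four sample item numbers above and collects the ones that need manual inspection. The names `ITEM_NUMBER` and `needs_review` are hypothetical, and restricting the letters to uppercase A-Z is an assumption based on the samples.

```python
import re

# Three uppercase letters followed by exactly five digits (assumed rule).
ITEM_NUMBER = re.compile(r"[A-Z]{3}[0-9]{5}")


def needs_review(values, pattern=ITEM_NUMBER):
    """Return the OCR results that do not match the expected field format."""
    return [v for v in values if not pattern.fullmatch(v.strip())]


ocr_results = ["PEN21096", "CAP36581", "INV98453", "PA568793"]
flagged = needs_review(ocr_results)
```

Here `flagged` contains only `PA568793`, whose third character is a digit rather than a letter, so that document would be routed to manual inspection.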
23. ImageRamp offers significant preview and testing options to fine-tune settings.
Additionally, ImageRamp offers PDF or TIFF output, which may differ in OCR accuracy.
25. Pre-Processing Standards
Good pre-processing can be as important as the scanning technologies.
Encourage accuracy by setting document procedures and guidelines to:
• Use adequate white space
• Limit lines and gridlines
• Limit the use of color
• Use OCR-friendly fonts and sizes