Erik Wold (Super Contributor)

OCR PDF file with image-text AND text, automatically

If you enable "Automatically create OCR rendition" for a record type and then add some scanned PDF files, an OCR rendition is created. Perfect, love it. However, if you combine a scanned PDF with some text-searchable PDFs into a single file and add it to CM, the scanned pages are overlooked and only the text-searchable words are content indexed. In other words, the file is not treated as image-only, and image-only is the condition the record type configuration uses to decide whether to generate the OCR rendition.
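To make the distinction concrete, here is a minimal Python sketch contrasting a document-level "image-only" rule with a page-level rule. It assumes a page counts as scanned when its extracted text layer is empty. This is a hypothetical model of the behaviour described above, not how Content Manager or IDOL actually decide. `classify_pdf` and the other names are made up for illustration, and in practice the per-page text would come from a PDF library.

```python
from dataclasses import dataclass


@dataclass
class PdfClassification:
    kind: str                     # "image-only", "text-only", or "hybrid"
    pages_needing_ocr: list[int]  # 1-based page numbers with no text layer


def classify_pdf(page_texts: list[str]) -> PdfClassification:
    """Classify a PDF from the text extracted from each page (hypothetical model).

    An empty or whitespace-only string stands for a scanned, image-only page.
    """
    if not page_texts:
        raise ValueError("a PDF needs at least one page")
    ocr_pages = [i + 1 for i, text in enumerate(page_texts) if not text.strip()]
    if len(ocr_pages) == len(page_texts):
        kind = "image-only"
    elif not ocr_pages:
        kind = "text-only"
    else:
        kind = "hybrid"
    return PdfClassification(kind, ocr_pages)


def should_auto_ocr_document_level(page_texts: list[str]) -> bool:
    # Rule matching the behaviour described: OCR only when the whole file is image-only.
    return classify_pdf(page_texts).kind == "image-only"


def should_auto_ocr_page_level(page_texts: list[str]) -> bool:
    # Rule the post is asking for: OCR whenever any page lacks a text layer.
    return bool(classify_pdf(page_texts).pages_needing_ocr)


# A text-searchable contract with a scanned signature page appended as page 3.
pages = ["This Agreement is made...", "Terms and conditions...", ""]
print(classify_pdf(pages))  # PdfClassification(kind='hybrid', pages_needing_ocr=[3])
print(should_auto_ocr_document_level(pages))  # False: the signature page is skipped
print(should_auto_ocr_page_level(pages))      # True: page 3 would be OCR'd
```

Under the document-level rule, a single text page is enough to suppress OCR for the whole file. That is why the scanned signature page in a combined PDF never gets indexed.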

Does anyone know of a configuration setting that might need to change so that IDOL (in this case) won't skip over those scanned pages, and the automatic OCR rendition gets generated (for the whole document or just the image pages, either way)?

We have confirmed that the manual process for generating the OCR rendition works, but it would be perfect to have it happen on ingestion for these hybrid PDFs that are neither all image nor all text.

We all know the use case where these combination PDFs are created: a scanned signature page attached to a searchable text PDF.

Ideas?

The opinions expressed above are the personal opinions of the authors, not of Micro Focus. By using this site, you accept the Terms of Use and Rules of Participation. Certain versions of content ("Material") accessible here may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the Material is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.