OCR PDF file with image-text AND text, automatically

If you enable "Automatically create OCR rendition" for a record type and then add some scanned PDF files, an OCR rendition is created. Perfect, love it. But if you combine a scanned PDF with some text-searchable PDFs into a single file and add it to CM, the scanned pages are overlooked and only the text-searchable words get content indexed. In other words, the file isn't treated as the image-only file that the Record Type config requires before it generates the OCR rendition.

Does anyone know of a config that might need to change to make sure IDOL (in this case) won't skip over those scanned pages so that the automatic OCR rendition gets generated (for the whole document, or just the image-text pages, either way)? 

We have confirmed that the manual process works to generate the OCR rendition, but it would be perfect to have it work upon ingestion for those hybrid PDFs that are neither all image nor all text.

We all know the use case where these combination PDFs are made: a scanned signature page attached to a text-searchable PDF.
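To make the distinction concrete, here is a rough Python sketch of how a PDF could be sorted into image-only, text-only, or hybrid from the number of extractable characters on each page. This is only an illustration of the behaviour as seen from the outside, not how CM or IDOL actually decides. The function names and the `min_chars` threshold are made up for the example, and the per-page character counts are assumed to come from whatever PDF text extractor you have on hand.

```python
from typing import List


def classify_pdf(page_text_lengths: List[int], min_chars: int = 1) -> str:
    """Classify a PDF from the count of extractable characters per page.

    page_text_lengths: one entry per page (hypothetical input, produced by
    any PDF text extractor). min_chars is an assumed threshold below which
    a page is treated as having no text layer.
    """
    if not page_text_lengths:
        return "empty"
    has_text = [n >= min_chars for n in page_text_lengths]
    if all(has_text):
        return "text-only"
    if not any(has_text):
        return "image-only"
    # Some pages have a text layer and some do not.
    return "hybrid"


def pages_needing_ocr(page_text_lengths: List[int], min_chars: int = 1) -> List[int]:
    """Return 1-based page numbers that have no usable text layer."""
    return [i + 1 for i, n in enumerate(page_text_lengths) if n < min_chars]
```

For the signature-page case, a three-page file with counts like `[1200, 900, 0]` would come out as hybrid with page 3 needing OCR, which is exactly the kind of file that seems to slip past the automatic rendition.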

Ideas?

  • I'm a little late to the party....

    There is a setting that determines whether the Media Server will combine text and image sections in a PDF during OCR, but it is set to true by default, so I don't know why it doesn't work on your side. Below are the docs on the setting and also the .cfg file.

    "/Program Files\Micro Focus\Content Manager\IDOL\TRIM Media Server\TRIM Media Server OCR.cfg"

     

    [Attached screenshots: oct_config.png, process_text_el.png]

     

    It does sound like a bug, though, because you said that when you manually submit a doc it works; it only fails with "Automatically create OCR rendition".