How to OCR PDFs and PDF Portfolios


This article explains how to OCR PDFs and PDF portfolios with Foxit PDF Editor.

OCR PDFs and PDF Portfolios

Optical Character Recognition, or OCR, is a software process that converts images of text or printed text into machine-readable text. OCR is most commonly used when scanning paper documents to create electronic copies, but it can also be performed on existing electronic documents, such as PDFs or PDF portfolios.

Recognize text

When you open a PDF file, Foxit PDF Editor can detect whether it is scanned or image-based and, if so, suggest running OCR. You can also run OCR at any time to recognize the image-based text in a PDF.

To recognize image-based or scanned text in a PDF file, perform the following steps:

1.        Click Convert > Recognize Text > Current File. In the Recognize Text dialog box, specify the page range you need.

2.        Choose the language used in your document. You can select multiple languages as well.

3.        In the output type, check Searchable Text Image to make the image text selectable and searchable (or check Editable Text to enable the image text to be edited with Foxit PDF Editor). Then click OK to recognize the text.

·         Searchable Text Image: During the OCR process, Foxit PDF Editor analyzes the image text and substitutes words/characters that closely approximate it. The substitute words/characters are placed on an invisible text layer in the PDF, which makes the image text selectable and searchable. If a substitution is uncertain, the text is marked as an OCR suspect that needs to be corrected manually.

·         Editable Text: During the OCR process, Foxit PDF Editor compares the shape of the image text with the fonts installed on your system, picks the closest match, and turns the image text into editable text.

Note: If you are prompted to download the OCR component after clicking OK, click Yes to download and install it. Alternatively, download it later from the link provided, and then install it by clicking Install Plugin in the About Foxit Plug-Ins dialog box, which opens when you click Foxit Plug-Ins in the Help tab. To get the full version of Foxit PDF Editor, please contact us.

4.        (Optional) If you check Find All Suspect (Show all OCR results that may need to be changed.), the OCR Suspects dialog box pops up for you to check and correct OCR suspects right after the recognition completes. To learn how to correct OCR suspects, please refer to the instructions on “Find and Correct OCR Suspects”.

If you choose Editable Text as the output type with the Find All Suspect (Show all OCR results that may need to be changed.) option selected, the OCRed text that Foxit PDF Editor is not certain about is marked as OCR suspects, and the original image text is kept until you have manually handled all the OCR suspects. You can also deselect this option to turn the image text into editable text with no OCR suspects after recognition. You can then modify the text directly using the commands in the Edit tab.

5.        (Optional) If you select Editable Text in Step 3, the Recognize the line segments as path objects in the PDF option is available. If the image text in your document contains tables, selecting this option helps better recognize the line segments, but it may take longer to complete recognition.   

6.        A progress bar appears to show the progress of the recognition.

7.        When recognition completes, use the search function to check the result: the text in your image or scanned document is now searchable.

Tip: Foxit PDF Editor provides the Quick Recognition command under the Home/Convert tab, which recognizes all pages of a scanned or image-based PDF with one click, using the default or previous settings.
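The idea of OCR suspects can be pictured with a short Python sketch. This is only an illustration of the concept, not Foxit PDF Editor's actual implementation or API: the `RecognizedWord` type, the `split_suspects` function, the confidence scores, and the 0.80 threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    text: str          # best-guess text substituted for the image text
    confidence: float  # hypothetical certainty score in [0.0, 1.0]
    page: int          # page on which the word was found


def split_suspects(words, threshold=0.80):
    """Separate confident results from suspects that need manual review.

    The threshold is an assumed value for illustration only.
    """
    accepted = [w for w in words if w.confidence >= threshold]
    suspects = [w for w in words if w.confidence < threshold]
    return accepted, suspects


# Made-up recognition results for a single scanned page.
words = [
    RecognizedWord("Invoice", 0.98, 1),
    RecognizedWord("T0tal", 0.55, 1),   # uncertain: "0" may really be "o"
    RecognizedWord("2024", 0.91, 1),
]

accepted, suspects = split_suspects(words)
print("Accepted:", [w.text for w in accepted])
print("Suspects:", [w.text for w in suspects])
```

In this picture, the accepted words go straight into the text layer, while the suspects are the entries you would review in the OCR Suspects dialog box.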

To recognize text in multiple files:

1.      Click Convert > Recognize Text > Multiple Files.

2.      In the Recognize Text dialog box, click Add Files to add files, folders, or currently opened files. Use Move up, Move down, and Remove to adjust the order of the files.

3.      Click Output Options…. In the Output Options dialog box, select the destination folder, choose how to name the new file and whether to overwrite an existing one, and then click OK.

4.      Click OK. After recognition, a message box notifies you that the recognition is finished.

Note:

1.       The first time you use the CJK OCR engine, you will be prompted to download and install the engine from the Foxit server.

2.       If any unsupported files have been added, a “Remove unsupported file(s)” button appears in the Recognize Text dialog box. Click the button to remove the unsupported file(s), and then continue. When recognizing a PDF portfolio, Foxit PDF Editor extracts and recognizes only the PDF files in the portfolio.
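The batch workflow above (drop unsupported files, then decide on output names and whether to overwrite) can be sketched in Python. This is a rough illustration and not how Foxit PDF Editor works internally: the `plan_batch` function, the `_OCR` suffix, the `(1)` numbering scheme, and the assumption that only `.pdf` files are supported are all hypothetical.

```python
from pathlib import PurePosixPath

# Assumption for illustration: only PDF files are treated as supported.
SUPPORTED = {".pdf"}


def plan_batch(files, dest_dir, suffix="_OCR", overwrite=False, existing=()):
    """Return (plan, skipped) for a batch run.

    plan    -- list of (source, output path) pairs
    skipped -- files dropped because their type is not supported
    existing -- output paths assumed to be present already in dest_dir
    """
    taken = set(existing)
    plan, skipped = [], []
    for name in files:
        src = PurePosixPath(name)
        if src.suffix.lower() not in SUPPORTED:
            skipped.append(name)
            continue
        target = PurePosixPath(dest_dir) / f"{src.stem}{suffix}{src.suffix}"
        n = 1
        # When not overwriting, add a counter until the name is free.
        while not overwrite and str(target) in taken:
            target = PurePosixPath(dest_dir) / f"{src.stem}{suffix}({n}){src.suffix}"
            n += 1
        taken.add(str(target))
        plan.append((name, str(target)))
    return plan, skipped


plan, skipped = plan_batch(
    ["scan.pdf", "notes.txt"], "out", existing=["out/scan_OCR.pdf"]
)
print(plan)
print(skipped)
```

With `overwrite=True`, the same call would instead reuse `out/scan_OCR.pdf`, which corresponds to choosing to overwrite an existing file in the Output Options dialog box.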
