Invention Grant
- Patent Title: Systems and methods for automated end-to-end text extraction of electronic documents
- Application No.: US17850618
- Application Date: 2022-06-27
- Publication No.: US12217524B2
- Publication Date: 2025-02-04
- Inventor: Keerthan Ramnath, Punitha Chandrasekar, Hui Su, Shyam Subramanian, Rachna Saxena, Mohamed Mahdi Alouane, Vinay Iyengar
- Applicant: FMR LLC
- Applicant Address: US MA Boston
- Assignee: FMR LLC
- Current Assignee: FMR LLC
- Current Assignee Address: US MA Boston
- Agency: Cesari and McKenna, LLP
- Main IPC: G06V30/414
- IPC: G06V30/414; G06F40/232; G06F40/263; G06F40/284

Abstract:
Systems and methods for extracting data from electronic documents using optical character recognition (OCR) and non-OCR based text extraction. A server computing device initiates non-OCR based text extraction for each page of an electronic document. The server calculates a document text coverage percentage corresponding to the non-OCR based text extraction for the whole document and, in response to determining that the document text coverage percentage is below a first threshold, initiates OCR for the document. The server calculates a page text coverage percentage corresponding to the non-OCR based text extraction for one or more pages of the electronic document and, in response to determining that the page text coverage percentage is below a second threshold, initiates OCR for the pages. The server combines first text extracted from the electronic document using non-OCR based text extraction and second text extracted from the electronic document using OCR.
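The two-threshold decision described in the abstract can be illustrated with a minimal Python sketch. This is one possible reading, not the patented implementation: the abstract does not say how coverage is computed, so each page here simply carries a precomputed coverage percentage, and the names `Page`, `extract_document`, `ocr`, `doc_threshold`, and `page_threshold`, as well as the threshold values, are hypothetical. The sketch also assumes document coverage is the mean of the page coverages and that page-level checks run only when the document-level check passes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    native_text: str  # text from non-OCR extraction (may be empty)
    coverage: float   # assumed: text coverage percentage for this page, 0-100


def extract_document(
    pages: List[Page],
    ocr: Callable[[int], str],     # hypothetical OCR callback: page index -> text
    doc_threshold: float = 50.0,   # assumed "first threshold"
    page_threshold: float = 30.0,  # assumed "second threshold"
) -> List[str]:
    """Return one text string per page, mixing non-OCR and OCR output."""
    if not pages:
        return []

    # Assumption: document coverage is the mean of the per-page coverages.
    doc_coverage = sum(p.coverage for p in pages) / len(pages)

    # Document-level check: low overall coverage sends every page to OCR.
    if doc_coverage < doc_threshold:
        return [ocr(i) for i in range(len(pages))]

    # Page-level check: only low-coverage pages go to OCR; the rest keep
    # their non-OCR text. The combined result preserves page order.
    return [
        ocr(i) if page.coverage < page_threshold else page.native_text
        for i, page in enumerate(pages)
    ]


def fake_ocr(index: int) -> str:
    """Stand-in for a real OCR engine, used only for the example."""
    return f"ocr-{index}"


mixed = extract_document(
    [Page("hello", 90.0), Page("", 5.0), Page("world", 80.0)], fake_ocr
)
scanned = extract_document([Page("", 0.0), Page("x", 10.0)], fake_ocr)
```

In the first call only the middle page falls below the page threshold, so its text comes from the OCR callback while the other pages keep their non-OCR text. In the second call the document-level coverage is below the first threshold, so every page is sent to OCR.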
Public/Granted literature:
- US20230419711A1 SYSTEMS AND METHODS FOR AUTOMATED END-TO-END TEXT EXTRACTION OF ELECTRONIC DOCUMENTS, Public/Granted day: 2023-12-28