Invention Grant
US07965891B2 System and method for identifying and labeling fields of text associated with scanned business documents
Status: In force
- Patent Title: System and method for identifying and labeling fields of text associated with scanned business documents
- Application No.: US12710573
- Application Date: 2010-02-23
- Publication No.: US07965891B2
- Publication Date: 2011-06-21
- Inventor: John C. Handley, M. Armon Rahgozar, Dennis L. Venable, Pamela B. Spiteri, Anoop M. Namboodiri, Richard Zanibbi
- Applicant: John C. Handley, M. Armon Rahgozar, Dennis L. Venable, Pamela B. Spiteri, Anoop M. Namboodiri, Richard Zanibbi
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Main IPC: G06K9/34
- IPC: G06K9/34

Abstract:
A system for electronically distilling information from a business document uses a network scanner to electronically scan a platen area, having a business document thereon, to create a bitmap. A network server carries out a segmentation process to segment the scan-generated bitmap into a bitmap object, the bitmap object corresponding to the scanned business document; a bitmap-to-text conversion process to convert the bitmap object into a block of text; a semantic recognition process to generate a structured representation of semantic entities corresponding to the scanned business document; and a document generation process to convert the structured representation into a structured text file. The semantic recognition process includes the processes of generating, for each line of text having a keyword therein, a terminal symbol corresponding to the keyword therein; generating, for each line of text not having a keyword therein and absent of numeric characters, an alphabetic terminal symbol; generating, for each line of text not having a keyword therein and having a numeric character therein, an alphanumeric terminal symbol; generating a string of terminal symbols from the generated terminal symbols; determining a probable parsing of the generated string of terminal symbols; labeling each text line, according to a determined function, with non-terminal symbols; and parsing the business document information text into fields of business document information text based upon the non-terminal symbol of each text line and the determined probable parsing of the generated string of terminal symbols.
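The terminal-symbol step of the semantic recognition process can be illustrated with a minimal Python sketch. It covers only the three line-classification rules stated in the abstract (keyword, alphabetic, alphanumeric) and the assembly of the symbol string; the probable parsing and non-terminal labeling steps are not shown. The keyword table, the symbol names, and the function names `terminal_symbol` and `terminal_string` are assumptions made for illustration and are not taken from the patent.

```python
import re

# Hypothetical keyword table: the abstract does not enumerate keywords,
# so these entries and symbol names are illustrative assumptions.
KEYWORDS = {
    "phone": "PHONE_KW",
    "fax": "FAX_KW",
    "email": "EMAIL_KW",
}


def terminal_symbol(line: str) -> str:
    """Map one line of recognized text to a terminal symbol."""
    # Rule 1: a line containing a keyword yields that keyword's symbol.
    for word in re.findall(r"[a-z]+", line.lower()):
        if word in KEYWORDS:
            return KEYWORDS[word]
    # Rule 2: no keyword, but at least one numeric character.
    if any(ch.isdigit() for ch in line):
        return "ALPHANUMERIC"
    # Rule 3: no keyword and no numeric characters.
    return "ALPHABETIC"


def terminal_string(lines):
    """Build the string of terminal symbols, skipping blank lines."""
    return [terminal_symbol(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = ["John Smith", "Acme Corp", "123 Main St", "Phone: 555-0100"]
    print(terminal_string(sample))
```

In the described system, a string like this would then be passed to the step that determines a probable parsing; how that parser is built is not specified in the abstract and is outside this sketch.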
Public/Granted literature
- US20100149606A1 SYSTEM AND METHOD FOR IDENTIFYING AND LABELING FIELDS OF TEXT ASSOCIATED WITH SCANNED BUSINESS DOCUMENTS, Public/Granted day: 2010-06-17