System and method for extracting structured information from image documents

Invention Grant

US10853638B2 System and method for extracting structured information from image documents (in force)


Patent Title: System and method for extracting structured information from image documents
Application No.: US16173760

Application Date: 2018-10-29
Publication No.: US10853638B2

Publication Date: 2020-12-01
Inventors: Abhisek Mukhopadhyay, Shubhashis Sengupta
Applicant: Accenture Global Solutions Limited
Applicant Address: IE Dublin
Assignee: Accenture Global Solutions Limited
Current Assignee: Accenture Global Solutions Limited
Current Assignee Address: IE Dublin
Agency: Plumsea Law Group, LLC
Priority: IN201841032793, 2018-08-31
Main IPC: G06K9/00
IPC: G06K9/00; G06Q50/18; G06Q40/08; G06F40/279

Abstract:

A system and method for extracting structured information from image documents is disclosed. An input image document is obtained, and the input image document may be analyzed to determine a skeletal layout of information included in the input image document. A measure of similarity between the determined skeletal layout and each of the document templates may be determined. A document template may be selected as a matched template, based on the determined measure of similarity. Box areas from the input image document may be cropped out, and optical character recognition (OCR) may be performed on the box areas. Obtained recognized text may be automatically processed using directed search to correct errors made by the OCR. Statistical language modeling may be used to classify the input image document into a classification category, and the classified input image document may be processed according to the classification category.
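The abstract names the pipeline stages (layout matching against templates, OCR error correction by directed search, classification by a statistical language model) without specifying their algorithms. The following is a minimal Python sketch under stated assumptions, not the patented implementation: a skeletal layout is assumed to be a list of page-normalized bounding boxes, layout similarity is assumed to be a symmetrized mean best-match intersection-over-union (IoU), "directed search" is approximated by snapping OCR output to the nearest entry of a field lexicon by Levenshtein distance, and the language model is assumed to be a per-category unigram model with add-one smoothing. All function and class names, the 0.5 match threshold, and the 2-edit correction limit are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a skeletal layout is a list of boxes
# (x, y, width, height), normalized to the page size.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def layout_similarity(layout: List[Box], template: List[Box]) -> float:
    """Assumed measure: mean best-match IoU in both directions, so extra
    boxes and missing boxes both lower the score."""
    if not layout or not template:
        return 0.0
    fwd = sum(max(iou(a, b) for b in template) for a in layout) / len(layout)
    bwd = sum(max(iou(a, b) for a in layout) for b in template) / len(template)
    return (fwd + bwd) / 2


def match_template(layout: List[Box], templates: Dict[str, List[Box]],
                   threshold: float = 0.5) -> Optional[str]:
    """Return the most similar template name, or None below the (hypothetical) threshold."""
    scores = {name: layout_similarity(layout, boxes) for name, boxes in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def correct_field(text: str, lexicon: List[str], max_dist: int = 2) -> str:
    """Stand-in for directed search: snap OCR text to the closest entry of a
    (hypothetical, non-empty) lexicon if within max_dist edits, else keep it."""
    best = min(lexicon, key=lambda w: edit_distance(text.lower(), w.lower()))
    return best if edit_distance(text.lower(), best.lower()) <= max_dist else text


class UnigramClassifier:
    """Per-category unigram language model with add-one smoothing and a
    uniform prior over categories (an assumed, deliberately simple model)."""

    def __init__(self, training: Dict[str, List[str]]):
        self.counts = {c: Counter(w for doc in docs for w in doc.lower().split())
                       for c, docs in training.items()}
        self.vocab = set().union(*self.counts.values())

    def log_likelihood(self, text: str, category: str) -> float:
        cnt = self.counts[category]
        denom = sum(cnt.values()) + len(self.vocab)
        return sum(math.log((cnt[w] + 1) / denom) for w in text.lower().split())

    def classify(self, text: str) -> str:
        # Highest log-likelihood wins; equal priors make this the MAP choice.
        return max(self.counts, key=lambda c: self.log_likelihood(text, c))
```

For example, `correct_field("lnvoice", ["Invoice", "Receipt"])` returns `"Invoice"`, because a single substitution repairs the common OCR confusion of "l" for "I". IoU on normalized coordinates is used here only because it is insensitive to scan resolution; the patent may use a different similarity measure.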

Related publication:

US20200074169A1 System and Method for Extracting Structured Information from Image Documents, published 2020-03-05

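IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graphical reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)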