Document information extraction for computer manipulation

Invention Grant

US11436852B2 Document information extraction for computer manipulation (In force)

Patent Title: Document information extraction for computer manipulation
Application No.: US16941349

Application Date: 2020-07-28
Publication No.: US11436852B2

Publication Date: 2022-09-06
Inventor: Ranadeep Bhuyan, Shubhajit Saha, Sudipto Ghosh
Applicant: Intuit Inc.
Applicant Address: US CA Mountain View
Assignee: Intuit Inc.
Current Assignee: Intuit Inc.
Current Assignee Address: US CA Mountain View
Agency: Paradice and Li LLP
Main IPC: G06K9/00
IPC: G06K9/00; G06V30/416; G06K9/62; G06T11/60; G06T7/11; G06V10/75; G06V30/10

Abstract:

Systems and apparatuses are disclosed for extracting information from document images. An example method includes segmenting a document image into multiple segments and determining formatting information for each segment. Determining formatting information for a segment includes determining one or more features of the segment and comparing the one or more features of the segment to one or more clusters of features associated with different document types. The formatting information for the segment is based on the comparison. The method also includes, for each segment, storing the formatting information in a data structure associated with the segment. The method further includes, for each segment including text to be identified during information extraction, applying OCR to the segment to generate machine-encoded text and storing the machine-encoded text in the associated data structure.
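
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the image has already been segmented, that each segment carries a small numeric feature vector, and that "comparing to clusters" means picking the nearest cluster centroid by Euclidean distance. The names `Segment`, `CLUSTERS`, `nearest_cluster`, and `extract_information`, the example centroids, and the `ocr` callable are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Segment:
    """Data structure associated with one segment of the document image."""
    bbox: Tuple[int, int, int, int]           # (left, top, right, bottom) in pixels
    features: Tuple[float, ...]               # hypothetical feature vector
    has_text: bool = True                     # whether OCR should be applied
    formatting: Dict[str, object] = field(default_factory=dict)
    text: Optional[str] = None                # machine-encoded text, if any


# Hypothetical cluster centroids, one per document type (assumed values).
CLUSTERS: Dict[str, Tuple[float, ...]] = {
    "invoice": (0.9, 0.1),
    "receipt": (0.2, 0.8),
}


def nearest_cluster(features: Tuple[float, ...],
                    clusters: Dict[str, Tuple[float, ...]]) -> str:
    """Return the name of the cluster whose centroid is closest to `features`."""
    return min(clusters, key=lambda name: dist(features, clusters[name]))


def extract_information(segments: List[Segment],
                        clusters: Dict[str, Tuple[float, ...]],
                        ocr: Callable[[Segment], str]) -> List[Segment]:
    """Fill in formatting information and, where applicable, OCR text."""
    for seg in segments:
        # Compare the segment's features to the clusters and store the result.
        doc_type = nearest_cluster(seg.features, clusters)
        seg.formatting = {
            "document_type": doc_type,
            "distance": dist(seg.features, clusters[doc_type]),
        }
        # Apply OCR only to segments that contain text to be identified.
        if seg.has_text:
            seg.text = ocr(seg)
    return segments
```

In practice, `ocr` would wrap a real OCR engine and the feature vectors would come from an image-analysis step. Both are left as caller-supplied inputs here so the sketch stays self-contained.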

Published/Granted Literature

US20220036063A1 DOCUMENT INFORMATION EXTRACTION FOR COMPUTER MANIPULATION Publication/grant date: 2022-02-03

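IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)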