System to extract information from documents

Invention Grant

US11379690B2 System to extract information from documents (Active)

Patent Title: System to extract information from documents
Application No.: US16794266

Application Date: 2020-02-19
Publication No.: US11379690B2

Publication Date: 2022-07-05
Inventor: Vikas Kumar
Applicant: Infrrd Inc
Applicant Address: US CA San Jose
Assignee: Infrrd Inc
Current Assignee: Infrrd Inc
Current Assignee Address: US CA San Jose
Main IPC: G06K9/00
IPC: G06K9/00 ; G06K9/62 ; G06N20/00 ; G06N5/04 ; G06F40/30 ; G06F40/284 ; G06F40/117 ; G06V10/56 ; G06V30/10

Abstract:

A method of training a system to extract information from documents comprises feeding a digital form of training documents to an OCR module, which identifies multiple logical blocks in the documents and the text present in the logical blocks. One or more tags for the whole of the document, the logical blocks and word tokens on the document are received by a tagging module. A text input comprising the text identified in the document and the tags for the whole of the document are received by a machine learning module. A first image of the document with the layout of one or more of the identified blocks superimposed, and the tags of the logical blocks in the document, are received by the machine learning module, wherein the received text input, first image and tags for the logical blocks correspond to a plurality of the training documents.
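
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, field names, and the `build_training_example` helper are hypothetical, the OCR and tagging steps are assumed to have already run, and the "first image" is approximated by a small 0/1 grid with block outlines drawn on a blank page rather than on the scanned document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LogicalBlock:
    """One logical block reported by a (hypothetical) OCR step."""
    # (left, top, right, bottom) in pixels; right and bottom are exclusive.
    box: Tuple[int, int, int, int]
    text: str
    # Block-level tags supplied by the tagging step.
    tags: List[str] = field(default_factory=list)


@dataclass
class TaggedDocument:
    """OCR output plus tags for a single training document (assumed shape)."""
    width: int
    height: int
    blocks: List[LogicalBlock]
    document_tags: List[str]
    # Word token -> tag. The abstract says these are received by the tagging
    # module but does not say how they are passed on, so they are only stored.
    token_tags: Dict[str, str] = field(default_factory=dict)


def layout_image(doc: TaggedDocument) -> List[List[int]]:
    """Draw each block's outline on a blank page, returned as a 0/1 grid.

    Assumes every box lies inside the page bounds.
    """
    image = [[0] * doc.width for _ in range(doc.height)]
    for block in doc.blocks:
        left, top, right, bottom = block.box
        for x in range(left, right):
            image[top][x] = 1         # top edge
            image[bottom - 1][x] = 1  # bottom edge
        for y in range(top, bottom):
            image[y][left] = 1        # left edge
            image[y][right - 1] = 1   # right edge
    return image


def build_training_example(doc: TaggedDocument) -> dict:
    """Bundle the two inputs the abstract describes for the learning module."""
    return {
        # Text input: the recognized text together with document-level tags.
        "text_input": "\n".join(block.text for block in doc.blocks),
        "document_tags": doc.document_tags,
        # Image input: the block layout together with block-level tags.
        "layout_image": layout_image(doc),
        "block_tags": [block.tags for block in doc.blocks],
    }
```

Calling `build_training_example` once per training document would yield the plurality of text inputs, layout images, and block tags that the abstract refers to; how a model consumes them is outside the scope of this sketch.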

Public/Granted literature

US20200184267A1 SYSTEM TO EXTRACT INFORMATION FROM DOCUMENTS Public/Granted day: 2020-06-11

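IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)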