Automated document extraction and classification

Invention Grant

US10977291B2 Automated document extraction and classification (in force)

Patent Title: Automated document extraction and classification
Application No.: US16054781

Application Date: 2018-08-03
Publication No.: US10977291B2

Publication Date: 2021-04-13
Inventor: Ronnie Douglas Douthit, Deepankar Mohapatra, Ram Mohan Shamanna, Chiranjeev Jagannadha Reddy, Yexin Huang, Trichur Shivaramakrishnan Subramanian, Chinnadurai Duraisami, Karpaga Ganesh Patchirajan, Amar J. Mattey
Applicant: Ronnie Douglas Douthit, Deepankar Mohapatra, Ram Mohan Shamanna, Chiranjeev Jagannadha Reddy, Yexin Huang, Trichur Shivaramakrishnan Subramanian, Chinnadurai Duraisami, Karpaga Ganesh Patchirajan, Amar J. Mattey
Applicant Address: US TX Frisco; US TX The Colony; US TX Frisco; US TX Frisco; US TX Plano; US TX McKinney; US TX Plano; US TX Plano; US TX Frisco
Assignee: Ronnie Douglas Douthit, Deepankar Mohapatra, Ram Mohan Shamanna, Chiranjeev Jagannadha Reddy, Yexin Huang, Trichur Shivaramakrishnan Subramanian, Chinnadurai Duraisami, Karpaga Ganesh Patchirajan, Amar J. Mattey
Current Assignee: Ronnie Douglas Douthit, Deepankar Mohapatra, Ram Mohan Shamanna, Chiranjeev Jagannadha Reddy, Yexin Huang, Trichur Shivaramakrishnan Subramanian, Chinnadurai Duraisami, Karpaga Ganesh Patchirajan, Amar J. Mattey
Current Assignee Address: US TX Frisco; US TX The Colony; US TX Frisco; US TX Frisco; US TX Plano; US TX McKinney; US TX Plano; US TX Plano; US TX Frisco
Agency: Ferguson Braswell Fraser Kubasta PC
Main IPC: G06F7/00
IPC: G06F7/00; G06F16/35; G06N5/02; G06Q40/00; G06F16/93

Abstract:

A method includes receiving a source file containing a plurality of documents which, to a computer, are initially indistinguishable from each other. A first classification stage is applied to the source file, using convolutional neural network image classification, to identify source documents in the plurality of documents and to produce a partially parsed file having a plurality of identified source documents. The partially parsed file includes sub-images corresponding to the plurality of identified source documents. A second classification stage, including a natural language processing artificial intelligence, is applied to sets of text in bounding boxes of the sub-images to classify each of the plurality of identified source documents as a corresponding sub-type of document, each of the sets of text corresponding to one of the sub-images. A parsed file having a plurality of identified sub-types of documents is produced. The parsed file is further computer processed.
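
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the names Region, SubImage, stage_one and stage_two are hypothetical, the "invoice" and "receipt" labels are invented for illustration, and the two toy classifiers are simple stand-ins for the convolutional neural network and the natural language processing model, which a real system would supply as trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


@dataclass
class Region:
    """Candidate area of a page, with the text found inside its bounding box."""
    page: int
    bbox: BBox
    text: str


@dataclass
class SubImage:
    """A region that stage one identified as a source document."""
    region: Region
    doc_type: str
    sub_type: Optional[str] = None  # filled in by stage two


def stage_one(regions: List[Region],
              image_classifier: Callable[[Region], Optional[str]]) -> List[SubImage]:
    """Keep only regions the image classifier labels as documents."""
    sub_images = []
    for region in regions:
        label = image_classifier(region)
        if label is not None:
            sub_images.append(SubImage(region, label))
    return sub_images  # plays the role of the partially parsed file


def stage_two(sub_images: List[SubImage],
              text_classifier: Callable[[str], str]) -> List[SubImage]:
    """Assign a sub-type to each identified document from its bounding-box text."""
    for sub in sub_images:
        sub.sub_type = text_classifier(sub.region.text)
    return sub_images  # plays the role of the parsed file


# Toy stand-ins (assumptions) so the sketch runs without trained models.
def toy_image_classifier(region: Region) -> Optional[str]:
    x0, y0, x1, y1 = region.bbox
    area = (x1 - x0) * (y1 - y0)
    return "form" if area >= 10_000 else None  # tiny regions are treated as noise


def toy_text_classifier(text: str) -> str:
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "receipt" in lowered:
        return "receipt"
    return "unknown"


source_file = [
    Region(page=0, bbox=(0, 0, 400, 300), text="INVOICE #17 Total due"),
    Region(page=0, bbox=(0, 310, 20, 320), text="smudge"),
    Region(page=1, bbox=(0, 0, 400, 500), text="Receipt: thank you"),
]

parsed_file = stage_two(stage_one(source_file, toy_image_classifier),
                        toy_text_classifier)
for sub in parsed_file:
    print(sub.region.page, sub.region.bbox, sub.doc_type, sub.sub_type)
```

Passing the classifiers in as callables keeps the staging logic separate from the models, so either stand-in could be swapped for a trained model without changing stage_one or stage_two.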

Published/granted literature

US20200042645A1 AUTOMATED DOCUMENT EXTRACTION AND CLASSIFICATION, published 2020-02-06

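IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits: H03K19/00)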