Automatic document separation

Invention Grant

US11295175B1 Automatic document separation (Active)

Patent Title: Automatic document separation
Application No.: US17032046

Application Date: 2020-09-25
Publication No.: US11295175B1

Publication Date: 2022-04-05
Inventor: Abisola Adeniran, Aisha Aliyu, Qingxue Xu
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agent: Sonny Z. Zhan
Main IPC: G06K9/00
IPC: G06K9/00; G06K9/62; G06V30/414; G06V30/418; G06V30/10

Abstract:

In an approach for an automatic document separation, a processor extracts one or more features from a document containing a plurality of pages. A processor generates a data frame based on the feature extraction. In response to analyzing a similarity between the plurality of pages, a processor determines whether the similarity exceeds a predetermined threshold. In response to determining that the similarity does not exceed the predetermined threshold, a processor transforms text into vectors forming float arrays. In response to benchmarking a set of predetermined clustering algorithms, a processor identifies a clustering algorithm using a predetermined criterion. A processor clusters the plurality of pages, using the clustering algorithm, to create a group of pages. A processor validates the clustered group of pages. In response to passing validation, a processor generates a set of final separated files based on the clustered group of pages.
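
The flow in the abstract can be illustrated with a small sketch. This is not the patented method: it assumes plain-text pages, uses term-frequency vectors as the "float arrays", and swaps in a simple threshold split on the cosine similarity of consecutive pages in place of the benchmarked clustering algorithm. The function names (`vectorize`, `cosine`, `separate`, `validate`), the threshold value, and the sample pages are all hypothetical.

```python
import math
from collections import Counter


def vectorize(pages):
    """Turn each page's text into a term-frequency vector (a list of floats)."""
    tokenized = [page.lower().split() for page in pages]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[term]) for term in vocab])
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def separate(pages, threshold=0.3):
    """Group consecutive pages; start a new group when similarity to the
    previous page does not exceed the threshold (an assumed stand-in for
    the clustering step described in the abstract)."""
    if not pages:
        return []
    vectors = vectorize(pages)
    groups = [[0]]
    for i in range(1, len(pages)):
        if cosine(vectors[i - 1], vectors[i]) > threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def validate(groups, n_pages):
    """Assumed validation rule: every page appears exactly once, in order."""
    flat = [i for group in groups for i in group]
    return flat == list(range(n_pages))


# Hypothetical four-page scan containing two different documents.
pages = [
    "invoice total amount due payment",
    "invoice payment amount due date",
    "lease tenant landlord rent term",
    "lease rent tenant deposit term",
]
groups = separate(pages)
is_valid = validate(groups, len(pages))
```

Each inner list in `groups` holds the page indices of one separated document. A fuller implementation would benchmark several clustering algorithms against a chosen criterion before grouping, as the abstract describes.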

Public/Granted literature

US20220101065A1 AUTOMATIC DOCUMENT SEPARATION Public/Granted day: 2022-03-31

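IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signal G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)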