Invention Grant
- Patent Title: Automatic document separation
- Application No.: US17032046
- Application Date: 2020-09-25
- Publication No.: US11295175B1
- Publication Date: 2022-04-05
- Inventor: Abisola Adeniran, Aisha Aliyu, Qingxue Xu
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Sonny Z. Zhan
- Main IPC: G06K9/00
- IPC: G06K9/00; G06K9/62; G06V30/414; G06V30/418; G06V30/10

Abstract:
In an approach for automatic document separation, a processor extracts one or more features from a document containing a plurality of pages. A processor generates a data frame based on the feature extraction. In response to analyzing a similarity between the plurality of pages, a processor determines whether the similarity exceeds a predetermined threshold. In response to determining that the similarity does not exceed the predetermined threshold, a processor transforms text into vectors forming float arrays. In response to benchmarking a set of predetermined clustering algorithms, a processor identifies a clustering algorithm using a predetermined criterion. A processor clusters the plurality of pages, using the clustering algorithm, to create a group of pages. A processor validates the clustered group of pages. In response to passing validation, a processor generates a set of final separated files based on the clustered group of pages.
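
The flow in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the patented implementation: the function names, the Jaccard pre-check, the term-frequency vectors, the two toy clustering candidates, the scoring criterion, and both threshold values are assumptions chosen only to make the steps concrete.

```python
import math
from collections import Counter
from itertools import combinations

def tokens(page):
    return page.lower().split()

def jaccard(a, b):
    """Cheap page-to-page similarity on token sets (assumed pre-check)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def vectorize(pages):
    """Turn each page into a term-frequency float array over a shared vocabulary."""
    vocab = sorted({t for p in pages for t in tokens(p)})
    return [[float(Counter(tokens(p))[t]) for t in vocab] for p in pages]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def adjacent_split(vecs, cut):
    """Candidate 1: start a new group whenever consecutive pages differ."""
    labels = [0]
    for prev, cur in zip(vecs, vecs[1:]):
        labels.append(labels[-1] + (cosine(prev, cur) < cut))
    return labels

def leader_cluster(vecs, cut):
    """Candidate 2: join the first group whose leader page is similar enough."""
    leaders, labels = [], []
    for v in vecs:
        for i, lead in enumerate(leaders):
            if cosine(lead, v) >= cut:
                labels.append(i)
                break
        else:
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return labels

def separation_score(vecs, labels):
    """Assumed criterion: mean within-group minus mean between-group similarity."""
    within, between = [], []
    for i, j in combinations(range(len(vecs)), 2):
        bucket = within if labels[i] == labels[j] else between
        bucket.append(cosine(vecs[i], vecs[j]))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(within) - mean(between)

def separate(pages, same_doc_threshold=0.8, cut=0.3):
    """Return groups of page indices, one group per separated file."""
    if len(pages) < 2:
        return [list(range(len(pages)))] if pages else []
    # Similarity check: if all adjacent pages are very similar, keep one file.
    if min(jaccard(a, b) for a, b in zip(pages, pages[1:])) > same_doc_threshold:
        return [list(range(len(pages)))]
    vecs = vectorize(pages)
    # Benchmark the candidate algorithms and keep the best-scoring one.
    candidates = [adjacent_split, leader_cluster]
    best = max(candidates, key=lambda algo: separation_score(vecs, algo(vecs, cut)))
    labels = best(vecs, cut)
    # Validation: every page must have received exactly one group label.
    if len(labels) != len(pages):
        raise ValueError("clustering did not label every page")
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    return list(groups.values())

pages = [
    "invoice total amount due payment",
    "invoice amount due payment terms",
    "resume education work experience skills",
    "resume skills work experience references",
]
print(separate(pages))
```

In this toy input the two invoice pages and the two resume pages share most of their vocabulary, so both candidates group them the same way. A production system would presumably use richer features and established clustering libraries.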
Public/Granted literature
- US20220101065A1 AUTOMATIC DOCUMENT SEPARATION Public/Granted day: 2022-03-31