Automatic document separation
Abstract:
In an approach for automatic document separation, a processor extracts one or more features from a document containing a plurality of pages. A processor generates a data frame based on the extracted features. In response to analyzing a similarity between the plurality of pages, a processor determines whether the similarity exceeds a predetermined threshold. In response to determining that the similarity does not exceed the predetermined threshold, a processor transforms the text of the pages into vectors in the form of float arrays. In response to benchmarking a set of predetermined clustering algorithms, a processor identifies a clustering algorithm using a predetermined criterion. A processor clusters the plurality of pages, using the identified clustering algorithm, to create groups of pages. A processor validates the clustered groups of pages. In response to the groups passing validation, a processor generates a set of final separated files based on the clustered groups of pages.
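The flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not name the features, the similarity measure, the clustering algorithms, the selection criterion, or the validation rule, so everything below is an assumption chosen for illustration. Token sets stand in for the extracted features, Jaccard similarity for the page comparison, term frequencies for the float arrays, a cut-based splitter run at several cut values for the set of clustering algorithms, and a within-group versus cross-group similarity contrast for the selection criterion. All function names (`extract_features`, `split_at`, `contrast`, `separate`) and default values are hypothetical. The abstract also does not say what happens when the similarity does exceed the threshold; the sketch assumes the pages are then kept together as one file.

```python
import math
import re
from collections import Counter


def extract_features(pages):
    """Build a simple 'data frame': one row (dict) of features per page."""
    rows = []
    for number, text in enumerate(pages):
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        rows.append({"page": number, "tokens": tokens, "vocab": set(tokens)})
    return rows


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty pages count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def vectorize(rows):
    """Transform page text into float arrays of term frequencies."""
    vocab = sorted(set().union(*(row["vocab"] for row in rows)))
    vectors = []
    for row in rows:
        counts = Counter(row["tokens"])
        total = len(row["tokens"]) or 1
        vectors.append([counts[term] / total for term in vocab])
    return vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_at(vectors, cut):
    """Candidate algorithm: open a new group when adjacent pages fall below cut."""
    labels = [0]
    for prev, cur in zip(vectors, vectors[1:]):
        labels.append(labels[-1] + (1 if cosine(prev, cur) < cut else 0))
    return labels


def contrast(vectors, labels):
    """Assumed criterion: mean within-group minus mean cross-group similarity."""
    within, across = [], []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            bucket = within if labels[i] == labels[j] else across
            bucket.append(cosine(vectors[i], vectors[j]))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(within) - mean(across)


def separate(pages, threshold=0.9, cuts=(0.25, 0.5, 0.85)):
    """Return groups of page indices, one group per separated file."""
    rows = extract_features(pages)
    if not rows:
        return []
    adjacent = [jaccard(a["vocab"], b["vocab"]) for a, b in zip(rows, rows[1:])]
    if min(adjacent, default=1.0) > threshold:
        return [[row["page"] for row in rows]]  # assumed: keep as one file
    vectors = vectorize(rows)
    # Benchmark every candidate and keep the one the criterion scores highest.
    candidates = [split_at(vectors, cut) for cut in cuts]
    labels = max(candidates, key=lambda c: contrast(vectors, c))
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row["page"])
    result = list(groups.values())
    # Validation (assumed rule): every page is assigned exactly once.
    if sorted(p for g in result for p in g) != list(range(len(pages))):
        raise ValueError("clustered groups failed validation")
    return result


pages = [
    "invoice total amount due payment invoice",
    "invoice payment amount due total tax",
    "resume education experience skills python",
    "resume skills experience education java",
]
print(separate(pages))  # [[0, 1], [2, 3]]
```

Each returned group lists the page indices that would be written out as one separated file. In a fuller pipeline the candidates would more plausibly be distinct clustering algorithms rather than one splitter at several cut values; the sketch only shows the benchmark-then-select step in its simplest form.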