Invention Grant
- Patent Title: Method and system for page construct detection based on sequential regularities
- Application No.: US14140075
- Application Date: 2013-12-24
- Publication No.: US09672195B2
- Publication Date: 2017-06-06
- Inventor: Hervé Déjean
- Applicant: Xerox Corporation
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F17/21
- IPC: G06F17/21 ; G06F17/22

Abstract:
Disclosed is a method and system that generates a page construct structure associated with a sequentially ordered set of pages, each page being characterized by a set of page construct features. N-grams, i.e., sequences of n features, are computed from the page construct features of n contiguous pages, and the n-grams that are repetitive are selected. Pages matching the most frequent repetitive n-gram are grouped together under a new node, and a new sequence is created. The method is applied iteratively to this new sequence. The output is an ordered set of trees.
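
The iteration described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It makes several assumptions that the abstract does not state: each page is reduced to a single hashable feature label, an n-gram counts as "repetitive" when it occurs at least twice in the sequence, matches are grouped left to right without overlap, and a new node is labelled with the n-gram it replaces. The names `Node`, `label_of`, `most_frequent_repetitive_ngram`, `group_matches`, and `build_trees` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; its label is the n-gram it groups (an assumption)."""
    label: tuple
    children: list = field(default_factory=list)


def label_of(item):
    # A node is compared by its label; a raw page feature is its own label.
    return item.label if isinstance(item, Node) else item


def most_frequent_repetitive_ngram(seq, n):
    # Count every n-gram over n contiguous items of the sequence.
    labels = [label_of(x) for x in seq]
    counts = Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
    if not counts:
        return None
    gram, freq = counts.most_common(1)[0]
    # Assumption: "repetitive" means the n-gram occurs at least twice.
    return gram if freq >= 2 else None


def group_matches(seq, gram):
    # Scan left to right and wrap each non-overlapping match in a new node.
    n = len(gram)
    out, i = [], 0
    while i < len(seq):
        window = seq[i:i + n]
        if len(window) == n and tuple(label_of(x) for x in window) == gram:
            out.append(Node(label=gram, children=window))
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def build_trees(page_features, n=2):
    # Repeat on the new sequence until no repetitive n-gram remains.
    if n < 2:
        raise ValueError("n must be at least 2 so each grouping shortens the sequence")
    seq = list(page_features)
    while True:
        gram = most_frequent_repetitive_ngram(seq, n)
        if gram is None:
            return seq  # ordered list of trees (nodes and ungrouped pages)
        seq = group_matches(seq, gram)


# Illustrative input: one made-up feature label per page.
trees = build_trees(["cover", "odd", "even", "odd", "even", "blank"], n=2)
```

With this input, the two `odd`/`even` page pairs are each grouped under a node, while the `cover` and `blank` pages remain ungrouped leaves, so the result is an ordered list of four trees.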
Public/Granted literature
- US20150178256A1 METHOD AND SYSTEM FOR PAGE CONSTRUCT DETECTION BASED ON SEQUENTIAL REGULARITIES, Public/Granted day: 2015-06-25