Invention Grant
US07937653B2 Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents
Legal status: In force
- Patent Title: Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents
- Application No.: US11032817
- Application Date: 2005-01-10
- Publication No.: US07937653B2
- Publication Date: 2011-05-03
- Inventor: Hervé Déjean, Jean-Luc Meunier
- Applicant: Hervé Déjean, Jean-Luc Meunier
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F17/00
- IPC: G06F17/00

Abstract:
A method for identifying the header/footer content of a document in order to sequence text fragments, comprising recognizable text blocks, as derived from the document. The textual variability of lines composed of text blocks, including the different kinds of text blocks within each line, is analyzed. Header/footer zones are defined by textual content having low textual variability. An alternative embodiment identifies pagination constructs by comparing selected text boxes for similarity and proximity and clustering the text boxes that satisfy a predetermined similarity value; the clustered text boxes are deemed to comprise pagination constructs.
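The low-variability idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each page is already a list of text lines, measures variability at a line position as the number of distinct texts divided by the number of pages that have that position, and replaces digit runs with a placeholder so that page numbers compare as equal. The function names, the `threshold` of 0.5, and the `max_depth` of 3 are hypothetical choices made for illustration only.

```python
import re
from typing import List, Tuple


def normalize(line: str) -> str:
    """Replace each run of digits with '#' so page numbers compare as equal (assumption)."""
    return re.sub(r"\d+", "#", line.strip())


def variability(pages: List[List[str]], position: int) -> float:
    """Distinct normalized texts at a line position divided by the pages that have it.

    Negative positions count from the bottom of the page.
    Returns 1.0 (maximally variable) when no page has that position.
    """
    texts = [
        normalize(page[position])
        for page in pages
        if -len(page) <= position < len(page)
    ]
    if not texts:
        return 1.0
    return len(set(texts)) / len(texts)


def header_footer_zones(
    pages: List[List[str]], max_depth: int = 3, threshold: float = 0.5
) -> Tuple[int, int]:
    """Return (header_lines, footer_lines): contiguous low-variability lines
    counted from the top and from the bottom of the page, respectively."""
    header = 0
    for i in range(max_depth):
        if variability(pages, i) > threshold:
            break
        header += 1

    footer = 0
    for i in range(1, max_depth + 1):
        if variability(pages, -i) > threshold:
            break
        footer += 1

    return header, footer


# Hypothetical four-page document: running title on top, page number at the bottom.
pages = [
    ["Annual Report 2004", "Introduction", "Body text alpha", "Page 1"],
    ["Annual Report 2004", "Methods used", "Body text beta", "Page 2"],
    ["Annual Report 2004", "Results found", "Body text gamma", "Page 3"],
    ["Annual Report 2004", "Closing notes", "Body text delta", "Page 4"],
]

print(header_footer_zones(pages))  # (1, 1)
```

In this toy input the first and last lines repeat on every page once digits are masked, so each scores 0.25 and falls in a header or footer zone, while the body lines score 1.0. The alternative embodiment (similarity and proximity clustering of text boxes) is not covered by this sketch.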