Invention Grant
US07937653B2 Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents (in force)

Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents
Abstract:
A method for identifying header/footer content of a document, in order to sequence text fragments comprising recognizable text blocks derived from the document. The textual variability of lines composed of text blocks, including the different kinds of text blocks within each line, is assessed. Header/footer zones are defined by textual content having low textual variability. An alternative embodiment identifies pagination constructs by comparing selected text boxes for similarity and proximity and clustering the text boxes that satisfy a predetermined similarity value; the clustered text boxes are deemed to comprise pagination constructs.
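The variability idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not give the variability measure, so the ratio used here (distinct normalized texts at a line position divided by the number of pages that have that position) is an assumption, as are the digit normalization, the `threshold` and `max_depth` parameters, and all function names.

```python
import re


def normalize(line):
    """Collapse digit runs so page numbers do not count as variation (assumption)."""
    return re.sub(r"\d+", "#", line.strip())


def variability(pages, depth, from_top=True):
    """Assumed measure: distinct normalized texts at a line position / pages having it."""
    texts = [
        normalize(page[depth] if from_top else page[-1 - depth])
        for page in pages
        if len(page) > depth
    ]
    # With no evidence at this position, treat it as fully variable.
    return len(set(texts)) / len(texts) if texts else 1.0


def zone_depth(pages, from_top, threshold=0.5, max_depth=5):
    """Count consecutive low-variability line positions from the page edge."""
    depth = 0
    while depth < max_depth and variability(pages, depth, from_top) <= threshold:
        depth += 1
    return depth


def detect_zones(pages, threshold=0.5, max_depth=5):
    """Return (header_line_count, footer_line_count) for a list of pages."""
    header = zone_depth(pages, True, threshold, max_depth)
    footer = zone_depth(pages, False, threshold, max_depth)
    return header, footer


if __name__ == "__main__":
    # Hypothetical pages, each given as a list of text lines in reading order.
    pages = [
        ["Annual Report 2005", "Intro text about A", "more A", "Page 1"],
        ["Annual Report 2005", "Body text about B", "more B", "Page 2"],
        ["Annual Report 2005", "Closing text C", "more C", "Page 3"],
    ]
    print(detect_zones(pages))
```

Lines that repeat across pages, apart from a changing page number, score low and are assigned to the header or footer zone; body lines differ on every page and end the zone. The sketch does not cover the alternative embodiment (clustering text boxes by similarity and proximity), and it does not guard against header and footer zones overlapping on very short pages.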