Invention Grant
US07991709B2 Method and apparatus for structuring documents utilizing recognition of an ordered sequence of identifiers
In force
- Patent Title: Method and apparatus for structuring documents utilizing recognition of an ordered sequence of identifiers
- Application No.: US12020743
- Application Date: 2008-01-28
- Publication No.: US07991709B2
- Publication Date: 2011-08-02
- Inventor: Herve Dejean, Jean-Luc Meunier
- Applicant: Herve Dejean, Jean-Luc Meunier
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F15/18
- IPC: G06F15/18; G06F17/21; G06F17/22; G06F17/27

Abstract:
A method is provided for operating a computing device to create a document structure model of a computer-parsable text document utilizing recognition of at least one ordered sequence of identifiers in the document. The method includes converting a computer-parsable text document of any format to an alternative structured language format to form a converted document. The text of the converted document is fragmented into an ordered sequence of text fragments within a text format. The text fragments are enumerated to obtain a sequence of terms. At least one optimal sub-sequence of terms is identified from among the sequence of terms, each optimal sub-sequence being a longest increasing sub-sequence. The computer-parsable text document is annotated with tags that include information derived from the identified optimal sub-sequence(s). The annotated document is displayed on a graphical user interface.
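The fragment, enumerate, select, and annotate steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation and makes several hypothetical simplifications: the conversion step is skipped, each non-empty line is treated as one fragment, a fragment's term is the integer at its start (fragments without one are ignored), the optimal sub-sequence is computed as one strictly increasing longest sub-sequence using the standard O(n log n) patience-sorting method, and the `<document>`, `<heading>` and `<p>` tag names are invented for illustration.

```python
import re
from bisect import bisect_left
from typing import Dict, List, Optional
from xml.sax.saxutils import escape

# Hypothetical enumeration rule: the term of a fragment is its leading integer,
# e.g. "3 Method" -> 3; fragments without a leading number yield None.
LEADING_NUMBER = re.compile(r"^\s*(\d+)\b")


def fragment(text: str) -> List[str]:
    """Hypothetical fragmentation: one fragment per non-empty line, in order."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def enumerate_terms(fragments: List[str]) -> List[Optional[int]]:
    """Map each fragment to a numeric term, or None if it has no identifier."""
    terms: List[Optional[int]] = []
    for frag in fragments:
        m = LEADING_NUMBER.match(frag)
        terms.append(int(m.group(1)) if m else None)
    return terms


def longest_increasing_subsequence(terms: List[Optional[int]]) -> List[int]:
    """Return fragment indices of one longest strictly increasing sub-sequence."""
    tail_values: List[int] = []   # smallest tail value of an increasing run of each length
    tail_indices: List[int] = []  # fragment index holding that tail
    predecessor: Dict[int, Optional[int]] = {}  # index -> previous index in its run
    for i, t in enumerate(terms):
        if t is None:
            continue
        k = bisect_left(tail_values, t)  # first tail >= t (keeps the run strictly increasing)
        predecessor[i] = tail_indices[k - 1] if k > 0 else None
        if k == len(tail_values):
            tail_values.append(t)
            tail_indices.append(i)
        else:
            tail_values[k] = t
            tail_indices[k] = i
    # Walk back from the tail of the longest run to recover its indices.
    result: List[int] = []
    cur = tail_indices[-1] if tail_indices else None
    while cur is not None:
        result.append(cur)
        cur = predecessor[cur]
    return result[::-1]


def annotate(fragments: List[str], selected: List[int]) -> str:
    """Tag fragments on the selected sub-sequence as headings, the rest as paragraphs."""
    chosen = set(selected)
    lines = ["<document>"]
    for i, frag in enumerate(fragments):
        tag = "heading" if i in chosen else "p"
        lines.append(f"  <{tag}>{escape(frag)}</{tag}>")
    lines.append("</document>")
    return "\n".join(lines)


if __name__ == "__main__":
    text = "1 Introduction\nThis page intentionally left blank\n2 Related work\n5 apples were counted\n3 Method\n4 Results\n"
    frags = fragment(text)
    best = longest_increasing_subsequence(enumerate_terms(frags))
    print(best)  # [0, 2, 4, 5]
    print(annotate(frags, best))
```

In this toy input the stray "5 apples were counted" line also starts with a number, but it is left out of the selected sequence because 1, 2, 3, 4 forms a longer increasing run. This shows why a longest increasing sub-sequence is a plausible way to separate real numbering from incidental numbers.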