Invention Grant
- Patent Title: Table of contents extraction with improved robustness
- Application No.: US11360963
- Application Date: 2006-02-23
- Publication No.: US07743327B2
- Publication Date: 2010-06-22
- Inventor: Jean-Luc Meunier, Hervé Déjean
- Applicant: Jean-Luc Meunier, Hervé Déjean
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F17/21
- IPC: G06F17/21

Abstract:
In a method for identifying a table of contents in a document (10), text fragments are extracted (12) from the document. There are identified (20, 30, 34, 38): (i) a substantially contiguous group of text fragments as table of contents entries and (ii) a different group of text fragments as linked text fragments linked with corresponding table of contents entries. During the identifying, the number of text fragments that are candidates for identification as linked text fragments is reduced based on at least one reduction criterion (130). The identified table of contents entries and linked text fragments (110) are validated based on at least one validation criterion (162) related to the distribution of the linked text fragments.
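The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say which criteria are used, so the sketch makes assumptions: linking is done by exact match on normalized text, the reduction criterion is a maximum heading length (`max_heading_len`), and the validation criterion is the share of the document body spanned by the linked fragments (`min_coverage`). All function and parameter names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and drop trailing dot leaders and page numbers ('Intro .... 3' -> 'intro')."""
    text = re.sub(r"[\s.]*\d+\s*$", "", text)
    return " ".join(text.lower().split())


def link_run(norm, start, end, allowed):
    """Greedily link each entry in norm[start:end] to a later matching fragment.

    Links must appear after the run and in ascending order. Returns None if
    any entry cannot be linked.
    """
    links, pos = [], end
    for i in range(start, end):
        j = next(
            (k for k in range(pos, len(norm)) if allowed[k] and norm[k] == norm[i]),
            None,
        )
        if j is None:
            return None
        links.append(j)
        pos = j + 1
    return links


def find_toc(fragments, max_heading_len=60, min_entries=3, min_coverage=0.5):
    """Return (start, end, links) for the longest validated contiguous run, or None."""
    norm = [normalize(f) for f in fragments]
    n = len(norm)
    # Assumed reduction criterion: only short, non-empty fragments may be link targets.
    allowed = [0 < len(t) <= max_heading_len for t in norm]
    best = None
    for start in range(n):
        for end in range(start + min_entries, n + 1):
            links = link_run(norm, start, end, allowed)
            if links is None:
                break  # a longer run from this start cannot be linked either
            # Assumed validation criterion: links should be spread over the body.
            coverage = (links[-1] - links[0] + 1) / (n - end)
            if coverage >= min_coverage and (
                best is None or end - start > best[1] - best[0]
            ):
                best = (start, end, links)
    return best


fragments = [
    "My Report",
    "Contents",
    "Introduction .... 1",
    "Methods .... 2",
    "Results .... 4",
    "Introduction",
    "This report describes the problem.",
    "Methods",
    "We extracted text fragments from each page.",
    "Results",
    "The approach worked on most documents.",
]

print(find_toc(fragments))  # (2, 5, [5, 7, 9])
```

In this toy document, fragments 2 to 4 form the contiguous table of contents group and fragments 5, 7, and 9 are the linked text fragments. Raising `min_coverage` or lowering `max_heading_len` makes the sketch reject the same candidate, which shows where validation and reduction act.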
Public/Granted literature
- US20070196015A1: Table of contents extraction with improved robustness. Public/Granted day: 2007-08-23