Invention Grant
US07743327B2 Table of contents extraction with improved robustness (Lapsed)

Table of contents extraction with improved robustness
Abstract:
In a method for identifying a table of contents in a document (10), text fragments are extracted (12) from the document. The method identifies (20, 30, 34, 38): (i) a substantially contiguous group of text fragments as table of contents entries and (ii) a different group of text fragments as linked text fragments linked with corresponding table of contents entries. During the identifying, the number of text fragments that are candidates for identification as linked text fragments is reduced based on at least one reduction criterion (130). The identified table of contents entries and linked text fragments (110) are validated based on at least one validation criterion (162) related to the distribution of the linked text fragments.
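The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how fragments are matched, which reduction criterion is used, or how the distribution is checked, so every concrete choice here is an assumption. The names `normalize` and `find_toc`, exact matching of normalized text, the `max_candidates` cap (standing in for a reduction criterion), strict contiguity of the entry group (the abstract says "substantially contiguous"), and the `min_spread` ratio (standing in for a distribution-based validation criterion) are all hypothetical.

```python
import re


def normalize(text):
    """Lowercase, strip a trailing page number and dot leader, collapse spaces."""
    text = re.sub(r"[.\s]*\d+\s*$", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def find_toc(fragments, max_candidates=2, min_entries=2, min_spread=0.3):
    """Return {entry_index: linked_index}, or None if no valid table is found."""
    keys = [normalize(f) for f in fragments]

    # Candidate links: later fragments whose normalized text equals this one.
    candidates = {}
    for i, key in enumerate(keys):
        if not key:
            continue
        later = [j for j in range(i + 1, len(keys)) if keys[j] == key]
        # Assumed reduction criterion: text repeated too often (for example a
        # running header) is dropped from the candidate set entirely.
        if 0 < len(later) <= max_candidates:
            candidates[i] = later

    # Entries: the longest contiguous run of fragments that have candidates.
    best, run = [], []
    for i in range(len(fragments)):
        run = run + [i] if i in candidates else []
        if len(run) > len(best):
            best = run
    if len(best) < min_entries:
        return None

    # Choose links in ascending order, all located after the entry group.
    links, previous = {}, best[-1]
    for i in best:
        target = next((j for j in candidates[i] if j > previous), None)
        if target is None:
            return None
        links[i] = target
        previous = target

    # Assumed validation criterion: the linked fragments must be spread over
    # a reasonable share of the fragments that follow the entry group.
    body_length = len(fragments) - 1 - best[-1]
    spread = (previous - links[best[0]] + 1) / body_length
    return links if spread >= min_spread else None


fragments = [
    "Contents",
    "Introduction ..... 1",
    "Methods ..... 2",
    "Results ..... 4",
    "Introduction",
    "This paper studies tables of contents.",
    "Methods",
    "We extract text fragments.",
    "Results",
    "The approach works.",
]
print(find_toc(fragments))  # {1: 4, 2: 6, 3: 8}
```

In this toy input, fragments 1 to 3 form the entry group and fragments 4, 6, and 8 are their linked headings. A document whose matching headings all cluster at the start of a long body would fail the spread check and return `None`, which is the kind of rejection a distribution-based validation is meant to provide.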