Invention Grant
US09224041B2 Table of contents extraction based on textual similarity and formal aspects (legal status: in force)

Abstract:
An initial organizational table for a document is determined based on textual similarity between entries of the organizational table and target text fragments, without taking text formatting into account. A classifier is trained to identify text fragment pairs, each consisting of an entry of the organizational table and its corresponding target text fragment, based at least in part on text formatting features. The training employs a training set of examples annotated based on the initial organizational table. The initial organizational table is then updated using the trained classifier.
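The three steps in the abstract (similarity-only initial linking, training on labels derived from that initial table, then re-linking with a formatting-aware classifier) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not name a similarity measure or a classifier, so the `Fragment` type, the `difflib` ratio as the similarity, the 0.6 threshold, the feature set (similarity, bold flag, relative font size) and the nearest-centroid classifier are all hypothetical choices made here for brevity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import dist


@dataclass
class Fragment:
    """Hypothetical body-text fragment carrying the formatting the classifier may use."""
    text: str
    font_size: float
    bold: bool


def similarity(a, b):
    """Case-insensitive textual similarity in [0, 1]; formatting is ignored."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def initial_toc(entries, fragments, threshold=0.6):
    """Step 1: link each entry to its most textually similar fragment, or None."""
    toc = {}
    for i, entry in enumerate(entries):
        scores = [similarity(entry, f.text) for f in fragments]
        best = max(range(len(fragments)), key=scores.__getitem__)
        toc[i] = best if scores[best] >= threshold else None
    return toc


def features(entry, frag, max_size):
    """Pair features: textual similarity plus formatting (bold flag, relative font size)."""
    return (similarity(entry, frag.text), 1.0 if frag.bold else 0.0, frag.font_size / max_size)


def centroid(vectors):
    """Component-wise mean of equal-length feature tuples."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def refine_toc(entries, fragments, toc):
    """Steps 2-3: train on pairs labelled by the initial ToC, then re-link every entry."""
    max_size = max(f.font_size for f in fragments)
    pos, neg = [], []
    for i, j in toc.items():
        if j is None:
            continue  # an unlinked entry yields no labelled examples
        for k, frag in enumerate(fragments):
            (pos if k == j else neg).append(features(entries[i], frag, max_size))
    if not pos or not neg:
        return dict(toc)  # too few labels to train on; keep the initial ToC
    pos_c, neg_c = centroid(pos), centroid(neg)

    def score(x):
        # Nearest-centroid decision value: positive means closer to the "is a match" class.
        return dist(x, neg_c) - dist(x, pos_c)

    updated = {}
    for i, entry in enumerate(entries):
        scores = [score(features(entry, f, max_size)) for f in fragments]
        best = max(range(len(fragments)), key=scores.__getitem__)
        updated[i] = best if scores[best] > 0 else None
    return updated


entries = ["Introduction", "Related Work", "Method", "Experiments"]
fragments = [
    Fragment("Introduction", 16, True),
    Fragment("This paper studies tables of contents.", 11, False),
    Fragment("Related Work", 16, True),
    Fragment("Prior work on related topics.", 11, False),
    Fragment("Methods", 16, True),
    Fragment("Our method is simple.", 11, False),
    Fragment("4 Experimental Evaluation", 16, True),
    Fragment("We ran experiments on many documents.", 11, False),
]
toc0 = initial_toc(entries, fragments)
toc1 = refine_toc(entries, fragments, toc0)
print(toc0)  # {0: 0, 1: 2, 2: 4, 3: None}
print(toc1)  # {0: 0, 1: 2, 2: 4, 3: 6}
```

In this toy input, the entry "Experiments" is too dissimilar from the heading "4 Experimental Evaluation" to pass the text-only threshold. The classifier, trained only on the three confidently linked pairs, learns that matches tend to be bold and large, and recovers the link. The nearest-centroid rule was picked only to keep the sketch dependency-free; any trainable classifier could take its place.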