Invention Grant
US07660804B2 Joint optimization of wrapper generation and template detection 有权
联合优化包装生成和模板检测

Joint optimization of wrapper generation and template detection
Abstract:
A method and system for generating wrappers for hierarchically organized documents by jointly optimizing template detection and wrapper generation is provided. A wrapper generation system generates a wrapper for documents with similar templates by identifying a cluster of document trees and generating a wrapper tree for the cluster. A wrapper tree defines the wrapper for documents that match the template of the cluster. The wrapper generation system clusters document trees by generating a wrapper tree for the cluster based on an initial document tree. The wrapper generation system then repeatedly determines whether any other document tree matches or nearly matches the wrapper tree for the cluster and, if so, adds the document tree to the cluster and adjusts the wrapper tree as appropriate so that all the document trees, including the newly added one, match the wrapper tree.
Public/Granted literature
Information query
Patent Agency Ranking
0/0