Invention Grant
- Patent Title: Joint optimization of wrapper generation and template detection
- Patent Title (中): 联合优化包装生成和模板检测
- Application No.: US11465026
- Application Date: 2006-08-16
- Publication No.: US07660804B2
- Publication Date: 2010-02-09
- Inventor: Ji-Rong Wen, Min Wan, Ruihua Song, Wei-Ying Ma, Shuyi Zeng
- Applicant: Ji-Rong Wen, Min Wan, Ruihua Song, Wei-Ying Ma, Shuyi Zeng
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agency: Perkins Coie LLP
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/00

Abstract:
A method and system for generating wrappers for hierarchically organized documents by jointly optimizing template detection and wrapper generation is provided. A wrapper generation system generates a wrapper for documents with similar templates by identifying a cluster of document trees and generating a wrapper tree for the cluster. A wrapper tree defines the wrapper for documents that match the template of the cluster. The wrapper generation system clusters document trees by generating a wrapper tree for the cluster based on an initial document tree. The wrapper generation system then repeatedly determines whether any other document tree matches or nearly matches the wrapper tree for the cluster and, if so, adds the document tree to the cluster and adjusts the wrapper tree as appropriate so that all the document trees, including the newly added one, match the wrapper tree.
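The clustering loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: each document tree is reduced to its set of root-to-node tag paths, the wrapper is a set of required and optional paths rather than a full wrapper tree, and "nearly matches" is approximated by a Jaccard similarity threshold. The names `tag_paths`, `Wrapper`, `cluster_documents`, and the `threshold` value are hypothetical.

```python
from dataclasses import dataclass, field


def tag_paths(tree, prefix=()):
    """Return the set of root-to-node tag paths of a (tag, children) tree."""
    tag, children = tree
    path = prefix + (tag,)
    result = {path}
    for child in children:
        result |= tag_paths(child, path)
    return result


@dataclass
class Wrapper:
    """Simplified stand-in for a wrapper tree (assumption: path sets only)."""
    required: set                      # paths every clustered document has
    optional: set = field(default_factory=set)  # paths only some documents have

    def all_paths(self):
        return self.required | self.optional

    def matches(self, paths):
        # Exact match: all required paths present, nothing unknown.
        return self.required <= paths <= self.all_paths()

    def similarity(self, paths):
        # Jaccard similarity, used here as the "nearly matches" test (assumption).
        known = self.all_paths()
        return len(known & paths) / len(known | paths)

    def absorb(self, paths):
        # Adjust the wrapper so old members and the new document all match.
        everything = self.all_paths() | paths
        self.required &= paths
        self.optional = everything - self.required


def cluster_documents(trees, threshold=0.6):
    """Seed a cluster from one tree, then keep adding (near-)matching trees."""
    remaining = [(i, tag_paths(t)) for i, t in enumerate(trees)]
    clusters = []
    while remaining:
        seed_index, seed_paths = remaining.pop(0)
        wrapper = Wrapper(required=set(seed_paths))
        members = [seed_index]
        changed = True
        while changed:  # repeat until no further tree joins the cluster
            changed = False
            for item in list(remaining):
                index, paths = item
                if wrapper.matches(paths) or wrapper.similarity(paths) >= threshold:
                    wrapper.absorb(paths)
                    members.append(index)
                    remaining.remove(item)
                    changed = True
        clusters.append((members, wrapper))
    return clusters


# Usage: two pages sharing a template and one unrelated feed document.
doc_a = ("html", [("body", [("h1", []), ("p", [])])])
doc_b = ("html", [("body", [("h1", []), ("p", []), ("img", [])])])
doc_c = ("rss", [("channel", [("item", [])])])
clusters = cluster_documents([doc_a, doc_c, doc_b])
```

In this example `doc_a` and `doc_b` end up in one cluster whose wrapper marks the `img` path as optional, while `doc_c` seeds its own cluster. Because `absorb` only ever shrinks the required set and grows the full set, every document already in the cluster continues to match after each adjustment.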
Public/Granted literature
- US20080046441A1: Joint optimization of wrapper generation and template detection (Public/Granted day: 2008-02-21)