Publication No.: KR1020030035261A
Publication date: 2003-05-09
Application No.: KR1020010067244
Filing date: 2001-10-30
Abstract: PURPOSE: A method for selectively extracting web page information using structure analysis is provided, which extracts specific information by analyzing the structure of a web page served by an information-providing web site. CONSTITUTION: A web page is collected from the information-providing web site and its layout structure pattern is searched for. Using the layout structure pattern information, structure filtering is performed on the web page to eliminate unnecessary information (306). The table structure of the filtered web page is analyzed, and the template pattern whose structure is most similar to the analyzed table structure is searched for among the stored template patterns (310). The specific information is then extracted from the filtered page using the information in the found template pattern (312).
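
The three steps in the abstract (structure filtering, template matching on table structure, extraction) can be illustrated with a minimal Python sketch. The abstract does not say how layout patterns or template patterns are represented, nor how similarity is measured, so everything below is an assumption made for illustration: the layout pattern is reduced to a set of element ids to discard, a table structure is summarized as the number of cells in each row, similarity is `difflib.SequenceMatcher.ratio()`, and the names `TableRows`, `extract`, `PAGE`, and `TEMPLATES` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag


class TableRows(HTMLParser):
    """Collect table rows as lists of cell text, skipping filtered regions."""

    def __init__(self, drop_ids):
        super().__init__()
        self.drop_ids = set(drop_ids)  # assumed layout pattern: ids to discard
        self.skip_depth = 0            # > 0 while inside a discarded region
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
        elif dict(attrs).get("id") in self.drop_ids:
            self.skip_depth = 1        # structure filtering (step 306)
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and not self.skip_depth:
            self.rows[-1][-1] += data.strip()


def extract(html, drop_ids, templates):
    """Filter the page, pick the most similar template, extract its fields."""
    parser = TableRows(drop_ids)
    parser.feed(html)
    rows = [row for row in parser.rows if row]
    signature = tuple(len(row) for row in rows)  # table structure analysis

    def similarity(template):
        return SequenceMatcher(None, signature, template["signature"]).ratio()

    best = max(templates, key=similarity)        # template search (step 310)
    fields = {
        name: rows[r][c]                         # extraction (step 312)
        for name, (r, c) in best["fields"].items()
        if r < len(rows) and c < len(rows[r])
    }
    return best["name"], fields


# Hypothetical page: an advertising banner plus the table of interest.
PAGE = """
<div id="banner"><table><tr><td>Ad</td></tr></table></div>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.50</td></tr>
</table>
"""

# Hypothetical stored templates: a row-width signature plus field positions.
TEMPLATES = [
    {"name": "price_list", "signature": (2, 2),
     "fields": {"item": (1, 0), "price": (1, 1)}},
    {"name": "news_list", "signature": (1, 3, 3),
     "fields": {"headline": (1, 0)}},
]

print(extract(PAGE, {"banner"}, TEMPLATES))
```

In this sketch the banner table is discarded before the structure is analyzed, so it cannot distort the template match. A real system would presumably need a richer structural description than per-row cell counts, for example nesting depth or cell spans.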