Grid Computing Based Web Crawler System and Method Thereof
    Original title: 그리드 컴퓨팅 기반 웹 크롤러 시스템 및 그 방법
    Document type: Granted invention patent
    Legal status: Lapsed

    Publication number: KR100875636B1

    Publication date: 2008-12-26

    Application number: KR1020070095444

    Filing date: 2007-09-19

    CPC classification number: G06F17/30864

    Abstract: A grid computing based web crawler system and method are presented that select the lowest-cost grid computing resource by taking the geographical location of each web page into account. In the crawling method, when a web page belongs to the surface web, a surface web crawler service factory (151) is called to dynamically create a surface web crawler service instance, which crawls the page and generates an index of it. When the page belongs to the deep web, a deep web crawler service factory (152) is called to dynamically create a deep web crawler service instance, which searches the page for a deep web search form and extracts it. A query is then submitted to the extracted search form to produce a result page, keywords are extracted from the result page to generate an index of that page, and the index is returned to the caller.
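
The dispatch described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Page`, `build_index`, `is_deep_web`, `surface_web_crawler_factory`, `deep_web_crawler_factory` and the injected `submit` callable are hypothetical, and the rule used to tell a deep web page from a surface web page (presence of a `<form>` element) is an assumption, since the abstract does not state how pages are classified. Grid resource selection is left out.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    url: str
    html: str


def build_index(html: str, top_n: int = 5) -> List[str]:
    """Toy index: the most frequent words (3+ letters) in the page text."""
    text = re.sub(r"<[^>]+>", " ", html)          # drop HTML tags
    words = re.findall(r"[a-z]{3,}", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


def is_deep_web(page: Page) -> bool:
    """Assumption: a page containing a <form> is treated as deep web."""
    return re.search(r"<form\b", page.html, re.IGNORECASE) is not None


class SurfaceWebCrawler:
    def crawl(self, page: Page) -> List[str]:
        # Surface web: index the page content directly.
        return build_index(page.html)


class DeepWebCrawler:
    def __init__(self, submit: Callable[[str, str], str]):
        # `submit(action, query)` returns the result page HTML (hypothetical).
        self.submit = submit

    def crawl(self, page: Page, query: str) -> List[str]:
        # Extract the search form, submit the query, index the result page.
        form = re.search(r'<form\b[^>]*\baction="([^"]*)"', page.html,
                         re.IGNORECASE)
        if form is None:
            return []
        result_html = self.submit(form.group(1), query)
        return build_index(result_html)


def surface_web_crawler_factory() -> SurfaceWebCrawler:
    """Stands in for service factory (151): creates an instance on demand."""
    return SurfaceWebCrawler()


def deep_web_crawler_factory(submit: Callable[[str, str], str]) -> DeepWebCrawler:
    """Stands in for service factory (152): creates an instance on demand."""
    return DeepWebCrawler(submit)


def crawl(page: Page, query: str, submit: Callable[[str, str], str]) -> List[str]:
    """Create the matching crawler instance and return the index to the caller."""
    if is_deep_web(page):
        return deep_web_crawler_factory(submit).crawl(page, query)
    return surface_web_crawler_factory().crawl(page)
```

Passing `submit` as a parameter keeps the sketch free of network access; in a real deployment the form submission would be an HTTP request issued by the crawler instance running on the selected grid resource.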

