Invention Grant
- Patent Title: Method and apparatus for web crawling
- Patent Title (中): 网络爬行的方法和装置
- Application No.: US12413528
- Application Date: 2009-03-28
- Publication No.: US08712992B2
- Publication Date: 2014-04-29
- Inventor: Alexey Maykov, Matthew F. Hurst
- Applicant: Alexey Maykov, Matthew F. Hurst
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agent: Steve Spellman; Jim Ross; Micky Minhas
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F7/08

Abstract:
A method and system for retrieving data from a webpage is described herein. A scheduler organizes, or rather orders, a group of webpage identifiers according to some predetermined criteria. Based upon this ordering, a fetcher may be configured to fetch data from webpages identified by the identifiers. To promote efficiency and reduce the latency between when a webpage is updated and when the fetcher retrieves data from the webpage, the scheduler may be configured to reorder the identifiers in such a manner that it causes an identifier that was less relevant, and would not have been sent to the fetcher, to become more relevant. In this way, the method and system may be particularly useful for retrieving data related to webpages that are updated frequently, such as social media webpages, for example.
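The scheduler and fetcher interaction described in the abstract can be illustrated with a small priority queue. This is only a minimal sketch under stated assumptions, not the patented implementation: the `Scheduler` class, its `add` and `next_batch` methods, the numeric scores (lower means more relevant), and the example URLs are all hypothetical names chosen for illustration. It shows how reordering one identifier can move it into the batch handed to the fetcher when it would otherwise have been left out.

```python
import heapq
import itertools


class Scheduler:
    """Orders webpage identifiers by score; the lowest score is fetched first.

    Hypothetical illustration only; the scoring criteria are an assumption.
    """

    def __init__(self):
        self._heap = []                    # entries: [score, seq, url]
        self._entries = {}                 # url -> its live heap entry
        self._counter = itertools.count()  # unique tie-breaker, keeps insertion order

    def add(self, url, score):
        """Schedule a URL, or reorder it if it is already scheduled."""
        if url in self._entries:
            # Mark the old entry as stale instead of searching the heap for it.
            self._entries[url][2] = None
        entry = [score, next(self._counter), url]
        self._entries[url] = entry
        heapq.heappush(self._heap, entry)

    def next_batch(self, n):
        """Pop up to n of the most relevant URLs to hand to a fetcher."""
        batch = []
        while self._heap and len(batch) < n:
            _, _, url = heapq.heappop(self._heap)
            if url is not None:            # skip stale entries
                del self._entries[url]
                batch.append(url)
        return batch


scheduler = Scheduler()
scheduler.add("https://example.com/news", 1.0)
scheduler.add("https://example.com/blog", 2.0)
scheduler.add("https://example.com/forum", 3.0)

# Assumed update signal: the forum page changed, so it becomes more relevant.
scheduler.add("https://example.com/forum", 0.5)

batch = scheduler.next_batch(2)
print(batch)
```

Without the reordering step, a batch of two would contain only the news and blog pages. After the forum page is rescored, it moves to the front and is fetched in the next batch.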
Public/Granted literature
- US20100250516A1 METHOD AND APPARATUS FOR WEB CRAWLING Public/Granted day: 2010-09-30