Invention Grant
- Patent Title: Systems and methods for client-based web crawling
- Patent Title (Chinese): 用于基于客户端的网络爬网的系统和方法
- Application No.: US10670681
- Application Date: 2003-09-25
- Publication No.: US07685296B2
- Publication Date: 2010-03-23
- Inventor: Eric D. Brill, Christopher A. Meek
- Applicant: Eric D. Brill, Christopher A. Meek
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agency: Lee & Hayes, PLLC
- Main IPC: G06F17/00
- IPC: G06F17/00; G06F9/00

Abstract:
The present invention provides systems and methods for obtaining information from a networked system utilizing a distributed web crawler. The distributed nature of clients of a server is leveraged to provide fast and accurate web crawling data. Information gathered by a server's web crawler is compared to data retrieved by clients of the server to update the crawler's data. In one instance of the present invention, data comparison is achieved by utilizing information disseminated via a search engine results page. In another instance of the present invention, data validation is accomplished by client dictionaries, emanating from a server, that summarize web crawler data. The present invention also facilitates data analysis by providing a means to resist spoofing of a web crawler to increase data accuracy.
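
The client-dictionary idea in the abstract can be illustrated with a minimal sketch. It is not taken from the patent's claims: it assumes the server summarizes each crawled page as a SHA-256 fingerprint, ships the URL-to-fingerprint map to clients, and lets a client flag pages whose fetched content no longer matches. The names `fingerprint`, `build_client_dictionary`, and `check_page` are hypothetical and exist only for this illustration.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Summarize page content as a SHA-256 hex digest (an assumed summary format)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def build_client_dictionary(crawl_data: dict[str, str]) -> dict[str, str]:
    """Server side: map each crawled URL to a fingerprint of the crawled content."""
    return {url: fingerprint(body) for url, body in crawl_data.items()}


def check_page(dictionary: dict[str, str], url: str, fetched_body: str) -> str:
    """Client side: compare what the client fetched with the crawler's summary.

    Returns "new" if the crawler has no entry for the URL, "changed" if the
    fingerprints differ, and "unchanged" otherwise.
    """
    known = dictionary.get(url)
    if known is None:
        return "new"
    if known != fingerprint(fetched_body):
        return "changed"
    return "unchanged"


# Hypothetical crawler snapshot and client observations.
crawler_snapshot = {"http://example.com/a": "old text"}
client_dictionary = build_client_dictionary(crawler_snapshot)
status_same = check_page(client_dictionary, "http://example.com/a", "old text")
status_diff = check_page(client_dictionary, "http://example.com/a", "new text")
status_new = check_page(client_dictionary, "http://example.com/b", "anything")
```

In a scheme like this, a client would report only the "new" and "changed" cases back to the server. How the server weighs such reports to resist spoofing is not specified in the abstract and is left out of the sketch.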
Public/Granted literature
- US20050071766A1 Systems and methods for client-based web crawling (Public/Granted day: 2005-03-31)