Invention Grant
- Patent Title: Discrepancy detection for web crawling
- Patent Title (translated from Chinese): Web page crawl discrepancy detection
- Application No.: US12817797
- Application Date: 2010-06-17
- Publication No.: US08639773B2
- Publication Date: 2014-01-28
- Inventor: Balaji B. Shyamkumar, Puneet Sahni, Harsh Verma
- Applicant: Balaji B. Shyamkumar, Puneet Sahni, Harsh Verma
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agency: Microsoft Corporation
- Main IPC: G06F15/16
- IPC: G06F15/16

Abstract:
Search engines may utilize web crawlers to discover desirable content that may be provided to users as search results. Unfortunately, document providers, such as websites, may return junk web pages and/or maintenance web pages as document results, which may be undesirable for a search engine to provide as search results. Accordingly, document providers may be grouped into provider clusters. Profiles may be assigned to provider clusters, where a profile may comprise parameters representing “expected” parameters historically returned from normal document fetch operations to document providers within the provider cluster. Parameters of a profile for a provider cluster comprising a document provider may be compared with current document fetch parameters of a current document fetch operation. If the parameters of the profile and the current document fetch parameters do not match, then an alert may be generated.
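The comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say which parameters a profile holds or how a match is decided, so the fields below (HTTP status code, content type, and a content-length range), the exact-match and range rules, and every name in the code are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FetchParameters:
    """Parameters observed for one document fetch (hypothetical fields)."""
    status_code: int
    content_type: str
    content_length: int


@dataclass(frozen=True)
class ClusterProfile:
    """Expected parameters for a provider cluster (hypothetical fields)."""
    status_code: int
    content_type: str
    min_length: int
    max_length: int


def find_discrepancies(profile: ClusterProfile, current: FetchParameters) -> List[str]:
    """Return the names of the current parameters that do not match the profile."""
    mismatched = []
    if current.status_code != profile.status_code:
        mismatched.append("status_code")
    if current.content_type != profile.content_type:
        mismatched.append("content_type")
    # Assumption: length "matches" when it falls inside the historical range.
    if not (profile.min_length <= current.content_length <= profile.max_length):
        mismatched.append("content_length")
    return mismatched


def check_fetch(
    provider: str,
    cluster_of: Dict[str, str],
    profiles: Dict[str, ClusterProfile],
    current: FetchParameters,
) -> Optional[str]:
    """Compare a fetch against its provider cluster's profile.

    Returns an alert message on a mismatch, or None when everything matches.
    """
    profile = profiles[cluster_of[provider]]
    mismatched = find_discrepancies(profile, current)
    if mismatched:
        return f"ALERT {provider}: unexpected {', '.join(mismatched)}"
    return None


# Hypothetical data: one provider assigned to one cluster with one profile.
cluster_of = {"example.com": "news-sites"}
profiles = {"news-sites": ClusterProfile(200, "text/html", 5_000, 500_000)}

# A very short page, such as a maintenance notice, falls outside the range.
maintenance_page = FetchParameters(200, "text/html", 312)
print(check_fetch("example.com", cluster_of, profiles, maintenance_page))
```

Keying the profile by cluster rather than by individual provider follows the abstract's grouping of document providers into provider clusters; how clusters are formed and how profiles are learned from historical fetches is outside this sketch.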
Public/Granted literature
- US20110314122A1 DISCREPANCY DETECTION FOR WEB CRAWLING Public/Granted day: 2011-12-22