Invention Grant
- Patent Title: Systems and methods of web crawling
- Application No.: US13942812
- Application Date: 2013-07-16
- Publication No.: US09576052B2
- Publication Date: 2017-02-21
- Inventors: Nidhi Singh, Jean-Marc Coursimault, Herve Poirier, Nicolas Monet
- Applicant: Xerox Corporation
- Applicant Address: US CT Norwalk
- Assignee: XEROX CORPORATION
- Current Assignee: XEROX CORPORATION
- Current Assignee Address: US CT Norwalk
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Methods and systems for dynamically training a web crawler. The web crawler maintains one or more categories, each comprising a set of words. The method includes selecting at least one hyperlink in response to a query received from a user. The method further includes determining a hyperlink score for the at least one hyperlink based on a category score associated with each of the one or more categories. The category score associated with each of the one or more categories is updated based at least in part on the hyperlink score. The updated category score is compared with the hyperlink score to select a category from the one or more categories. The set of words associated with the selected category is updated based on the content of a web page pointed to by the at least one hyperlink.
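The abstract does not define how the hyperlink score, the category-score update, the comparison, or the word-set update are computed. The Python sketch below illustrates one possible reading of a single training step, under these stated assumptions (none of them come from the patent): the hyperlink score is the sum of category scores weighted by the fraction of link-text tokens found in each category's word set; category scores are updated as an exponential moving average toward the hyperlink score; the selected category is the one whose updated score is closest to the hyperlink score; and the word set grows by the most frequent tokens of the fetched page. All names (`Category`, `train_step`, `lr`, `top_k`) are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Category:
    # Hypothetical category representation: a name, a word set, and a score.
    name: str
    words: set[str]
    score: float = 0.5


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a stand-in for real text processing.
    return re.findall(r"[a-z0-9]+", text.lower())


def hyperlink_score(link_text: str, categories: list[Category]) -> float:
    # Assumed scoring: each category contributes its score weighted by the
    # fraction of link tokens found in that category's word set.
    tokens = tokenize(link_text)
    if not tokens:
        return 0.0
    total = 0.0
    for cat in categories:
        overlap = sum(1 for t in tokens if t in cat.words) / len(tokens)
        total += cat.score * overlap
    return total


def update_category_scores(categories: list[Category], link_score: float, lr: float) -> None:
    # Assumed update: exponential moving average toward the hyperlink score.
    for cat in categories:
        cat.score = (1 - lr) * cat.score + lr * link_score


def select_category(categories: list[Category], link_score: float) -> Category:
    # Assumed comparison: pick the category whose updated score is closest
    # to the hyperlink score.
    return min(categories, key=lambda c: abs(c.score - link_score))


def update_words(category: Category, page_content: str, top_k: int) -> None:
    # Assumed vocabulary update: add the top_k most frequent page tokens.
    counts = Counter(tokenize(page_content))
    category.words.update(word for word, _ in counts.most_common(top_k))


def train_step(link_text: str, page_content: str, categories: list[Category],
               lr: float = 0.5, top_k: int = 5) -> tuple[Category, float]:
    # One training pass for a single selected hyperlink and its fetched page.
    score = hyperlink_score(link_text, categories)
    update_category_scores(categories, score, lr)
    chosen = select_category(categories, score)
    update_words(chosen, page_content, top_k)
    return chosen, score
```

For example, with a "sports" category (score 0.8, words "football", "match", "goal") and a "finance" category (score 0.2), the link text "Football match report" scores 0.8 × 2/3 ≈ 0.53; with `lr=0.5` the sports score moves to about 0.67 and the finance score to about 0.37, so sports is selected and the most frequent words of the fetched page are added to its word set. The moving average is only one way to make the category scores track recent hyperlink scores; the patent text may specify a different rule.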
Published/granted literature
- US20150026152A1 Systems and Methods of Web Crawling (published 2015-01-22)