Invention Grant
- Patent Title: Optimized web domains classification based on progressive crawling with clustering
- Application No.: US13732860
- Application Date: 2013-01-02
- Publication No.: US08972376B1
- Publication Date: 2015-03-03
- Inventor: Renars Gailis, Lin Xu, Renzo Lazzarato
- Applicant: Palo Alto Networks, Inc.
- Applicant Address: US CA Santa Clara
- Assignee: Palo Alto Networks, Inc.
- Current Assignee: Palo Alto Networks, Inc.
- Current Assignee Address: US CA Santa Clara
- Agency: Van Pelt, Yi & James LLP
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Techniques for optimized web domains classification based on progressive crawling with clustering are disclosed. In some embodiments, optimized web domains classification based on progressive crawling with clustering includes crawling a domain (e.g., a web site domain) to collect data for a subset of pages (e.g., web pages) of a corpus of content associated with the domain; classifying each of the crawled pages into one or more category clusters, in which the category clusters represent a content categorization of the corpus of content associated with the domain (e.g., a URL content categorization for the domain, host of that domain, and/or directory of that domain); and determining which of the one or more category clusters to publish for the domain.
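The three steps named in the abstract (crawl a subset of pages, classify each crawled page into category clusters, decide which clusters to publish) can be illustrated with a minimal sketch. This is not the patented method: the function names, the keyword-based classifier, the `sample_size` cap, and the `min_share` publication threshold are all hypothetical choices made for illustration, and the "crawl" is simulated with an in-memory list of pages.

```python
from collections import Counter

# Hypothetical keyword lists standing in for a real page classifier.
KEYWORDS = {
    "sports": {"score", "match", "league"},
    "finance": {"stock", "market", "bank"},
}


def keyword_classifier(text):
    """Return the set of categories whose keywords appear in the page text."""
    words = set(text.lower().split())
    return {category for category, kws in KEYWORDS.items() if words & kws}


def classify_domain(pages, classify_page, sample_size=50, min_share=0.2):
    """Classify a domain from a subset of its pages.

    pages: iterable of (url, text) pairs, standing in for crawled pages.
    classify_page: callable mapping page text to a set of categories.
    sample_size: assumed cap on how many pages are crawled.
    min_share: assumed fraction of crawled pages a category cluster
        needs before it is published for the domain.
    """
    cluster_sizes = Counter()
    crawled = 0
    for _url, text in pages:
        if crawled >= sample_size:
            break  # only a subset of the corpus is crawled
        # A page may fall into one or more category clusters.
        for category in set(classify_page(text)):
            cluster_sizes[category] += 1
        crawled += 1
    if crawled == 0:
        return []
    # Publish only clusters covering a large enough share of the sample.
    return sorted(
        category
        for category, size in cluster_sizes.items()
        if size / crawled >= min_share
    )


pages = [
    ("/a", "League match score tonight"),
    ("/b", "Match report and league table"),
    ("/c", "Stock market update"),
    ("/d", "Contact us"),
]
published = classify_domain(pages, keyword_classifier, min_share=0.5)
```

With these sample pages, two of the four fall into the sports cluster and one into finance, so a 0.5 threshold publishes only the sports category, while a lower threshold such as 0.25 would publish both.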