Invention Grant
- Patent Title: Training set construction for taxonomic classification
- Application No.: US12604025
- Application Date: 2009-10-22
- Publication No.: US08122005B1
- Publication Date: 2012-02-21
- Inventor: Philo Juang, Christopher Testa, Nicolaus Mote
- Applicant: Philo Juang, Christopher Testa, Nicolaus Mote
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Brake Hughes Bellermann LLP
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A training set generator may be configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data. The training set generator may include a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data. The training set generator also may include an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with the categories of the hierarchical categories and obtain categorized data of the training set.
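The flow described in the abstract (crawl each top-level site, then apply a site-specific extraction template to label the crawl data) can be illustrated with a minimal sketch. Everything below is a hypothetical illustration and not the patented implementation: the in-memory `WEB` link graph stands in for a real crawler, the template is assumed to map a URL path segment to a category, and all names (`crawl`, `apply_template`, `build_training_set`, `shop.example`) are invented for the example.

```python
# Hypothetical taxonomy: category -> parent category (None marks the root).
TAXONOMY = {"Shopping": None, "Electronics": "Shopping", "Garden": "Shopping"}

# Hypothetical link graph standing in for the web: URL -> outgoing links.
WEB = {
    "shop.example": ["shop.example/electronics/tv", "shop.example/garden/hose"],
    "shop.example/electronics/tv": [],
    "shop.example/garden/hose": ["shop.example"],
}

# Assumed template shape: per top-level site, URL path segment -> category.
TEMPLATES = {"shop.example": {"electronics": "Electronics", "garden": "Garden"}}


def crawl(top_level_site, web):
    """Breadth-first crawl: the top-level site plus every page reachable from it."""
    seen, queue = [], [top_level_site]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in web:
            continue  # skip pages already visited or outside the known graph
        seen.append(url)
        queue.extend(web[url])
    return seen


def apply_template(template, pages):
    """Label each crawled page whose URL contains a templated path segment."""
    categorized = []
    for url in pages:
        for segment, category in template.items():
            if f"/{segment}/" in url:
                categorized.append((url, category))
    return categorized


def build_training_set(taxonomy, templates, web):
    """Crawl every top-level site and apply its site-specific template."""
    training_set = []
    for site, template in templates.items():
        unknown = [c for c in template.values() if c not in taxonomy]
        if unknown:
            raise ValueError(f"template for {site} uses unknown categories: {unknown}")
        training_set.extend(apply_template(template, crawl(site, web)))
    return training_set


TRAINING_SET = build_training_set(TAXONOMY, TEMPLATES, WEB)
```

Under these assumptions, `TRAINING_SET` holds `(url, category)` pairs for the two lower-level pages, while the top-level page itself stays unlabeled because no template segment matches it. A real extractor would likely key templates on page structure rather than URL segments; the URL-based rule is used here only to keep the sketch short.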