Invention Grant
- Patent Title: Automatic genre classification determination of web content to which the web content belongs together with a corresponding genre probability
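- Patent Title (Chinese, translated): Automatic genre classification determining the web content to which web content belongs, with the corresponding genre probability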
- Application No.: US14096481
- Application Date: 2013-12-04
- Publication No.: US09565236B2
- Publication Date: 2017-02-07
- Inventor: Dirk Harz, Ralf Iffert, Mark Keinhoerster, Mark Usher
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Francis Lammes; Stephen J. Walder, Jr.; Gail H. Zarick
- Priority: GB1300685.3, 2013-01-15
- Main IPC: H04L29/08
- IPC: H04L29/08 ; G06N99/00

Abstract:
A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types is extracted from collected training material, where genre features and non-genre features are represented by tokens, and an integer count represents the frequency with which each token appears in both a first type of training material and a second type of training material. In a classification process, fixed-length tokens are extracted for the relevant feature types from different text and structural elements of the web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined into an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output, comprising at least one specific trained web content genre to which the web content belongs together with a corresponding genre probability.
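The abstract does not specify the tokenizer, the smoothing, or how the per-feature probabilities are combined, so the following is only a minimal Python sketch under stated assumptions, not the patented method. It assumes character trigrams as the "fixed-length tokens", add-one (Laplace) smoothing of the integer token counts, a length-normalized log-odds score per feature type, and a naive-Bayes-style log-odds sum (as used in Bayesian spam filters) to combine feature probabilities. The names `extract_tokens`, `GenreModel`, `classify`, `TOKEN_LEN` and the `0.5` threshold are hypothetical.

```python
import math
from collections import Counter

TOKEN_LEN = 3  # hypothetical fixed token length (character trigrams)


def extract_tokens(text: str, n: int = TOKEN_LEN) -> list[str]:
    """Fixed-length character n-grams; an assumed tokenization scheme."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (log-odds -> probability)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class GenreModel:
    """Integer token counts per feature type: genre vs. non-genre material."""

    def __init__(self, feature_types: list[str]) -> None:
        self.feature_types = feature_types
        self.genre = {f: Counter() for f in feature_types}
        self.other = {f: Counter() for f in feature_types}

    def train(self, page: dict[str, str], is_genre: bool) -> None:
        # page maps a feature type (e.g. "title", "body") to its text.
        counts = self.genre if is_genre else self.other
        for f in self.feature_types:
            counts[f].update(extract_tokens(page.get(f, "")))

    def token_log_odds(self, f: str, tok: str) -> float:
        # Add-one smoothed relative frequency of the token in each training set.
        vocab = len(set(self.genre[f]) | set(self.other[f])) + 1
        g = (self.genre[f][tok] + 1) / (sum(self.genre[f].values()) + vocab)
        o = (self.other[f][tok] + 1) / (sum(self.other[f].values()) + vocab)
        return math.log(g / o)

    def feature_probability(self, f: str, text: str) -> float:
        toks = extract_tokens(text)
        if not toks:
            return 0.5  # no evidence either way
        # Mean log-odds keeps short and long elements on a comparable scale.
        return sigmoid(sum(self.token_log_odds(f, t) for t in toks) / len(toks))

    def genre_probability(self, page: dict[str, str]) -> float:
        # Combine feature probabilities by adding their log-odds.
        total = 0.0
        for f in self.feature_types:
            p = self.feature_probability(f, page.get(f, ""))
            total += math.log(p / (1.0 - p))
        return sigmoid(total)


def classify(models: dict[str, GenreModel], page: dict[str, str],
             threshold: float = 0.5) -> list[tuple[str, float]]:
    """(genre, probability) pairs at or above the threshold, best first."""
    scored = [(name, m.genre_probability(page)) for name, m in models.items()]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])


if __name__ == "__main__":
    news = GenreModel(["title", "body"])
    news.train({"title": "breaking news",
                "body": "the reporter said today in a news report"}, True)
    news.train({"title": "cheap shoes",
                "body": "buy discount shoes online with free shipping"}, False)
    print(classify({"news": news},
                   {"title": "news today", "body": "a news report said"}))
```

In practice one `GenreModel` would be trained per genre, and `classify` would report every genre whose combined probability clears the threshold, mirroring the "at least one genre together with a genre probability" output described above. Averaging per-token log-odds is a damping choice made here so that a long body text does not drive the score to 0 or 1 on its own.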
Public/Granted literature
- US20140201113A1 Automatic Genre Determination of Web Content (publication date: 2014-07-17)