Invention Grant
- Patent Title: Automatically extracting by-line information
- Application No.: US12192917
- Application Date: 2008-08-15
- Publication No.: US08321396B2
- Publication Date: 2012-11-27
- Inventor: Stephen Dill, Madhukar R. Korupolu, Andrew S. Tomkins
- Applicant: Stephen Dill, Madhukar R. Korupolu, Andrew S. Tomkins
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Shimokaji & Assoc., PC
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A by-line extraction system detects a set of potential headlines from a title meta-tag of a crawled document, selects a candidate headline from the set of potential headlines, and extracts the by-line information from the document using the location of the selected candidate headline. The system constructs the set of potential headlines based on the title meta-tag. The system selects a candidate headline by evaluating the set of potential headlines in order of the lengths of the potential headlines. The system extracts the by-line information from the document by using the location of the selected candidate headline to extract a string representing a date, a name, or a source located within a minimum distance from the location of the potential headline.
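The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The separator rule used to split the title, the date and "By <Name>" patterns, the character-offset distance measure, and all function names (potential_headlines, select_headline, nearest, extract_byline) are assumptions made for this example. "Minimum distance" is read here as "the match closest to the headline", and source extraction is omitted.

```python
import re

# Assumed patterns for illustration only; real pages need far broader coverage.
DATE_RE = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}, \d{4}\b"
)
NAME_RE = re.compile(r"\bBy ([A-Z][a-z]+(?: [A-Z][a-z]+)+)")


def potential_headlines(title):
    """Build candidate headlines from the title: the whole title plus each
    segment between common separators (assumed: ' - ', ' | ', ' : ')."""
    parts = [p.strip() for p in re.split(r"\s+[-|:]\s+", title) if p.strip()]
    candidates = {title.strip(), *parts}
    # Longest first, so the most specific candidate is evaluated first.
    return sorted(candidates, key=lambda s: (-len(s), s))


def select_headline(candidates, body):
    """Return (headline, offset) for the first candidate found in the body."""
    for candidate in candidates:
        pos = body.find(candidate)
        if pos != -1:
            return candidate, pos
    return None


def nearest(pattern, body, pos):
    """Return the match of `pattern` whose start is closest to `pos`."""
    matches = list(pattern.finditer(body))
    if not matches:
        return None
    best = min(matches, key=lambda m: abs(m.start() - pos))
    # Use the capture group when the pattern defines one, else the whole match.
    return best.group(1) if pattern.groups else best.group(0)


def extract_byline(title, body):
    """Locate the headline in the body, then pick the nearest date and name."""
    selected = select_headline(potential_headlines(title), body)
    if selected is None:
        return None
    headline, pos = selected
    return {
        "headline": headline,
        "date": nearest(DATE_RE, body, pos),
        "name": nearest(NAME_RE, body, pos),
    }
```

For a title such as "Example News - Storm hits coast", the full title is tried first, then "Storm hits coast", then "Example News". The date and name closest to wherever the selected headline appears in the page text are taken as the by-line, which keeps unrelated dates or author names elsewhere on the page from being picked up.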
Public/Granted literature
- US20080306941A1 SYSTEM FOR AUTOMATICALLY EXTRACTING BY-LINE INFORMATION, Public/Granted day: 2008-12-11