Invention Grant
- Patent Title: Method and device for extracting web information
- Patent Title (Chinese): 用于提取Web信息的方法和设备
- Application No.: US12338484
- Application Date: 2008-12-18
- Publication No.: US08196037B2
- Publication Date: 2012-06-05
- Inventor: Kai Cheng
- Applicant: Kai Cheng
- Applicant Address: CN Shenzhen
- Assignee: Tencent Technology (Shenzhen) Company Limited
- Current Assignee: Tencent Technology (Shenzhen) Company Limited
- Current Assignee Address: CN Shenzhen
- Priority: CN200610086427, 2006-06-19
- Main IPC: G06F17/20
- IPC: G06F17/20

Abstract:
A method for extracting web information includes: selecting a number of Hypertext Markup Language (HTML) tags as tag ruler elements to generate a tag ruler from the HTML text of a web page, following the sequence of the HTML text; matching the HTML text against the tag ruler elements in the order in which they appear in the tag ruler, segmenting the web information according to the matched HTML tags, and saving the web information segments together with the location information, in the HTML text, of the HTML tags enclosing those segments; and determining the location, in the HTML text, of the HTML tags containing the web information needed by a user, and extracting the web information segments corresponding to that information from the saved segments.
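The three steps in the abstract can be sketched with Python's standard `html.parser` module. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not say which tags become ruler elements, so the set `RULER_TAGS` is hypothetical. Using ruler indices and `getpos()` line/column pairs as the "location information" is also an assumption, as are the names `TagRulerBuilder`, `RulerSegmenter`, `build_tag_ruler`, `segment_html`, `extract` and `SAMPLE_PAGE`.

```python
from html.parser import HTMLParser

# Hypothetical choice: the abstract does not say which HTML tags are selected
# as tag ruler elements, so this block-level set is only an illustration.
RULER_TAGS = {"title", "h1", "h2", "p", "div", "td", "li"}


class TagRulerBuilder(HTMLParser):
    """Step 1: collect the selected start tags in the order they occur."""

    def __init__(self, ruler_tags):
        super().__init__()
        self.ruler_tags = ruler_tags
        self.ruler = []  # the tag ruler: an ordered list of tag names

    def handle_starttag(self, tag, attrs):
        if tag in self.ruler_tags:
            self.ruler.append(tag)


class RulerSegmenter(HTMLParser):
    """Step 2: match start tags against the ruler in sequence and cut text into segments."""

    def __init__(self, ruler):
        super().__init__()
        self.ruler = ruler
        self.next_idx = 0    # index of the next ruler element to match
        self.current = None  # ruler index of the segment now receiving text
        self.segments = {}   # ruler index -> {"tag", "pos", "parts"}

    def handle_starttag(self, tag, attrs):
        if self.next_idx < len(self.ruler) and tag == self.ruler[self.next_idx]:
            # getpos() gives (line, column) of this tag in the HTML text,
            # used here as the saved location of the enclosing tag.
            self.segments[self.next_idx] = {"tag": tag, "pos": self.getpos(), "parts": []}
            self.current = self.next_idx
            self.next_idx += 1

    def handle_data(self, data):
        # Simplification: text belongs to the most recently matched ruler tag
        # until the next match; end tags and <script>/<style> are not special-cased.
        text = data.strip()
        if text and self.current is not None:
            self.segments[self.current]["parts"].append(text)


def build_tag_ruler(html, ruler_tags=RULER_TAGS):
    builder = TagRulerBuilder(ruler_tags)
    builder.feed(html)
    builder.close()
    return builder.ruler


def segment_html(html, ruler):
    segmenter = RulerSegmenter(ruler)
    segmenter.feed(html)
    segmenter.close()
    return {
        idx: {"tag": seg["tag"], "pos": seg["pos"], "text": " ".join(seg["parts"])}
        for idx, seg in segmenter.segments.items()
    }


def extract(segments, wanted):
    """Step 3: return saved segments at the locations (ruler indices) the user wants."""
    return [segments[idx]["text"] for idx in wanted if idx in segments]


SAMPLE_PAGE = """<html><head><title>News</title></head>
<body><div><h1>Headline</h1><p>First paragraph.</p><p>Second <b>bold</b> one.</p></div></body></html>"""

if __name__ == "__main__":
    ruler = build_tag_ruler(SAMPLE_PAGE)
    print(ruler)  # ['title', 'div', 'h1', 'p', 'p']
    segments = segment_html(SAMPLE_PAGE, ruler)
    print(extract(segments, [2, 3]))  # ['Headline', 'First paragraph.']
```

The segmenter advances its pointer only when a start tag equals the next ruler element, so matching follows the ruler's sequence as the abstract describes. In this sketch a segment simply runs from one matched tag to the next. That is coarser than true enclosure by an opening/closing tag pair, and is a simplification chosen for brevity.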
Public/Granted literature:
- US20090100056A1, Method And Device For Extracting Web Information, Public/Granted date: 2009-04-16