Method and apparatus for extracting information

Invention Grant

US10679051B2 Method and apparatus for extracting information (in force)

Patent Title: Method and apparatus for extracting information
Application No.: US15564187

Application Date: 2016-06-17
Publication No.: US10679051B2

Publication Date: 2020-06-09
Inventor: Shouke Qin, You Han, Zhiyang Chen, Feichao Ma, Peizhi Xu
Applicant: Baidu Online Network Technology (Beijing) Co., Ltd.
Applicant Address: CN Beijing
Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
Current Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
Current Assignee Address: CN Beijing
Agency: Knobbe, Martens, Olson & Bear, LLP
International Application: PCT/CN2016/086213 (WO), filed 2016-06-17
International Publication: WO2017/113645 (WO), published 2017-07-06
Main IPC: G06K9/00
IPC: G06K9/00; G06F16/35; G06F16/958; G06F40/14; G06F40/117; G06F40/154

Abstract:

The present application discloses a method and apparatus for extracting information. A specific implementation of the method comprises: parsing a pre-acquired web page file into a tag tree structure, and recognizing, among the nodes of the tag tree, at least one body node at which the web page body of the web page file is located; dividing the content contained in the at least one body node into paragraphs to generate paragraph blocks, and setting a tag attribute for each paragraph block according to an attribute of a tag associated with that paragraph block; classifying the text content contained in each paragraph block based on the tag attribute of that paragraph block; and extracting information comprising a question and an answer from the text content contained in each paragraph block based on the classification result. This implementation achieves automatic and precise extraction of information.
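
The four steps named in the abstract (tag tree, body node, paragraph blocks with tag attributes, classification and question/answer extraction) can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the body node is recognized or how blocks are classified, so the text-volume heuristic, the "heading"/"plain" attribute values, and every name here (Node, TreeBuilder, find_body_node, paragraph_blocks, classify, extract_qa, PAGE) are assumptions made for illustration only.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "li"}  # assumed paragraph-level tags
HEADING_TAGS = {"h1", "h2", "h3", "h4"}           # assumed "question-like" tags

class Node:
    """One node of the tag tree."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], ""
        if parent is not None:
            parent.children.append(self)

    def full_text(self):
        return self.text + "".join(c.full_text() for c in self.children)

class TreeBuilder(HTMLParser):
    """Step 1: parse the web page file into a tag tree."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        self.cur = Node(tag, self.cur)

    def handle_endtag(self, tag):
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent            # climb to the matching open tag
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        self.cur.text += data

def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)

def find_body_node(root):
    """Step 2 (assumed heuristic): the node whose direct block-level
    children carry the most text is treated as the body node."""
    def score(n):
        return sum(len(c.full_text().strip())
                   for c in n.children if c.tag in BLOCK_TAGS)
    return max(iter_nodes(root), key=score)

def paragraph_blocks(body):
    """Step 3: one block per block-level child, tagged with an attribute
    derived from the tag that wraps it."""
    blocks = []
    for child in body.children:
        text = " ".join(child.full_text().split())
        if child.tag in BLOCK_TAGS and text:
            attr = "heading" if child.tag in HEADING_TAGS else "plain"
            blocks.append((attr, text))
    return blocks

def classify(attr, text):
    """Step 4a (assumed rule): headings and '?'-terminated text are questions."""
    return "question" if attr == "heading" or text.endswith("?") else "answer"

def extract_qa(html):
    """Step 4b: pair each question block with the answer blocks that follow it."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    pairs, question, answer = [], None, []
    for attr, text in paragraph_blocks(find_body_node(builder.root)):
        if classify(attr, text) == "question":
            if question and answer:
                pairs.append((question, " ".join(answer)))
            question, answer = text, []
        elif question:
            answer.append(text)
    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs

# Hypothetical sample page used only to exercise the sketch.
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="main">
<h2>How do I reset my password?</h2>
<p>Open the settings page.</p>
<p>Then choose Reset.</p>
<h2>Where is the log file</h2>
<p>It is stored in the data folder.</p>
</div>
</body></html>"""
```

Calling extract_qa(PAGE) returns a list of (question, answer) tuples taken from the "main" div, while the navigation div is ignored because it has no block-level text. A production system would likely replace both heuristics with trained classifiers.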

Published/Granted documents

US20180322341A1 Method and apparatus for extracting information. Publication date: 2018-11-08

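IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)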