Method and Device for Building a Person-Name Corpus Recognition Model

Invention Publication


Patent Title: 构建人名语料识别模型的方法及装置
Patent Title (English): Method and device for building a person-name corpus recognition model
Application No.: CN201510141915.9

Application Date: 2015-03-27
Publication No.: CN106156051A

Publication Date: 2016-11-23
Inventor: 周连强, 王倩
Applicant: 深圳市腾讯计算机系统有限公司
Applicant Address: Floors 5-10, FIYTA Building, Gaoxin South 1st Road, High-tech Zone, Nanshan District, Shenzhen, Guangdong Province
Assignee: 深圳市腾讯计算机系统有限公司
Current Assignee: 深圳市腾讯计算机系统有限公司
Current Assignee Address: Floors 5-10, FIYTA Building, Gaoxin South 1st Road, High-tech Zone, Nanshan District, Shenzhen, Guangdong Province
Agency: 北京三高永信知识产权代理有限责任公司
Agent: 祝亚男
Main IPC: G06F17/30
IPC: G06F17/30

Abstract:

本发明公开了一种构建人名语料识别模型的方法及装置，属于信息技术领域。方法包括：根据至少两种不同的人名语料训练模型，对每个建模中文语料进行标注；当根据标注结果对建模中文语料的预测结果的一致性达到预设指标时，将建模中文语料添加到语料训练列表中；提取语料训练列表中每个建模中文语料的语料特征；根据建模中文语料的语料特征，构建人名语料识别模型。本发明借助多种不同的人名语料训练模型，对每个建模中文语料进行标注，基于标注结果，构建人名语料识别模型。在该过程中，无需用户进行人工标注，降低了语料标注成本，且在构建人名语料训练模型时，综合了多种人名语料训练模型的标注结果，提高了所构建的人名语料识别模型的识别精度。

Abstract (English):

The invention discloses a method and device for building a person-name corpus recognition model, and belongs to the field of information technology. The method comprises: labeling each modeling Chinese corpus according to at least two different person-name corpus training models; when the consistency of the prediction results for a modeling Chinese corpus, determined from the labeling results, reaches a preset index, adding that corpus to a corpus training list; extracting the corpus features of each modeling Chinese corpus in the corpus training list; and building the person-name corpus recognition model according to those corpus features. The invention uses multiple different person-name corpus training models to label each modeling Chinese corpus and builds the recognition model from the labeling results. No manual labeling by users is needed in this process, which reduces the cost of corpus labeling. In addition, the labeling results of the multiple training models are combined when the model is built, which improves the recognition precision of the resulting person-name corpus recognition model.
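The four steps in the abstract (label with several models, keep only consistently labeled corpora, extract features, build a model) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the two toy labelers, the `min_agreement` parameter standing in for the "preset index", the first-character feature, and the frequency-count model are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a labeler returns True if it judges the text a person name.
Labeler = Callable[[str], bool]


def select_consistent(corpora: Sequence[str],
                      labelers: Sequence[Labeler],
                      min_agreement: float = 1.0) -> List[Tuple[str, bool]]:
    """Label each corpus with every labeler; keep it only if agreement is high enough."""
    if len(labelers) < 2:
        raise ValueError("at least two different labelers are required")
    training_list = []
    for text in corpora:
        votes = [labeler(text) for labeler in labelers]
        label, count = Counter(votes).most_common(1)[0]
        # min_agreement plays the role of the "preset index" (an assumption).
        if count / len(votes) >= min_agreement:
            training_list.append((text, label))
    return training_list


def extract_features(text: str) -> Dict[str, object]:
    """Toy features (assumed): first character as a surname cue, plus length."""
    return {"first_char": text[0], "length": len(text)}


def build_model(training_list: Sequence[Tuple[str, bool]]) -> Dict[str, Tuple[int, int]]:
    """Count, per first character, how often it started a person name."""
    model: Dict[str, Tuple[int, int]] = {}
    for text, is_name in training_list:
        key = extract_features(text)["first_char"]
        positives, total = model.get(key, (0, 0))
        model[key] = (positives + int(is_name), total + 1)
    return model


def predict(model: Dict[str, Tuple[int, int]], text: str) -> bool:
    """Predict 'person name' if the first character was mostly seen in names."""
    positives, total = model.get(text[0], (0, 0))
    return total > 0 and positives / total >= 0.5


# Two toy labelers standing in for the "different training models".
surname_labeler: Labeler = lambda t: t[0] in {"张", "李", "王"}
length_labeler: Labeler = lambda t: 2 <= len(t) <= 3

corpora = ["张伟", "李娜娜", "高新区", "信息技术领域"]
training_list = select_consistent(corpora, [surname_labeler, length_labeler])
model = build_model(training_list)
```

In this toy run the two labelers disagree on "高新区", so it is left out of the training list, while the other three items are kept with their agreed labels. A real system would replace the labelers with trained models and the count table with a proper classifier.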

Public/Granted literature

CN106156051B Method and Device for Building a Person-Name Corpus Recognition Model Public/Granted day: 2019-08-13
