Encoding entity representations for cross-document coreference

Invention Grant

US11573994B2 Encoding entity representations for cross-document coreference (Active)

Patent Title: Encoding entity representations for cross-document coreference
Application No.: US16848144

Application Date: 2020-04-14
Publication No.: US11573994B2

Publication Date: 2023-02-07
Inventor: Michael Robert Glass, Nicholas Brady Garvan Monath, Robert G. Farrell, Alfio Massimiliano Gliozzo, Gaetano Rossiello
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agency: Cantor Colburn LLP
Agent: Stosch Sabo
Main IPC: G06F16/30
IPC: G06F16/30; G06F16/35; G06N3/08; G06F40/40; G06F40/284; G06F40/30; G06F40/216

Abstract:

A computer-implemented method for performing cross-document coreference for a corpus of input documents includes determining mentions by parsing the input documents. Each mention includes a first vector for spelling data and a second vector for context data. A hierarchical tree data structure is created by generating several leaf nodes corresponding to respective mentions. Further, for each node, a similarity score is computed based on the first and second vectors of each node. The hierarchical tree is populated iteratively until a root node is created. Each iteration includes merging two nodes that have the highest similarity scores and creating an entity node instead at a hierarchical level that is above the two nodes being merged. Further, each iteration includes computing the similarity score for the entity node. The nodes with the similarity scores above a predetermined value are entities for which coreference has been performed in input documents.
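
The merge loop described in the abstract can be illustrated with a minimal Python sketch. Everything not stated in the abstract is an assumption made for illustration only: the names `Node`, `similarity`, `build_tree`, and `entities` are hypothetical, the similarity is taken to be an equally weighted average of cosine similarities over the spelling and context vectors, an entity node's vectors are taken to be the mean of its children's vectors, and a lone mention is given a score of 1.0. The patent itself may define each of these differently.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class Node:
    spelling: List[float]            # first vector: spelling data
    context: List[float]             # second vector: context data
    score: float = 1.0               # assumption: a single mention is trivially coherent
    children: Tuple["Node", ...] = ()
    mentions: Tuple[str, ...] = ()   # surface forms covered by this node


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similarity(a, b, weight=0.5):
    # Assumption: weighted average of spelling and context cosine similarity.
    return (weight * cosine(a.spelling, b.spelling)
            + (1 - weight) * cosine(a.context, b.context))


def mean(u, v):
    return [(x + y) / 2 for x, y in zip(u, v)]


def build_tree(leaves):
    """Merge the most similar pair repeatedly until only a root node remains."""
    active = list(leaves)
    while len(active) > 1:
        a, b = max(combinations(active, 2), key=lambda pair: similarity(*pair))
        parent = Node(
            spelling=mean(a.spelling, b.spelling),   # assumption: mean pooling
            context=mean(a.context, b.context),
            score=similarity(a, b),                  # score of the new entity node
            children=(a, b),
            mentions=a.mentions + b.mentions,
        )
        active = [n for n in active if n is not a and n is not b]
        active.append(parent)
    return active[0]


def entities(node, threshold):
    """Walk down from the root and return the highest nodes scoring above threshold."""
    if node.score > threshold or not node.children:
        return [node.mentions]
    return [group for child in node.children for group in entities(child, threshold)]


# Toy mentions with hand-written 2-dimensional vectors (made up for this example).
leaves = [
    Node([1.0, 0.0], [1.0, 0.0], mentions=("IBM",)),
    Node([0.9, 0.1], [1.0, 0.1], mentions=("I.B.M.",)),
    Node([0.0, 1.0], [0.0, 1.0], mentions=("Apple",)),
]
root = build_tree(leaves)
clusters = entities(root, threshold=0.8)
```

In this toy run the two IBM spellings are merged first because their pair has the highest similarity, and the cut at 0.8 keeps them together as one entity while leaving "Apple" on its own. The exhaustive pair search is quadratic per iteration and is used here only for readability.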

Public/Granted literature

US20210319054A1 ENCODING ENTITY REPRESENTATIONS FOR CROSS-DOCUMENT COREFERENCE Public/Granted day: 2021-10-14

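IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor
G06F16/30	. of unstructured textual data (document management systems: G06F 16/93)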