Invention Grant
- Patent Title: Encoding entity representations for cross-document coreference
- Application No.: US16848144
- Application Date: 2020-04-14
- Publication No.: US11573994B2
- Publication Date: 2023-02-07
- Inventor: Michael Robert Glass, Nicholas Brady Garvan Monath, Robert G. Farrell, Alfio Massimiliano Gliozzo, Gaetano Rossiello
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Cantor Colburn LLP
- Agent: Stosch Sabo
- Main IPC: G06F16/30
- IPC: G06F16/30; G06F16/35; G06N3/08; G06F40/40; G06F40/284; G06F40/30; G06F40/216

Abstract:
A computer-implemented method for performing cross-document coreference on a corpus of input documents includes determining mentions by parsing the input documents. Each mention includes a first vector for spelling data and a second vector for context data. A hierarchical tree data structure is created by generating several leaf nodes corresponding to the respective mentions. Further, for each node, a similarity score is computed based on the node's first and second vectors. The hierarchical tree is populated iteratively until a root node is created. Each iteration includes merging the two nodes that have the highest similarity scores and creating, in their place, an entity node at a hierarchical level above the two merged nodes. Further, each iteration includes computing the similarity score for the entity node. The nodes with similarity scores above a predetermined value are the entities for which coreference has been performed in the input documents.
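The merging procedure described in the abstract resembles greedy agglomerative clustering over mention representations. The sketch below illustrates that reading under stated assumptions; it is not the patented implementation. The class and function names (`Node`, `similarity`, `build_tree`, `entities`) are hypothetical. The similarity score is assumed to be a weighted sum (weight `alpha`, also hypothetical) of cosine similarities over the spelling and context vectors. An entity node's vectors are assumed to be the size-weighted mean of its children's vectors, and "nodes above a predetermined value" is read as a top-down cut of the tree at `threshold`. The vectors are given by hand here; the abstract does not say how they are encoded.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """Hypothetical node: a leaf holds one mention, an internal node is an entity."""
    spelling: List[float]          # first vector (spelling data)
    context: List[float]           # second vector (context data)
    mentions: List[str]            # mention strings covered by this node
    children: List["Node"] = field(default_factory=list)
    score: float = math.inf        # merge score; a leaf is trivially coherent


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity(x: Node, y: Node, alpha: float = 0.5) -> float:
    # ASSUMPTION: weighted sum of spelling and context cosine similarities.
    return alpha * cosine(x.spelling, y.spelling) + (1 - alpha) * cosine(x.context, y.context)


def weighted_mean(u: List[float], nu: int, v: List[float], nv: int) -> List[float]:
    return [(nu * a + nv * b) / (nu + nv) for a, b in zip(u, v)]


def build_tree(leaves: List[Node], alpha: float = 0.5) -> Node:
    """Merge the most similar pair repeatedly until a single root node remains."""
    if not leaves:
        raise ValueError("need at least one mention")
    active = list(leaves)
    while len(active) > 1:
        # Pick the pair of current top-level nodes with the highest similarity.
        score, i, j = max(
            (similarity(active[p], active[q], alpha), p, q)
            for p in range(len(active))
            for q in range(p + 1, len(active))
        )
        a, b = active[i], active[j]
        na, nb = len(a.mentions), len(b.mentions)
        # ASSUMPTION: entity vectors are the size-weighted mean of the children's.
        parent = Node(
            spelling=weighted_mean(a.spelling, na, b.spelling, nb),
            context=weighted_mean(a.context, na, b.context, nb),
            mentions=a.mentions + b.mentions,
            children=[a, b],
            score=score,
        )
        # The entity node replaces the two merged nodes one level higher.
        active = [n for k, n in enumerate(active) if k not in (i, j)] + [parent]
    return active[0]


def entities(node: Node, threshold: float) -> List[List[str]]:
    """Top-down cut: keep the highest nodes whose merge score clears the threshold."""
    if not node.children or node.score >= threshold:
        return [node.mentions]
    return [m for child in node.children for m in entities(child, threshold)]


if __name__ == "__main__":
    root = build_tree([
        Node([1.0, 0.0], [1.0, 0.0], ["IBM"]),
        Node([0.9, 0.1], [0.95, 0.05], ["I.B.M."]),
        Node([0.0, 1.0], [0.0, 1.0], ["Apple"]),
    ])
    print(entities(root, threshold=0.8))  # [['Apple'], ['IBM', 'I.B.M.']]
```

The exhaustive pair search makes each iteration quadratic in the number of top-level nodes, so this sketch only suits small inputs. A corpus-scale system would need a faster nearest-pair strategy.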
Public/Granted literature
- US20210319054A1 ENCODING ENTITY REPRESENTATIONS FOR CROSS-DOCUMENT COREFERENCE Public/Granted day: 2021-10-14