Preparing documents for coreference analysis

Invention Grant

US11556574B2 Preparing documents for coreference analysis (Active)

Patent Title: Preparing documents for coreference analysis
Application No.: US17139147

Application Date: 2020-12-31
Publication No.: US11556574B2

Publication Date: 2023-01-17
Inventor: Anton Yegorin
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agent: Brian D. Welle
Main IPC: G06F16/33
IPC: G06F16/33 ; G06F16/332 ; G06F40/247

Abstract:

Unstructured text is identified as larger than a threshold size. Named-entity recognition analysis is executed on the unstructured text. One or more anchor entities of the unstructured text are determined that each occur more than a threshold amount of times within the unstructured text. Two or more instances of the one or more anchor entities that are separated by at least a threshold amount of text of the unstructured text are identified. The unstructured text is partitioned into at least three sections. The unstructured text is partitioned at respective natural language demarcation points associated with each of the two or more instances such that each of the at least three sections is smaller than the threshold size. Separate coreference analyses are performed in parallel on each of the at least three sections.
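The steps in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the gazetteer lookup in `find_entities` stands in for real named-entity recognition, a ". " sequence stands in for a natural language demarcation point, sizes are measured in characters, and every function and parameter name (`partition`, `max_size`, `min_count`, `min_gap`, `analyze_in_parallel`) is hypothetical. The greedy choice of cut points is also an assumption; the abstract only requires that cuts fall at anchor-entity instances that are far enough apart and that every section ends up smaller than the threshold.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def find_entities(text: str, gazetteer: Iterable[str]) -> List[Tuple[int, str]]:
    """Stand-in for NER (assumption): look up known names, return (offset, name)."""
    mentions = []
    for name in gazetteer:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            mentions.append((m.start(), name))
    return sorted(mentions)


def sentence_start(text: str, pos: int) -> int:
    """Start of the sentence containing pos, using '. ' as a crude boundary."""
    i = text.rfind(". ", 0, pos)
    return 0 if i == -1 else i + 2


def partition(text: str, gazetteer: Iterable[str], max_size: int,
              min_count: int, min_gap: int) -> List[str]:
    """Split text at sentence starts that contain an anchor entity."""
    if len(text) <= max_size:
        return [text]  # not larger than the threshold size: nothing to do
    mentions = find_entities(text, gazetteer)
    counts = Counter(name for _, name in mentions)
    # Anchor entities occur more than min_count times in the text.
    anchors = {name for name, c in counts.items() if c > min_count}
    candidates = sorted({sentence_start(text, pos)
                         for pos, name in mentions if name in anchors})
    sections, last = [], 0
    while len(text) - last >= max_size:
        # Cut points at least min_gap past the previous cut, and close
        # enough that the resulting section stays below max_size.
        usable = [c for c in candidates
                  if c > last and min_gap <= c - last < max_size]
        if not usable:
            raise ValueError("no suitable anchor boundary; a fallback is needed")
        cut = max(usable)  # greedy: take the largest admissible section
        sections.append(text[last:cut])
        last = cut
    sections.append(text[last:])
    return sections


def analyze_in_parallel(sections: List[str],
                        analyze: Callable[[str], object]) -> List[object]:
    """Run a coreference analysis (supplied by the caller) on each section."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(analyze, sections))
```

Passing the real coreference routine as the `analyze` callable keeps the sketch independent of any particular NLP library. A thread pool is used only for brevity; a CPU-bound analysis would more plausibly run in separate processes or on separate machines.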

Public/Granted literature

US20220207065A1 PREPARING DOCUMENTS FOR COREFERENCE ANALYSIS Public/Granted day: 2022-06-30

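IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor
G06F16/30	. of unstructured textual data (document management systems: see G06F16/93)
G06F16/33	.. Querying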