Parse tree based vectorization for natural language processing

Invention Grant

US10922486B2 Parse tree based vectorization for natural language processing (Active)

Patent Title: Parse tree based vectorization for natural language processing
Application No.: US16352358

Application Date: 2019-03-13
Publication No.: US10922486B2

Publication Date: 2021-02-16
Inventor: Mudhakar Srivatsa, Raghu Kiran Ganti, Yeon-sup Lim, Shreeranjani Srirangamsridharan, Antara Palit
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agency: Garg Law Firm, PLLC
Agent: Rakesh Garg; Joseph Petrokaitis
Main IPC: G06F40/279
IPC: G06F40/279; G06F40/211; G06F40/30; G06F40/284

Abstract:

A parse tree corresponding to a portion of narrative text is constructed. The parse tree includes a data structure representing a syntactic structure of the portion of narrative text as a set of tokens according to a grammar. Using a token in the parse tree as a focus word, a context window comprising a set of words within a specified distance from the focus word is generated, the distance determined according to a number of links of the parse tree separating the focus word and a context word in the set of words. A weight is generated for the focus word and the context word. Using the weight, a first vector representation of a first word is generated, the first word being within a second portion of narrative text.
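
The context window described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the parse tree is supplied as a list of (head, dependent) token pairs, that tokens are unique strings, and that the weight for a focus/context pair is the reciprocal of the number of tree links between them. The abstract does not state a weighting formula, so the reciprocal is purely an illustrative choice, and the names `tree_distances`, `context_window`, and `EDGES` are hypothetical.

```python
from collections import deque


def tree_distances(edges, focus):
    """Count the parse-tree links between `focus` and every reachable token.

    `edges` is a list of (head, dependent) pairs; the tree is walked as an
    undirected graph with a breadth-first search.
    """
    adjacency = {}
    for head, dependent in edges:
        adjacency.setdefault(head, set()).add(dependent)
        adjacency.setdefault(dependent, set()).add(head)

    distances = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def context_window(edges, focus, max_links):
    """Return {context_word: weight} for tokens within `max_links` of `focus`.

    Assumption: weight = 1 / (number of links). The source does not specify
    how the weight is computed.
    """
    distances = tree_distances(edges, focus)
    return {
        token: 1.0 / links
        for token, links in distances.items()
        if 0 < links <= max_links
    }


# Hand-written, hypothetical dependency tree for "quick fox jumps over lazy dog".
EDGES = [
    ("jumps", "fox"),
    ("fox", "quick"),
    ("jumps", "over"),
    ("over", "dog"),
    ("dog", "lazy"),
]

window = context_window(EDGES, focus="fox", max_links=2)
print(window)
```

With `max_links=2`, the window for "fox" contains "jumps" and "quick" (one link each) and "over" (two links). "dog" and "lazy" fall outside the window because they are three and four links away in the tree. The resulting weighted pairs could then feed a word-embedding trainer to produce the vector representations the abstract mentions; that training step is outside this sketch.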

Public/Granted literature

US20200293614A1 PARSE TREE BASED VECTORIZATION FOR NATURAL LANGUAGE PROCESSING Public/Granted day: 2020-09-17

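IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F40/00	Handling natural language data (speech analysis or synthesis, speech recognition: see G10L)
G06F40/20	. Natural language analysis (semantic analysis of natural language: see G06F40/30)
G06F40/279	.. Recognition of textual entities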