Computer-implemented method of preparing a training dataset for a natural language processing or natural language understanding machine learning algorithm

Invention Grant

US12169685B2 Computer-implemented method of preparing a training dataset for a natural language processing or natural language understanding machine learning algorithm (Active)

Patent Title: Computer-implemented method of preparing a training dataset for a natural language processing or natural language understanding machine learning algorithm
Application No.: US17450795

Application Date: 2021-10-13
Publication No.: US12169685B2

Publication Date: 2024-12-17
Inventor: Simon Hegelich, Kolja Hegelich
Applicant: Simon Hegelich, Kolja Hegelich
Applicant Address: DE Munich; DE Dorsten
Assignee: Simon Hegelich, Kolja Hegelich
Current Assignee: Simon Hegelich, Kolja Hegelich
Current Assignee Address: DE Munich; DE Dorsten
Agency: Schwegman Lundberg & Woessner, P.A.
Priority: EP21172292 20210505
Main IPC: G06F40/166
IPC: G06F40/166; G06F40/247; G06F40/253; G06N5/022

Abstract:

Described and claimed is a computer-implemented method of preparing a training dataset for a natural language processing, NLP, or natural language understanding, NLU, machine learning algorithm from an original text dataset, the method comprising the steps of selecting one or more sentences from the original text dataset as selected sentences, determining for each selected sentence one or more grammatical elements of the selected sentence that can be negated as negatable elements, determining for one or more negatable words in each negatable element one or more antonyms, based on each determined antonym creating a negated sentence by replacing the respective negatable element in the selected sentence for which the negatable element was determined with the determined antonym, and adding the negated sentences to the training dataset. Further, a computer-implemented method of training a word embedding or an NLP or NLU machine learning algorithm, a system and a computer program product are described and claimed.
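The steps named in the abstract (select sentences, find negatable elements, look up antonyms, create negated sentences, add them to the training dataset) can be illustrated with a minimal Python sketch. This is not the patented implementation. The `ANTONYMS` lexicon and all function names are hypothetical, every sentence is treated as selected, a "negatable element" is reduced to any single word found in the lexicon, and keeping the original sentences in the output is an assumption.

```python
import re

# Hypothetical toy antonym lexicon (assumption); a real system would
# more likely draw on a lexical resource such as a thesaurus.
ANTONYMS = {
    "good": ["bad"],
    "hot": ["cold"],
    "increase": ["decrease", "reduce"],
}

WORD_RE = re.compile(r"[A-Za-z]+")


def negatable_words(sentence):
    """Return (start, end, word) for each word that has an antonym in the lexicon."""
    return [
        (m.start(), m.end(), m.group())
        for m in WORD_RE.finditer(sentence)
        if m.group().lower() in ANTONYMS
    ]


def negated_sentences(sentence):
    """Create one negated sentence per (negatable word, antonym) pair."""
    results = []
    for start, end, word in negatable_words(sentence):
        for antonym in ANTONYMS[word.lower()]:
            # Preserve the capitalisation of the replaced word.
            replacement = antonym.capitalize() if word[0].isupper() else antonym
            results.append(sentence[:start] + replacement + sentence[end:])
    return results


def prepare_training_dataset(original_sentences):
    """Return the original sentences followed by their negated variants.

    Assumption: every sentence is selected, and the originals are kept.
    """
    training = list(original_sentences)
    for sentence in original_sentences:
        training.extend(negated_sentences(sentence))
    return training


if __name__ == "__main__":
    for line in prepare_training_dataset(["The soup is hot.", "Prices increase."]):
        print(line)
```

A word with several antonyms yields several negated sentences, which matches the abstract's "based on each determined antonym creating a negated sentence".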

Public/Granted literature

US20220358282A1 COMPUTER-IMPLEMENTED METHOD OF PREPARING A TRAINING DATASET FOR A NATURAL LANGUAGE PROCESSING OR NATURAL LANGUAGE UNDERSTANDING MACHINE LEARNING ALGORITHM Public/Granted day: 2022-11-10

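IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F40/00	Handling natural language data (speech analysis or synthesis, speech recognition: see G10L)
G06F40/10	.Text processing (natural language analysis: G06F 40/20; semantic analysis: G06F 40/30; natural language processing or translation: G06F 40/40)
G06F40/166	..Editing, e.g. inserting or deleting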