Invention Grant
- Patent Title: Computer-implemented method of preparing a training dataset for a natural language processing or natural language understanding machine learning algorithm
- Application No.: US17450795
- Application Date: 2021-10-13
- Publication No.: US12169685B2
- Publication Date: 2024-12-17
- Inventor: Simon Hegelich, Kolja Hegelich
- Applicant: Simon Hegelich, Kolja Hegelich
- Applicant Address: DE Munich; DE Dorsten
- Assignee: Simon Hegelich, Kolja Hegelich
- Current Assignee: Simon Hegelich, Kolja Hegelich
- Current Assignee Address: DE Munich; DE Dorsten
- Agency: Schwegman Lundberg & Woessner, P.A.
- Priority: EP21172292, 2021-05-05
- Main IPC: G06F40/166
- IPC: G06F40/166; G06F40/247; G06F40/253; G06N5/022

Abstract:
Described and claimed is a computer-implemented method of preparing a training dataset for a natural language processing (NLP) or natural language understanding (NLU) machine learning algorithm from an original text dataset. The method comprises the steps of: selecting one or more sentences from the original text dataset as selected sentences; determining, for each selected sentence, one or more grammatical elements of the selected sentence that can be negated, as negatable elements; determining one or more antonyms for one or more negatable words in each negatable element; for each determined antonym, creating a negated sentence by replacing the respective negatable element with the determined antonym in the selected sentence for which that negatable element was determined; and adding the negated sentences to the training dataset. Further, a computer-implemented method of training a word embedding or an NLP or NLU machine learning algorithm, a system, and a computer program product are described and claimed.
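The dataset-preparation steps in the abstract can be illustrated with a toy Python sketch. It is not the patented implementation. It makes several simplifying assumptions: every negatable element is a single word; the hypothetical `ANTONYMS` dictionary stands in for a real lexical resource and grammatical analysis (for example, a part-of-speech tagger plus WordNet); sentences are split with a naive regular expression; and the resulting training dataset keeps the original sentences alongside the negated ones. All function names are illustrative.

```python
import re
from typing import Dict, List

# Hypothetical toy antonym lexicon (assumption). A real system would derive
# negatable elements from grammatical analysis and antonyms from a lexicon.
ANTONYMS: Dict[str, List[str]] = {
    "good": ["bad"],
    "hot": ["cold"],
    "increase": ["decrease"],
    "rises": ["falls"],
}


def split_sentences(text: str) -> List[str]:
    """Naively split text into sentences after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def negatable_elements(sentence: str) -> List[str]:
    """Return distinct words of the sentence that have an antonym entry.

    Simplification: a negatable element is a single word (assumption).
    """
    tokens = re.findall(r"\w+", sentence)
    return list(dict.fromkeys(t for t in tokens if t.lower() in ANTONYMS))


def negate(sentence: str) -> List[str]:
    """Create one negated sentence per (negatable element, antonym) pair."""
    negated: List[str] = []
    for element in negatable_elements(sentence):
        for antonym in ANTONYMS[element.lower()]:
            # Keep sentence-initial capitalisation of the replaced word.
            if element[0].isupper():
                antonym = antonym.capitalize()
            pattern = r"\b" + re.escape(element) + r"\b"
            # Replace the first whole-word occurrence of the element.
            negated.append(re.sub(pattern, antonym, sentence, count=1))
    return negated


def build_training_dataset(text: str) -> List[str]:
    """Select sentences with negatable elements and add their negations."""
    sentences = split_sentences(text)
    # Assumption: the training dataset starts from the original sentences.
    dataset = list(sentences)
    selected = [s for s in sentences if negatable_elements(s)]
    for sentence in selected:
        dataset.extend(negate(sentence))
    return dataset


if __name__ == "__main__":
    for line in build_training_dataset("The movie was good. Demand rises in winter."):
        print(line)
```

For the input above, the sketch returns the two original sentences followed by "The movie was bad." and "Demand falls in winter.". Replacing a word with a lexical antonym, rather than inserting "not", produces a contrasting sentence that keeps the original structure. This matches the replacement step described in the abstract.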