System and method for improving chatbot training dataset

Granted invention patent

US11893354B2 System and method for improving chatbot training dataset (Status: in force)

Patent Title: System and method for improving chatbot training dataset
Application No.: US17347773

Application Date: 2021-06-15
Publication No.: US11893354B2

Publication Date: 2024-02-06
Inventors: Jithu R Jacob, Siddhartha Das
Applicant: Cognizant Technology Solutions India Pvt. Ltd.
Applicant Address: IN Chennai
Assignee: COGNIZANT TECHNOLOGY SOLUTIONS INDIA PVT. LTD.
Current Assignee: COGNIZANT TECHNOLOGY SOLUTIONS INDIA PVT. LTD.
Current Assignee Address: IN Chennai
Agency: CANTOR COLBURN LLP
Priority: IN 2141013199, 2021-03-25
Main IPC: G06F40/30
IPC: G06F40/30; G06F40/232; G06F40/117; G10L25/30

Abstract:

The present invention provides for improving a training dataset by identifying errors in the training dataset and generating improvement recommendations. In operation, the present invention provides for identifying and correcting duplicate utterances in a training dataset comprising utterance-intent pairs. Further, a plurality of natural language ML models are trained with the corrected training dataset to obtain a diverse set of trained ML models. Each utterance of the training dataset is fed as input to the trained ML models, and a probability of error associated with each utterance-intent pair of the training dataset is evaluated based on an analysis of the respective intent predictions received from each of the trained ML models. Furthermore, spelling errors in the training dataset are identified and data imbalances in the training dataset are evaluated. Finally, a set of improvement recommendations for each utterance-intent pair is generated based on the evaluated probabilities of error, spelling errors, duplicate utterances and data imbalances.
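The abstract describes a pipeline of checks over utterance-intent pairs. The sketch below illustrates that flow under stated assumptions and is not the patented implementation: the "trained ML models" are hypothetical callables (`Model`) assumed to be already trained on the deduplicated data; the probability of error is estimated as the fraction of models that disagree with the labeled intent (a simple ensemble-disagreement heuristic swapped in for whatever analysis the patent claims); spelling errors are approximated by lookup in a reference vocabulary; and an intent counts as underrepresented when its example count falls below `ratio` times the mean per intent. All function names, thresholds, and note strings are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]        # (utterance, labeled intent)
Model = Callable[[str], str]  # hypothetical: utterance -> predicted intent


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace so trivially different utterances compare equal."""
    return " ".join(utterance.lower().split())


def find_duplicates(pairs: List[Pair]) -> Dict[str, Set[str]]:
    """Map each normalized utterance seen more than once to the set of intents it carries."""
    labels: Dict[str, List[str]] = defaultdict(list)
    for utterance, intent in pairs:
        labels[normalize(utterance)].append(intent)
    return {u: set(ints) for u, ints in labels.items() if len(ints) > 1}


def dedupe(pairs: List[Pair]) -> List[Pair]:
    """Keep only the first occurrence of each normalized utterance (one possible correction)."""
    seen: Set[str] = set()
    kept: List[Pair] = []
    for utterance, intent in pairs:
        key = normalize(utterance)
        if key not in seen:
            seen.add(key)
            kept.append((utterance, intent))
    return kept


def error_probability(pair: Pair, models: List[Model]) -> float:
    """Share of ensemble models (at least one required) that disagree with the label."""
    utterance, intent = pair
    return sum(m(utterance) != intent for m in models) / len(models)


def misspellings(utterance: str, vocabulary: Set[str]) -> List[str]:
    """Tokens missing from a reference vocabulary (a stand-in for a real spell checker)."""
    return [tok for tok in normalize(utterance).split() if tok not in vocabulary]


def sparse_intents(pairs: List[Pair], ratio: float = 0.5) -> Set[str]:
    """Intents whose example count is below `ratio` times the mean count per intent."""
    counts = Counter(intent for _, intent in pairs)
    if not counts:
        return set()
    mean = sum(counts.values()) / len(counts)
    return {intent for intent, n in counts.items() if n < ratio * mean}


def recommend(pairs: List[Pair], models: List[Model], vocabulary: Set[str],
              threshold: float = 0.5) -> Dict[str, List[str]]:
    """Collect improvement notes for each pair of the deduplicated dataset."""
    duplicates = find_duplicates(pairs)
    cleaned = dedupe(pairs)
    sparse = sparse_intents(cleaned)
    report: Dict[str, List[str]] = {}
    for utterance, intent in cleaned:
        notes: List[str] = []
        intents = duplicates.get(normalize(utterance))
        if intents is not None:
            notes.append("conflicting labels" if len(intents) > 1 else "duplicate removed")
        p = error_probability((utterance, intent), models)
        if p >= threshold:
            notes.append(f"possible mislabel (p_error={p:.2f})")
        typos = misspellings(utterance, vocabulary)
        if typos:
            notes.append("check spelling: " + ", ".join(typos))
        if intent in sparse:
            notes.append(f"add more examples for '{intent}'")
        report[utterance] = notes
    return report
```

For example, with three keyword-based toy models, two of which predict `cancel` for any utterance containing "cancel", the pair ("cancel my bookng", "book_flight") gets p_error = 2/3 and is flagged both as a possible mislabel and for the unknown token `bookng`. Counting disagreements keeps the sketch model-agnostic; averaging per-class probabilities from models that expose them would be a finer-grained alternative.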

Published/granted literature

US20220309247A1 SYSTEM AND METHOD FOR IMPROVING CHATBOT TRAINING DATASET, published 2022-09-29

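IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models G06N)
G06F40/00	Handling natural language data (speech analysis or synthesis, speech recognition G10L)
G06F40/30	.Semantic analysis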