Patent search ap:("Oracle International Corporation") AND inv:"Duy Vu" Page 1

1.

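Invention grant
Techniques for out-of-domain (OOD) detection [In force]

Publication (announcement) number: US12014146B2

Publication (announcement) date: 2024-06-18

Application number: US18364298

Application date: 2023-08-02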

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06F40/205, G06F40/289, G06N20/00, H04L51/02

CPC classification number: G06F40/30, G06F40/289, G06N20/00, H04L51/02, G06F40/205

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.
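
The flow in this abstract (two probabilities from one sentence embedding, combined into a final in-domain decision) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the toy 2-D embeddings, cosine similarity standing in for the metric learning model, an exponential distance decay standing in for the outlier detection model, the simple average, and the 0.5 threshold are all hypothetical choices.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_probability(embedding, centroids):
    # Assumed stand-in for the metric learning model:
    # best cosine similarity to any cluster, mapped from [-1, 1] to [0, 1].
    best = max(cosine(embedding, c) for c in centroids)
    return (best + 1.0) / 2.0

def distance_probability(embedding, centroids, k=2, scale=1.0):
    # Assumed stand-in for the outlier detection model:
    # mean distance to the k nearest clusters, decayed into (0, 1].
    nearest = sorted(euclidean(embedding, c) for c in centroids)[:k]
    mean_distance = sum(nearest) / len(nearest)
    return math.exp(-mean_distance / scale)

def classify(embedding, centroids, threshold=0.5):
    # Combine the two scores; averaging is an assumption.
    p_first = similarity_probability(embedding, centroids)
    p_second = distance_probability(embedding, centroids)
    final = (p_first + p_second) / 2.0
    label = "in-domain" if final >= threshold else "out-of-domain"
    return label, final

# Hypothetical cluster representations for one target domain.
centroids = [(1.0, 0.0), (0.9, 0.3)]
near = classify((1.0, 0.1), centroids)    # close to both clusters
far = classify((-5.0, -5.0), centroids)   # far from both clusters
```

In this toy setup an embedding near the clusters scores high on both measures, while a distant one scores low on both, so the averaged score separates them.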

2.

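Invention grant
Techniques for out-of-domain (OOD) detection [In force]

Publication (announcement) number: US11763092B2

Publication (announcement) date: 2023-09-19

Application number: US17217909

Application date: 2021-03-30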

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06N20/00, G06F40/289, H04L51/02, G06F40/205

CPC classification number: G06F40/30, G06F40/289, G06N20/00, H04L51/02, G06F40/205

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.

3.

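Invention grant
Techniques for out-of-domain (OOD) detection [In force]

Publication (announcement) number: US12299402B2

Publication (announcement) date: 2025-05-13

Application number: US18659606

Application date: 2024-05-09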

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06F40/205, G06F40/289, G06N20/00, H04L51/02

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.

4.

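Invention grant
Context tag integration with named entity recognition models [In force]

Publication (announcement) number: US11868727B2

Publication (announcement) date: 2024-01-09

Application number: US17648376

Application date: 2022-01-19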

Applicant: Oracle International Corporation

Inventor: Duy Vu, Tuyen Quang Pham, Cong Duy Vu Hoang, Srinivasa Phani Kumar Gadde, Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi

IPC: G06F40/295, G06F40/205, G06V30/19, G06F40/40, G06F40/35, G06F40/279

CPC classification number: G06F40/295, G06F40/205, G06F40/279, G06F40/35, G06F40/40, G06V30/19147

Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
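
The concatenation step named in this abstract can be pictured with a small sketch. It assumes one feature row per token and plain list concatenation (the abstract also allows interpolation, which is not shown). The function name, the toy values, and the meaning of each feature column are hypothetical.

```python
def concatenate_features(embeddings, regex_gazetteer, context_tags):
    """Join three per-token feature sources into one vector per token."""
    if not (len(embeddings) == len(regex_gazetteer) == len(context_tags)):
        raise ValueError("all feature lists must have one row per token")
    return [
        list(emb) + list(rg) + list(ctx)
        for emb, rg, ctx in zip(embeddings, regex_gazetteer, context_tags)
    ]

tokens = ["pay", "$20", "tomorrow"]
# Toy 2-D word embeddings, one row per token.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# Assumed columns: [matches a currency regex, found in a date gazetteer].
regex_gazetteer = [[0, 0], [1, 0], [0, 1]]
# Assumed columns: a distribution over two hypothetical context tags.
context_tags = [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3]]

features = concatenate_features(embeddings, regex_gazetteer, context_tags)
```

Each resulting row would then be passed to an encoder; that stage is outside the scope of this sketch.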

5.

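Invention publication
Data augmentation and batch balancing methods to enhance negation and fairness [Under examination, published]

Publication (announcement) number: US20230153688A1

Publication (announcement) date: 2023-05-18

Application number: US17984768

Application date: 2022-11-10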

Applicant: Oracle International Corporation

Inventor: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson

IPC: G06N20/00, G06F40/20, G06F40/49

CPC classification number: G06N20/00, G06F40/20, G06F40/49

Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, searching the training set of labeled examples or an unlabeled corpus of text on target domains for sentiment examples having negation cues, sentiment laden words, words with sentiment prefixes or suffixes, or a combination thereof, rewriting the sentiment examples to create negated versions thereof and generate a labeled negation pair data set, and training the machine learning model using labeled examples from the labeled negation pair data set.
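
A minimal sketch of the "rewrite to create negated versions" step might look like the following. The two-rule table, the restriction to the verb "is", and the assumption that negating a sentence flips its sentiment label are all simplifications introduced here for illustration; the abstract does not specify how the rewriting is done.

```python
import re

# Hypothetical rewrite rules, tried in order: remove an existing negation
# cue first, otherwise insert one.
NEGATION_RULES = [
    (re.compile(r"\bis not\b"), "is"),
    (re.compile(r"\bis\b"), "is not"),
]

# Assumption: a negated sentence takes the opposite sentiment label.
FLIPPED = {"positive": "negative", "negative": "positive"}

def negate(text):
    """Return a negated version of text, or None if no rule applies."""
    for pattern, replacement in NEGATION_RULES:
        if pattern.search(text):
            return pattern.sub(replacement, text, count=1)
    return None

def build_negation_pairs(examples):
    """Pair each rewritable labeled example with its negated counterpart."""
    pairs = []
    for text, label in examples:
        negated = negate(text)
        if negated is not None and label in FLIPPED:
            pairs.append(((text, label), (negated, FLIPPED[label])))
    return pairs

examples = [
    ("the service is great", "positive"),
    ("the food is not good", "negative"),
    ("I loved it", "positive"),  # no rule applies, so it is skipped
]
pairs = build_negation_pairs(examples)
```

The resulting pairs play the role of the labeled negation pair data set that the abstract feeds into training.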

6.

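Invention application
Framework for focused training of language models and techniques for end-to-end hypertuning of the framework [In force]

Publication (announcement) number: US20230098783A1

Publication (announcement) date: 2023-03-30

Application number: US17952116

Application date: 2022-09-23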

Applicant: Oracle International Corporation

Inventor: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong

IPC: G10L15/06, G10L15/183

Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.

7.

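Invention grant
Framework for focused training of language models and techniques for end-to-end hypertuning of the framework [In force]

Publication (announcement) number: US12288550B2

Publication (announcement) date: 2025-04-29

Application number: US17952116

Application date: 2022-09-23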

Applicant: Oracle International Corporation

Inventor: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong

IPC: G10L15/06, G10L15/183

Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.

8.

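Invention publication
Data augmentation and batch balancing for training multi-lingual model [Under examination, published]

Publication (announcement) number: US20240135116A1

Publication (announcement) date: 2024-04-25

Application number: US18485779

Application date: 2023-10-12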

Applicant: Oracle International Corporation

Inventor: Duy Vu, Poorya Zaremoodi, Nagaraj N. Bhat, Srijon Sarkar, Varsha Kuppur Rajendra, Thanh Long Duong, Mark Edward Johnson, Pramir Sarkar, Shahid Reza

IPC: G06F40/58, G06F40/20

CPC classification number: G06F40/58, G06F40/20

Abstract: A computer-implemented method includes: accessing a plurality of datasets, where each dataset of the plurality of datasets includes training examples; selecting datasets that include the training examples in a source language and a target language; and sampling, based on a sampling weight that is determined for each of the selected datasets, the training examples from the selected datasets to generate the training batches; training an ML model for performing at least a first task using the training examples of the training batches, by interleavingly inputting the training batches to the ML model; and outputting the trained ML model configured to perform the at least the first task on input utterances provided in at least one among the source language and the target language. The sampling weight is determined for each of the selected datasets based on one or more attributes common to the training examples of the selected dataset.
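
One way to read the weighted sampling and interleaving described in this abstract is sketched below. It assumes each batch is drawn from a single dataset picked in proportion to its sampling weight, so batches from different datasets end up interleaved in the output sequence. The dataset names, the weights, and the with-replacement sampling are hypothetical; the abstract does not say how the weights are computed beyond their dependence on shared attributes of the examples.

```python
import random

def sample_batches(datasets, weights, batch_size, num_batches, seed=0):
    """Build batches, choosing the source dataset of each batch by weight."""
    rng = random.Random(seed)
    names = list(datasets)
    dataset_weights = [weights[name] for name in names]
    batches = []
    for _ in range(num_batches):
        # Pick which dataset this batch comes from, proportional to weight.
        name = rng.choices(names, weights=dataset_weights, k=1)[0]
        # Draw examples with replacement so small datasets can still fill a batch.
        examples = rng.choices(datasets[name], k=batch_size)
        batches.append((name, examples))
    return batches

# Hypothetical source/target language datasets and sampling weights.
datasets = {
    "en-fr": ["hello|bonjour", "thanks|merci", "yes|oui"],
    "en-de": ["hello|hallo", "thanks|danke"],
}
weights = {"en-fr": 0.7, "en-de": 0.3}

batches = sample_batches(datasets, weights, batch_size=2, num_batches=5)
```

A training loop would then feed `batches` to the model in order, which gives the interleaved input the abstract mentions.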

9.

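Invention publication
Data augmentation and batch balancing methods to enhance negation and fairness [Under examination, published]

Publication (announcement) number: US20230153528A1

Publication (announcement) date: 2023-05-18

Application number: US17984743

Application date: 2022-11-10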

Applicant: Oracle International Corporation

Inventor: Duy Vu, Varsha Kuppur Rajendra, Dai Hoang Tran, Shivashankar Subramanian, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson

IPC: G06F40/279, G06F40/166, G06N5/02

CPC classification number: G06F40/279, G06F40/166, G06N5/022

Abstract: Techniques for augmentation and batch balancing of training data to enhance negation and fairness of a machine learning model. In one particular aspect, a method is provided that includes generating a list of demographic words associated with a demographic group, searching an unlabeled corpus of text to identify unlabeled examples in a target domain comprising at least one demographic word from the list of demographic words, rewriting the unlabeled examples to create one or more versions of each of the unlabeled examples and generate a fairness invariance data set, and training the machine learning model using unlabeled examples from the fairness invariance data set.
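
The search-and-rewrite steps in this abstract can be illustrated with a small sketch. It assumes that "rewriting" means swapping one demographic word for each other word in the same list, which is only one possible reading. The word list, corpus, and function names are hypothetical.

```python
import re

def fairness_variants(text, demographic_words):
    """Return copies of text with each matched demographic word swapped out."""
    variants = []
    for word in demographic_words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        if not pattern.search(text):
            continue
        for other in demographic_words:
            if other != word:
                variants.append(pattern.sub(other, text))
    return variants

def build_fairness_set(corpus, demographic_words):
    """Keep only examples that mention a listed word, with their rewrites."""
    dataset = []
    for text in corpus:
        variants = fairness_variants(text, demographic_words)
        if variants:
            dataset.append((text, variants))
    return dataset

# Hypothetical demographic word list and unlabeled corpus.
demographic_words = ["women", "men"]
corpus = ["The men were helpful", "Great phone"]
fairness_set = build_fairness_set(corpus, demographic_words)
```

Since the examples are unlabeled, a training objective over such a set would presumably ask the model to predict the same output for an example and its rewrites; that objective is not shown here.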

10.

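Invention publication
Techniques for out-of-domain (OOD) detection [Under examination, published]

Publication (announcement) number: US20240289555A1

Publication (announcement) date: 2024-08-29

Application number: US18659606

Application date: 2024-05-09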

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06F40/205, G06F40/289, G06N20/00, H04L51/02

CPC classification number: G06F40/30, G06F40/289, G06N20/00, H04L51/02, G06F40/205

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.
