Global, model-agnostic machine learning explanation technique for textual data

Invention Grant

US11720751B2 Global, model-agnostic machine learning explanation technique for textual data (Active)

Patent Title: Global, model-agnostic machine learning explanation technique for textual data
Application No.: US17146375

Application Date: 2021-01-11
Publication No.: US11720751B2

Publication Date: 2023-08-08
Inventor: Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
Applicant: Oracle International Corporation
Applicant Address: US CA Redwood Shores
Assignee: Oracle International Corporation
Current Assignee: Oracle International Corporation
Current Assignee Address: US CA Redwood Shores
Agency: Hickman Becker Bingham Ledesma LLP
Main IPC: G06F17/00
IPC: G06F17/00; G06F40/284; G06F40/30; G06F40/166; G06N20/00

Abstract:

A model-agnostic global explainer for textual data processing (NLP) machine learning (ML) models, “NLP-MLX”, is described herein. NLP-MLX explains global behavior of arbitrary NLP ML models by identifying globally-important tokens within a textual dataset containing text data. NLP-MLX accommodates any arbitrary combination of training dataset pre-processing operations used by the NLP ML model. NLP-MLX includes four main stages. A Text Analysis stage converts text in documents of a target dataset into tokens. A Token Extraction stage uses pre-processing techniques to efficiently pre-filter the complete list of tokens into a smaller set of candidate important tokens. A Perturbation Generation stage perturbs tokens within documents of the dataset to help evaluate the effect of different tokens, and combinations of tokens, on the model's predictions. Finally, a Token Evaluation stage uses the ML model and perturbed documents to evaluate the impact of each candidate token relative to predictions for the original documents.
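
The four stages named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the regex tokenizer, the document-frequency pre-filter, the "delete the token" perturbation, the mean-absolute-change score, and all function names (`tokenize`, `candidate_tokens`, `perturb`, `token_importance`) are hypothetical. The model is treated as a black box that maps a document string to a numeric score.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence


def tokenize(doc: str) -> List[str]:
    """Text Analysis: split a document into lowercase tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", doc.lower())


def candidate_tokens(docs: Sequence[str], top_k: int) -> List[str]:
    """Token Extraction: pre-filter tokens by document frequency (assumed heuristic)."""
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    # Highest document frequency first; ties broken alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:top_k]]


def perturb(doc: str, token: str) -> str:
    """Perturbation Generation: remove every occurrence of `token` (assumed perturbation)."""
    return " ".join(t for t in tokenize(doc) if t != token)


def token_importance(
    docs: Sequence[str],
    predict: Callable[[str], float],
    top_k: int = 10,
) -> Dict[str, float]:
    """Token Evaluation: mean absolute prediction change when a token is removed."""
    baseline = [predict(doc) for doc in docs]  # predictions for the original documents
    scores: Dict[str, float] = {}
    for tok in candidate_tokens(docs, top_k):
        # Only documents that contain the token are affected by the perturbation.
        deltas = [
            abs(base - predict(perturb(doc, tok)))
            for doc, base in zip(docs, baseline)
            if tok in tokenize(doc)
        ]
        scores[tok] = sum(deltas) / len(deltas)
    return scores


def toy_predict(doc: str) -> float:
    """Hypothetical black-box model: scores 1.0 when 'refund' appears, else 0.0."""
    return 1.0 if "refund" in doc.lower().split() else 0.0


docs = [
    "Please refund my order",
    "My order arrived late",
    "Refund requested for order",
]
print(token_importance(docs, toy_predict, top_k=3))
```

Because the sketch only calls `predict` on strings, it does not depend on the model's internals or on its pre-processing, which mirrors the model-agnostic framing of the abstract. A real pipeline would likely need a more careful pre-filter and would also perturb combinations of tokens, which this sketch omits.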

Public/Granted literature

US20220229983A1 GLOBAL, MODEL-AGNOSTIC MACHINE LEARNING EXPLANATION TECHNIQUE FOR TEXTUAL DATA Public/Granted day: 2022-07-21

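IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F17/00	Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures, see G06F16/00)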