Clustering-based data selection for optimization of risk predictive machine learning models

Invention Grant

US12141806B2 Clustering-based data selection for optimization of risk predictive machine learning models (Active)

Patent Title: Clustering-based data selection for optimization of risk predictive machine learning models
Application No.: US17334743

Application Date: 2021-05-30
Publication No.: US12141806B2

Publication Date: 2024-11-12
Inventor: Danny Butvinik, Maria Zatsepin, Yoav Avneon
Applicant: Actimize LTD.
Applicant Address: IL Ra'anana
Assignee: Actimize LTD.
Current Assignee: Actimize LTD.
Current Assignee Address: IL Ra'anana
Agency: SOROKER AGMON NORDMAN RIBA
Main IPC: G06N20/00
IPC: G06N20/00; G06N5/04; G06Q20/40; G06F18/214; G06F18/23; G06F18/24

Abstract:

A risk-prediction-preparation module for generating a risk-prediction-model is provided herein. The module accesses a data-storage of transactions and applies a group-by operation to the transactions related to data-points, grouping them into entities according to a logical-entity. It then clusters the entities of a clean-financial dataset into clusters and selects data-points of (a) entities from the clusters for a first dataset and (b) a preconfigured amount of randomly chosen entities for a second dataset. All entities that have at least one related data-point labeled ‘fraudulent’ are added to both the first dataset and the second dataset. Using vectorized and scaled extracted features, a first machine-learning-model of fraud detection is trained on the first dataset and a second machine-learning-model of fraud detection is trained on the second dataset, and the results are collected. The results are then used to combine the first machine-learning-model and the second machine-learning-model into an ensemble machine-learning-model for risk-prediction.
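
The data-selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The record layout (`entity`, `amount`, `fraud` keys), the two per-entity features, the use of k-means as the clustering method, and the parameters `k`, `per_cluster` and `n_random` are all assumptions made for illustration; the abstract does not specify them. Feature scaling, model training and the ensemble step are left out.

```python
import random
from collections import defaultdict


def group_by_entity(transactions):
    """Group transaction data-points into entities by a logical-entity key."""
    entities = defaultdict(list)
    for tx in transactions:
        entities[tx["entity"]].append(tx)
    return dict(entities)


def entity_features(txs):
    """Hypothetical per-entity features: mean amount and transaction count."""
    amounts = [tx["amount"] for tx in txs]
    return (sum(amounts) / len(amounts), float(len(amounts)))


def kmeans_labels(points, k, iters=20, seed=0):
    """Minimal k-means (an assumed choice); returns one cluster index per point."""
    if not points:
        return []
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def select_datasets(transactions, k=2, per_cluster=1, n_random=2, seed=0):
    """Return (first, second) sets of entity ids for two training datasets."""
    rng = random.Random(seed)
    entities = group_by_entity(transactions)
    # Entities with at least one data-point labeled fraudulent.
    fraud = {e for e, txs in entities.items() if any(tx["fraud"] for tx in txs)}
    clean = sorted(e for e in entities if e not in fraud)

    # (a) First dataset: entities drawn from each cluster of the clean entities.
    k = min(k, len(clean))
    labels = kmeans_labels([entity_features(entities[e]) for e in clean], k, seed=seed)
    clusters = defaultdict(list)
    for e, lab in zip(clean, labels):
        clusters[lab].append(e)
    first = set()
    for members in clusters.values():
        first.update(rng.sample(members, min(per_cluster, len(members))))

    # (b) Second dataset: a preconfigured number of clean entities chosen at random.
    second = set(rng.sample(clean, min(n_random, len(clean))))

    # All fraudulent entities are added to both datasets.
    return first | fraud, second | fraud
```

Calling `select_datasets(transactions)` on a list of such records returns two sets of entity ids. Under the scheme in the abstract, a separate fraud-detection model would then be trained on the data-points of each set before the two models are combined.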

Public/Granted literature

US20220383322A1 CLUSTERING-BASED DATA SELECTION FOR OPTIMIZATION OF RISK PREDICTIVE MACHINE LEARNING MODELS Public/Granted day: 2022-12-01

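IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computing arrangements based on specific computational models
G06N20/00	Machine learning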