SIMULTANEOUS DATA SAMPLING AND FEATURE SELECTION VIA WEAK LEARNERS

    Publication Number: US20250013909A1

    Publication Date: 2025-01-09

    Application Number: US18218970

    Application Date: 2023-07-06

    Abstract: From many features and many multidimensional points, a computer generates exploratory training configurations. Each point contains a value for each of the features. Each exploratory training configuration identifies a random subset of the features and a random subset of the points. A performance score is generated for each exploratory training configuration. For each feature, a feature weight is generated based on the performance scores of the exploratory training configurations whose random feature subset contains that feature. Likewise, for each point, a point weight is generated based on the performance scores of the exploratory training configurations whose random point subset contains that point. A machine learning model is then trained using an optimized training corpus that consists of a subset of the features selected by feature weight and a subset of the points selected by point weight.
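
    As a rough illustration of the workflow the abstract describes, the sketch below scores random feature/point subsets with a cheap weak learner and turns those scores into feature and point weights. The use of scikit-learn's DecisionTreeClassifier as the weak learner, the accuracy metric, the subset sizes, and the number of configurations are illustrative assumptions, not details taken from the patent.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=500, n_features=30, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    n_points, n_features = X_train.shape
    n_configs = 50

    feature_scores = [[] for _ in range(n_features)]
    point_scores = [[] for _ in range(n_points)]

    for _ in range(n_configs):
        # Each exploratory configuration pairs a random feature subset
        # with a random point subset.
        feat_idx = rng.choice(n_features, size=n_features // 3, replace=False)
        point_idx = rng.choice(n_points, size=n_points // 3, replace=False)

        # Score the configuration with a cheap weak learner.
        weak = DecisionTreeClassifier(max_depth=3, random_state=0)
        weak.fit(X_train[np.ix_(point_idx, feat_idx)], y_train[point_idx])
        score = accuracy_score(y_val, weak.predict(X_val[:, feat_idx]))

        # Credit the score to every feature and point the configuration used.
        for f in feat_idx:
            feature_scores[f].append(score)
        for p in point_idx:
            point_scores[p].append(score)

    # A feature/point weight is the mean score of the configurations containing it.
    feature_weight = np.array([np.mean(s) if s else 0.0 for s in feature_scores])
    point_weight = np.array([np.mean(s) if s else 0.0 for s in point_scores])

    # Build the optimized training corpus from the highest-weighted features and points.
    top_features = np.argsort(feature_weight)[-10:]
    top_points = np.argsort(point_weight)[-len(point_weight) // 2:]

    final_model = DecisionTreeClassifier(random_state=0)
    final_model.fit(X_train[np.ix_(top_points, top_features)], y_train[top_points])
    print("final accuracy:", final_model.score(X_val[:, top_features], y_val))
    ```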

    ACCELERATING AUTOMATED ALGORITHM CONFIGURATION USING HISTORICAL PERFORMANCE DATA

    Publication Number: US20240394557A1

    Publication Date: 2024-11-28

    Application Number: US18202472

    Application Date: 2023-05-26

    Abstract: In an embodiment, a computer combines first original hyperparameters and second original hyperparameters into combined hyperparameters. In each iteration of a binary search that selects hyperparameters, the following are selected: a) important hyperparameters from the combined hyperparameters and b) based on an estimated complexity decrease from including only the important hyperparameters as compared to the combined hyperparameters, which one boundary of the binary search to adjust. For the important hyperparameters of the last iteration of the binary search, a pruned value range of a particular hyperparameter is generated based on a first original value range of that hyperparameter in the first original hyperparameters and a second original value range of the same hyperparameter in the second original hyperparameters. To accelerate hyperparameter optimization (HPO), the particular hyperparameter is tuned only within the pruned value range to discover an optimal value for configuring and training a machine learning model.
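
    The following is a minimal sketch of the kind of procedure the abstract outlines: a binary search over how many of the most important hyperparameters to keep, with one boundary moved per iteration based on an estimated complexity decrease, followed by pruning each retained hyperparameter's value range from the two original ranges. The importance scores, the complexity estimate, the boundary rule, and the range-merging rule (the hull of the two original ranges) are all hypothetical choices for illustration, not details from the patent.

    ```python
    # Hypothetical importances and value ranges for the combined hyperparameters.
    importances = {"learning_rate": 0.9, "max_depth": 0.7, "subsample": 0.2, "l2_reg": 0.1}
    ranges_model_a = {"learning_rate": (0.01, 0.1), "max_depth": (3, 8)}
    ranges_model_b = {"learning_rate": (0.05, 0.3), "max_depth": (4, 12)}

    ranked = sorted(importances, key=importances.get, reverse=True)

    def estimated_complexity(n_hyperparameters: int) -> float:
        # Toy stand-in for the tuning cost of a search space with n dimensions.
        return 2.0 ** n_hyperparameters

    # Binary search over how many of the most important hyperparameters to keep.
    lo, hi = 1, len(ranked)
    while lo < hi:
        mid = (lo + hi) // 2
        decrease = estimated_complexity(len(ranked)) - estimated_complexity(mid)
        # The estimated complexity decrease decides which single boundary to move.
        if decrease >= estimated_complexity(len(ranked)) / 2:
            hi = mid          # saving is large enough: try keeping even fewer
        else:
            lo = mid + 1      # saving is too small: keep more hyperparameters

    important = ranked[:lo]

    # For each retained hyperparameter, derive a pruned value range from the two
    # original ranges and tune only inside it.
    pruned_ranges = {}
    for name in important:
        if name in ranges_model_a and name in ranges_model_b:
            a_lo, a_hi = ranges_model_a[name]
            b_lo, b_hi = ranges_model_b[name]
            pruned_ranges[name] = (min(a_lo, b_lo), max(a_hi, b_hi))

    print("hyperparameters to tune:", important)
    print("pruned value ranges:", pruned_ranges)
    ```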

    FAST AND ACCURATE ANOMALY DETECTION EXPLANATIONS WITH FORWARD-BACKWARD FEATURE IMPORTANCE

    Publication Number: US20230376366A1

    Publication Date: 2023-11-23

    Application Number: US17992743

    Application Date: 2022-11-22

    CPC classification number: G06F11/006 G06N20/00 G06F2201/82

    Abstract: The present invention relates to machine learning (ML) explainability (MLX). Herein are local explanation techniques for black-box ML models based on coalitions of features in a dataset. In an embodiment, a computer receives a request to generate a local explanation of which coalitions of features caused an anomaly detector to detect an anomaly. During unsupervised generation of a new coalition, a first feature is randomly selected from the features in the dataset. Additional features whose mutual information with the first feature exceeds a threshold are detected and may join the coalition. For each feature that is not in the coalition, values of the feature are permuted in imperfect copies of original tuples in the dataset. An average anomaly score of the imperfect copies is measured. Based on the average anomaly score of the imperfect copies, a local explanation is generated that references (e.g. defines) the coalition.
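
    A minimal sketch of the unsupervised coalition-generation step follows, assuming scikit-learn's mutual_info_regression as the mutual-information estimator; the toy data, the threshold value, and the estimator choice are illustrative assumptions rather than details from the patent.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)

    # Toy dataset: feature 1 is correlated with feature 0, feature 2 is independent noise.
    n = 400
    f0 = rng.normal(size=n)
    X = np.column_stack([f0, f0 + 0.1 * rng.normal(size=n), rng.normal(size=n)])

    # Unsupervised coalition generation: start from a randomly selected feature ...
    first = rng.integers(X.shape[1])
    others = [j for j in range(X.shape[1]) if j != first]

    # ... then admit every feature whose mutual information with it exceeds a threshold.
    mi = mutual_info_regression(X[:, others], X[:, first], random_state=0)
    threshold = 0.2  # illustrative value
    coalition = [first] + [j for j, m in zip(others, mi) if m > threshold]
    print("coalition:", sorted(coalition))
    ```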

    THRESHOLD TUNING FOR IMBALANCED MULTI-CLASS CLASSIFICATION MODELS

    Publication Number: US20240303541A1

    Publication Date: 2024-09-12

    Application Number: US18386196

    Application Date: 2023-11-01

    CPC classification number: G06N20/00 G06N7/01

    Abstract: In an embodiment, a computer generates, from an input, an inference that contains multiple probabilities respectively for multiple mutually exclusive classes, including a first class and a second class. The probabilities contain (e.g. due to overfitting) a higher probability for the first class and a lower probability for the second class. In response to a threshold exceeding the higher probability, the input is automatically and more accurately classified as the second class. One, some, or almost all classes may have a respective distinct threshold, and the thresholds can be applied concurrently for acceleration. Data parallelism may simultaneously apply a threshold to a batch of multiple inputs for further acceleration.
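
    The per-class thresholding and its data-parallel application to a batch can be sketched with NumPy as below; the probability values and thresholds are made up for illustration, and the fallback-to-runner-up rule is one plausible reading of the abstract, not the patent's definitive logic.

    ```python
    import numpy as np

    # Batch of softmax outputs for three mutually exclusive classes (rows sum to 1).
    proba = np.array([
        [0.50, 0.30, 0.20],   # confident enough in class 0
        [0.40, 0.35, 0.25],   # class 0 wins but falls below its threshold
        [0.20, 0.15, 0.65],   # confident enough in class 2
    ])

    # One distinct threshold per class (illustrative values, e.g. tuned on a
    # validation set to counter class imbalance).
    thresholds = np.array([0.45, 0.30, 0.50])

    # Data-parallel decision over the whole batch: take the highest-probability
    # class unless its per-class threshold exceeds that probability, in which
    # case fall back to the second class.
    order = np.argsort(proba, axis=1)            # ascending probability order
    first = order[:, -1]                          # highest-probability class
    second = order[:, -2]                         # runner-up class
    top_p = proba[np.arange(len(proba)), first]

    prediction = np.where(thresholds[first] > top_p, second, first)
    print(prediction)   # [0 1 2]
    ```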

    Fast and accurate anomaly detection explanations with forward-backward feature importance

    Publication Number: US11966275B2

    Publication Date: 2024-04-23

    Application Number: US17992743

    Application Date: 2022-11-22

    CPC classification number: G06F11/006 G06N20/00 G06F2201/82

    Abstract: The present invention relates to machine learning (ML) explainability (MLX). Herein are local explanation techniques for black-box ML models based on coalitions of features in a dataset. In an embodiment, a computer receives a request to generate a local explanation of which coalitions of features caused an anomaly detector to detect an anomaly. During unsupervised generation of a new coalition, a first feature is randomly selected from the features in the dataset. Additional features whose mutual information with the first feature exceeds a threshold are detected and may join the coalition. For each feature that is not in the coalition, values of the feature are permuted in imperfect copies of original tuples in the dataset. An average anomaly score of the imperfect copies is measured. Based on the average anomaly score of the imperfect copies, a local explanation is generated that references (e.g. defines) the coalition.
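
    This granted patent shares its abstract with the published application above. Complementing the coalition-selection sketch there, the sketch below illustrates the permutation-and-scoring step: features outside the coalition are permuted in imperfect copies of the original tuples, and the average anomaly score of the copies is measured. The IsolationForest detector, the toy data, and the example coalition are assumptions, not details from the patent.

    ```python
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4))

    detector = IsolationForest(random_state=0).fit(X)
    coalition = [0, 1]   # e.g. produced by a mutual-information step as sketched earlier

    # Build "imperfect copies": keep the coalition's values intact and permute the
    # values of every feature outside the coalition, breaking their relationships.
    X_permuted = X.copy()
    for j in range(X.shape[1]):
        if j not in coalition:
            X_permuted[:, j] = rng.permutation(X_permuted[:, j])

    # Average anomaly score of the imperfect copies (score_samples: lower = more anomalous).
    avg_original = detector.score_samples(X).mean()
    avg_permuted = detector.score_samples(X_permuted).mean()
    print(f"original: {avg_original:.3f}  imperfect copies: {avg_permuted:.3f}")
    # Comparing the two averages is the raw material for a local explanation that
    # references the coalition: if permuting everything outside the coalition barely
    # changes the score, the coalition dominates the detector's output.
    ```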
