-
Publication No.: US20230205839A1
Publication Date: 2023-06-29
Application No.: US18051906
Application Date: 2022-11-02
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu , Xin Jiang Hunt , Ralph Walter Abbey
Abstract: A computing device trains a fair machine learning model. A predicted target variable is defined using a trained prediction model. The prediction model is trained with weighted observation vectors. The predicted target variable is updated using the prediction model trained with weighted observation vectors. A true conditional moments matrix and a false conditional moments matrix are computed. The training and updating with weighted observation vectors are repeated until a number of iterations is performed. When a computed conditional moments matrix indicates to adjust a bound value, the bound value is updated based on an upper bound value or a lower bound value, and the repeated training and updating with weighted observation vectors is repeated with the bound value replaced with the updated bound value until the conditional moments matrix indicates no further adjustment of the bound value is needed. A fair prediction model is trained with the updated bound value.
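The abstract names "true" and "false" conditional moments matrices without defining them. The sketch below assumes an equalized-odds style reading. Among observations whose true target is the event ("true") or the non-event ("false"), it computes each sensitive group's predicted-event rate minus the overall rate for that subset. `conditional_moments` and `exceeds_bound` are hypothetical helpers, not SAS code. The iterative training and bound-update loop is not shown.

```python
import numpy as np

def conditional_moments(y_true, y_pred, groups):
    """Assumed equalized-odds moments: for each true-label value (1 = "true",
    0 = "false"), each group's predicted-event rate minus the overall rate."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    levels = np.unique(groups)
    moments = {}
    for label, name in ((1, "true"), (0, "false")):
        mask = y_true == label
        overall = y_pred[mask].mean()
        moments[name] = np.array(
            [y_pred[mask & (groups == g)].mean() - overall for g in levels]
        )
    return levels, moments

def exceeds_bound(moments, bound):
    """True when any moment's magnitude is larger than the bound value."""
    return max(np.abs(m).max() for m in moments.values()) > bound

# Two sensitive groups, four observations per true-label value.
levels, moments = conditional_moments(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 0, 1, 0, 0],
    groups=["a", "a", "b", "b", "a", "a", "b", "b"],
)
# moments["true"] and moments["false"] are both +0.25 for group a, -0.25 for group b
print(exceeds_bound(moments, 0.1))   # True
```

Under this reading, a moment beyond the bound is the kind of signal that would trigger the bound-value adjustment the abstract describes.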
-
Publication No.: US11200514B1
Publication Date: 2021-12-14
Application No.: US17342825
Application Date: 2021-06-09
Applicant: SAS Institute Inc.
IPC: G06N20/00
Abstract: Unclassified observations are classified. Similarity values are computed for each unclassified observation and for each target variable value. A confidence value is computed for each unclassified observation using the similarity values. A high-confidence threshold value and a low-confidence threshold value are computed from the confidence values. For each observation, when the confidence value is greater than the high-confidence threshold value, the observation is added to a training dataset and, when the confidence value is greater than the low-confidence threshold value and less than the high-confidence threshold value, the observation is added to the training dataset based on a comparison between a random value drawn from a uniform distribution and an inclusion percentage value. A classification model is trained with the training dataset and classified observations. The trained classification model is executed with the unclassified observations to determine a label assignment.
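The following is a minimal sketch of the selection step under stated assumptions. Similarity values are taken to be nonnegative scores per target value. Confidence is the largest normalized similarity. The high and low thresholds are the 75th and 25th percentiles of the confidence values. None of these choices come from the abstract, and `select_for_training` is a hypothetical name. Training and running the classification model are omitted.

```python
import numpy as np

def select_for_training(similarity, inclusion_pct=0.5, hi_pct=75, lo_pct=25, seed=0):
    """Return indices of observations to add to the training data, plus their labels.
    similarity: (n_observations, n_target_values) array of nonnegative scores."""
    rng = np.random.default_rng(seed)
    sim = np.asarray(similarity, dtype=float)
    probs = sim / sim.sum(axis=1, keepdims=True)   # normalize each observation's scores
    confidence = probs.max(axis=1)                  # assumed confidence measure
    labels = probs.argmax(axis=1)                   # tentative label per observation
    high = np.percentile(confidence, hi_pct)        # assumed high-confidence threshold
    low = np.percentile(confidence, lo_pct)         # assumed low-confidence threshold
    chosen = []
    for i, c in enumerate(confidence):
        if c > high:
            chosen.append(i)                        # always keep high-confidence rows
        elif low < c < high and rng.uniform() < inclusion_pct:
            chosen.append(i)                        # keep mid-confidence rows at random
    return np.array(chosen, dtype=int), labels

sim = [[10, 0], [2, 8], [6, 4], [3, 7], [11, 9]]
chosen, labels = select_for_training(sim, inclusion_pct=1.0)
print(chosen.tolist(), labels.tolist())   # [0, 3] [0, 1, 0, 1, 0]
```

With `inclusion_pct=1.0` every mid-confidence row is kept. With `0.0`, only rows above the high threshold survive.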
-
Publication No.: US11195084B1
Publication Date: 2021-12-07
Application No.: US17198737
Application Date: 2021-03-11
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu , Yingjian Wang , Xiangqian Hu
Abstract: A computing device trains a neural network machine learning model. A forward propagation of a first neural network is executed. A backward propagation of the first neural network is executed from a last layer to a last convolution layer of a plurality of convolutional layers to compute a gradient vector for first weight values of the last convolution layer using observation vectors. A discriminative localization map is computed for each observation vector with the gradient vector using a discriminative localization map function. A forward and a backward propagation of a second neural network is executed to compute a second weight value for each neuron of the second neural network using the discriminative localization map computed for each observation vector. A predefined number of iterations of the forward and the backward propagation of the second neural network is repeated.
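The abstract does not name its discriminative localization map function. This sketch assumes a Grad-CAM style map, which is one common choice. The gradients of each channel of the last convolution layer are pooled into a weight, the channel activations are summed with those weights, and only the positive part is kept. Activations and gradients are assumed to be precomputed arrays. The second network trained on the maps is not shown.

```python
import numpy as np

def localization_map(activations, gradients):
    """Grad-CAM style map from the last convolution layer.
    activations, gradients: arrays of shape (channels, height, width)."""
    weights = gradients.mean(axis=(1, 2))              # one weight per channel (pooled gradient)
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum of channels -> (height, width)
    cam = np.maximum(cam, 0.0)                         # keep only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # scale to [0, 1]

acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0   # channel 0 fires at the top-left cell
acts[1, 1, 1] = 1.0   # channel 1 fires at the bottom-right cell
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(localization_map(acts, grads))   # only the top-left cell is highlighted
```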
-
Publication No.: US10417528B2
Publication Date: 2019-09-17
Application No.: US16059241
Application Date: 2018-08-09
Applicant: SAS Institute Inc.
Inventor: Yongjin Ma , Xinmin Wu , Xiaomei Liu
Abstract: An assessment dataset is selected from an input dataset using a first stratified sampling process based on a value of an event assessment variable. A remainder of the input dataset is allocated to a training/validation dataset that is partitioned into an oversampled training/validation dataset using an oversampling process based on a predefined value of the event assessment variable. A validation sample is selected from the oversampled training/validation dataset using a second stratified sampling process based on the value of the event assessment variable. A training sample is selected from the oversampled training/validation dataset using the second stratified sampling process based on the value of the event assessment variable. The validation sample and the training sample are mutually exclusive. A predictive type model is trained using the selected training sample. A plurality of predictive type models are trained, validated, and scored using the samples to select a best predictive model.
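The following is a plain-Python sketch of the sampling pipeline. It assumes that "oversampling" means keeping every event row and drawing an equal number of non-event rows (a 1:1 ratio). It also assumes the assessment and validation fractions are 20% and 30%. These values and the helper names are illustrative and are not taken from the patent. Model training, validation, and scoring are left out.

```python
import random
from collections import defaultdict

def stratified_split(rows, key, frac, rng):
    """Take `frac` of each stratum (rows sharing key(row)); return (sample, remainder)."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample, rest = [], []
    for members in strata.values():
        rng.shuffle(members)
        k = round(frac * len(members))
        sample += members[:k]
        rest += members[k:]
    return sample, rest

def oversample(rows, key, event_value, rng):
    """Keep every event row and an equal number of randomly chosen non-event rows."""
    events = [r for r in rows if key(r) == event_value]
    others = [r for r in rows if key(r) != event_value]
    return events + rng.sample(others, min(len(events), len(others)))

def partition(rows, key, event_value, assess_frac=0.2, valid_frac=0.3, seed=0):
    rng = random.Random(seed)
    assessment, train_valid = stratified_split(rows, key, assess_frac, rng)
    balanced = oversample(train_valid, key, event_value, rng)
    validation, training = stratified_split(balanced, key, valid_frac, rng)
    return assessment, training, validation   # mutually exclusive samples

rows = [(i, 1 if i < 20 else 0) for i in range(100)]   # (id, event flag): 20% events
assessment, training, validation = partition(rows, key=lambda r: r[1], event_value=1)
print(len(assessment), len(training), len(validation))  # 20 22 10
```

Because oversampling runs before the second split, both the training and validation samples carry the raised event rate. The assessment sample keeps the original rate.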
-
Publication No.: US20190258697A1
Publication Date: 2019-08-22
Application No.: US16398690
Application Date: 2019-04-30
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu , Tao Wang , Scott Russell Pope
IPC: G06F17/18
Abstract: A computing device computes a quantile value for a variable value extracted from an event block object by computing a bin number for the variable value. If the computed bin number is between a before bin number and an after bin number computed for a quantile, the quantile is identified. Frequency data is updated to include the extracted variable value as a key value. A frequency value associated with the key value indicates a number of occurrences of the variable value in previously processed data. A cumulative rank value of the identified quantile is updated. A quantile adjustment value is computed based on a comparison between the variable value and a current quantile value of the identified quantile. An updated quantile value associated with the identified quantile is computed using the updated frequency data, the computed quantile adjustment value, and the updated cumulative rank value of the identified quantile.
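The bin-based update with a quantile adjustment value is not reproduced here. This simplified sketch shows only the frequency-data bookkeeping the abstract mentions. Each streamed value becomes a key whose count records its occurrences. The quantile is then read off as the value whose cumulative count first reaches the target rank `ceil(q * n)`. This recomputes the quantile exactly, at a cost linear in the number of distinct values. It is therefore a stand-in for the patented incremental method, not a version of it, and `StreamingQuantile` is a hypothetical name.

```python
import math
from bisect import insort

class StreamingQuantile:
    """Exact streaming quantile kept as a frequency table (value -> count)."""

    def __init__(self, q):
        self.q = q
        self.freq = {}   # frequency data keyed by observed value
        self.keys = []   # distinct values kept in sorted order
        self.n = 0       # number of values processed so far

    def update(self, value):
        if value not in self.freq:
            insort(self.keys, value)
            self.freq[value] = 0
        self.freq[value] += 1
        self.n += 1
        return self.quantile()

    def quantile(self):
        rank = max(1, math.ceil(self.q * self.n))   # target cumulative rank
        cum = 0
        for v in self.keys:
            cum += self.freq[v]
            if cum >= rank:
                return v
        return None   # no data yet

sq = StreamingQuantile(0.5)
for value in [5, 1, 3, 3, 9]:   # values extracted from successive event blocks
    median = sq.update(value)
print(median, sq.freq[3])       # 3 2
```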
-
Publication No.: US12093826B2
Publication Date: 2024-09-17
Application No.: US18444906
Application Date: 2024-02-19
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu , Ricky Dee Tharrington, Jr. , Ralph Walter Abbey , Xin Jiang Hunt
Abstract: A computing device trains a fair prediction model while defining an optimal event cutoff value. (A) A prediction model is trained with observation vectors. (B) The prediction model is executed to define a predicted target variable value and a probability associated with an accuracy of the predicted target variable value. (C) A conditional moments matrix is computed based on fairness constraints, the predicted target variable value, and the sensitive attribute variable value of each observation vector. The predicted target variable value has a predefined target event value only when the probability is greater than a predefined event cutoff value. (D) (A) through (C) are repeated. (E) An updated value is computed for the predefined event cutoff value. (F) (A) through (E) are repeated. An optimal event cutoff value is defined from the predefined event cutoff values used when repeating (A) through (E). The optimal value and prediction model are output.
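The following is a minimal sketch of choosing an event cutoff for a fixed set of predicted event probabilities. It makes three assumptions: demographic-parity moments (each group's predicted-event rate minus the overall rate), accuracy as the objective, and a fixed fairness `bound`. The patent instead retrains the model while repeating (A) through (E), which is not shown here. Function names and data are hypothetical.

```python
import numpy as np

def parity_moments(y_pred, groups):
    """Assumed demographic-parity moments: group event rate minus overall rate."""
    overall = y_pred.mean()
    return np.array([y_pred[groups == g].mean() - overall for g in np.unique(groups)])

def best_cutoff(prob, y_true, groups, cutoffs, bound):
    """Most accurate cutoff whose moments all stay within +/- bound, or None."""
    prob, y_true, groups = map(np.asarray, (prob, y_true, groups))
    best = None
    for c in cutoffs:
        y_pred = (prob > c).astype(float)   # event only when probability exceeds the cutoff
        if np.abs(parity_moments(y_pred, groups)).max() > bound:
            continue                         # violates the fairness constraint
        acc = (y_pred == y_true).mean()
        if best is None or acc > best[1]:
            best = (c, acc)
    return best

prob = [0.9, 0.6, 0.4, 0.8, 0.3, 0.2]
y_true = [1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
best = best_cutoff(prob, y_true, groups, [0.25, 0.5, 0.7], bound=0.2)
print(best)   # cutoff 0.5 with accuracy 1.0
```

Tightening `bound` to 0.1 rules out 0.25 and 0.5 (each has a group gap of 1/6), so 0.7 is selected instead.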
-
Publication No.: US10832087B1
Publication Date: 2020-11-10
Application No.: US16921417
Application Date: 2020-07-06
Applicant: SAS Institute Inc.
Inventor: Yingjian Wang , Xinmin Wu
Abstract: Machine-learning models (MLM) can be configured more rapidly and accurately according to some examples. For example, a system can receive a first training dataset that includes (i) independent-variable values corresponding to independent variables and (ii) dependent-variable values corresponding to a dependent variable that is influenced by the independent variables. The independent-variable values can include nonlinear-variable values corresponding to at least one nonlinear independent variable. The system can then determine cluster assignments for the nonlinear-variable values, generate a second training dataset based on the cluster assignments, and train a model based on the second training dataset. The trained machine-learning model may then be used in various applications, such as control-system applications.
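The following sketches the cluster-then-train idea under stated assumptions. One nonlinear independent variable is clustered with a simple one-dimensional k-means. Its raw values are then replaced by one-hot cluster indicators to form the second training dataset, and an ordinary least-squares model is fit. The abstract does not specify the clustering method or the model type, so these are illustrative choices.

```python
import numpy as np

def kmeans_1d(x, k, iters=50):
    """Plain one-dimensional k-means; centers start at evenly spaced quantiles."""
    centers = np.quantile(x, np.linspace(0, 1, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers

def fit_with_clusters(x_linear, x_nonlinear, y, k=3):
    labels, _ = kmeans_1d(x_nonlinear, k)
    onehot = np.eye(k)[labels]                    # cluster indicators replace raw values
    X = np.column_stack([x_linear, onehot])       # second training dataset
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares model on the new features
    return coef, labels

x_nl = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2])
x_lin = np.arange(9.0)
y = 2 * x_lin + np.repeat([1.0, 4.0, -2.0], 3)   # step-shaped effect of the nonlinear variable
coef, labels = fit_with_clusters(x_lin, x_nl, y, k=3)
print(np.round(coef, 6))   # approximately [2, 1, 4, -2]
```

The indicator columns sum to one, so together they also act as the model's intercept.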
-
Publication No.: US20190258904A1
Publication Date: 2019-08-22
Application No.: US16059241
Application Date: 2018-08-09
Applicant: SAS Institute Inc.
Inventor: Yongjin Ma , Xinmin Wu , Xiaomei Liu
Abstract: An assessment dataset is selected from an input dataset using a first stratified sampling process based on a value of an event assessment variable. A remainder of the input dataset is allocated to a training/validation dataset that is partitioned into an oversampled training/validation dataset using an oversampling process based on a predefined value of the event assessment variable. A validation sample is selected from the oversampled training/validation dataset using a second stratified sampling process based on the value of the event assessment variable. A training sample is selected from the oversampled training/validation dataset using the second stratified sampling process based on the value of the event assessment variable. The validation sample and the training sample are mutually exclusive. A predictive type model is trained using the selected training sample. A plurality of predictive type models are trained, validated, and scored using the samples to select a best predictive model.
-
Publication No.: US10311128B2
Publication Date: 2019-06-04
Application No.: US16140931
Application Date: 2018-09-25
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu , Xiangqian Hu , Tao Wang , Xunlei Wu
IPC: G06F17/18
Abstract: A computing device computes a quantile value. A maximum value and a minimum value are computed for unsorted variable values to compute an upper bin value and a lower bin value for each bin of a plurality of bins. A frequency counter is computed for each bin by reading the unsorted variable values a second time. A bin number and a cumulative rank value are computed for a quantile. When an estimated memory usage value exceeds a predefined memory size constraint value, a subset of the plurality of bins is split into a plurality of bins, the frequency counter is recomputed for each bin, and the bin number and the cumulative rank value are recomputed. Frequency data is computed using the frequency counters. The quantile value is computed using the frequency data and the cumulative rank value for the quantile and output.
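This is a sketch of the memory-constrained refinement. It assumes the memory estimate is simply the number of values that would go into the frequency data for the bin containing the quantile, with `max_count` as a hypothetical limit. When that bin is too large, it is split into `n_bins` narrower bins and the data are re-read. Once it fits, its values are sorted and the cumulative rank is walked to the quantile, defined here as the `ceil(q * n)`-th smallest value. Function names are illustrative.

```python
import math

def bin_of(v, lo, width, n_bins):
    """Equal-width bin index of v; the top edge is folded into the last bin."""
    return 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)

def quantile_with_memory_limit(values, q, n_bins=4, max_count=3):
    """Return the ceil(q*n)-th smallest value, splitting the bin that holds it
    while its frequency data would exceed `max_count` entries (assumed limit)."""
    rank = max(1, math.ceil(q * len(values)))   # cumulative rank of the quantile
    path, below = [], 0    # (lo, width, bin) chosen per level; count left of the region
    lo, hi = min(values), max(values)           # first read: minimum and maximum

    def in_region(v):
        return all(bin_of(v, p_lo, p_w, n_bins) == p_b for p_lo, p_w, p_b in path)

    while True:
        width = (hi - lo) / n_bins
        counts = [0] * n_bins
        mins, maxs = [math.inf] * n_bins, [-math.inf] * n_bins
        for v in values:                        # re-read: frequency counter per bin
            if in_region(v):
                k = bin_of(v, lo, width, n_bins)
                counts[k] += 1
                mins[k], maxs[k] = min(mins[k], v), max(maxs[k], v)
        b, cum = 0, below
        while cum + counts[b] < rank:           # bin number that holds the target rank
            cum += counts[b]
            b += 1
        if mins[b] == maxs[b]:                  # the bin holds a single distinct value
            return mins[b]
        if counts[b] <= max_count or width == 0:   # fits: resolve exactly
            members = sorted(x for x in values if in_region(x) and bin_of(x, lo, width, n_bins) == b)
            for x in members:
                cum += 1
                if cum >= rank:
                    return x
        path.append((lo, width, b))             # too large: split this bin and recount
        below = cum
        lo, hi = mins[b], maxs[b]

vals = [7, 1, 3, 3, 9, 2, 8, 5, 4, 6]
print(quantile_with_memory_limit(vals, 0.5, max_count=1))   # 4
```

Narrowing each new level to the chosen bin's observed minimum and maximum guarantees progress. Every split removes at least one distinct value from the target bin.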
-
Publication No.: US10127192B1
Publication Date: 2018-11-13
Application No.: US15961373
Application Date: 2018-04-24
Applicant: SAS Institute Inc.
Inventor: Xiangqian Hu , Xinmin Wu , Tao Wang , Xunlei Wu
Abstract: A computing device computes a quantile value. A maximum value and a minimum value are computed for unsorted variable values. An upper bin value and a lower bin value are computed for each bin of a plurality of bins using the maximum and minimum values. A frequency counter is computed for each bin by reading the unsorted variable values a second time. Each frequency counter is a count of the variable values within a respective bin. A bin number and a cumulative rank value are computed for a quantile. The bin number identifies a specific bin within which a quantile value associated with the quantile is located. The cumulative rank value identifies a cumulative rank for the quantile value associated with the quantile. Frequency data is computed using the frequency counters. The quantile value is computed using the frequency data and the cumulative rank value for the quantile and output.
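This is a minimal sketch of the two-pass method without any memory constraint. It assumes equal-width bins and uses the `ceil(q * n)`-th smallest value as the quantile definition. After the two passes, a third read collects the frequency data for the single bin that holds the target rank. The function returns the quantile value together with the bin number and the cumulative rank. Names are hypothetical.

```python
import math
from collections import Counter

def two_pass_quantile(values, q, n_bins=8):
    rank = max(1, math.ceil(q * len(values)))       # cumulative rank of the quantile value
    lo, hi = min(values), max(values)               # pass 1: range of the unsorted data
    width = (hi - lo) / n_bins

    def bin_of(v):
        return 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)

    counts = Counter(bin_of(v) for v in values)     # pass 2: frequency counter per bin
    b, before = 0, 0
    while before + counts[b] < rank:                # bin number containing the quantile
        before += counts[b]
        b += 1
    freq = Counter(v for v in values if bin_of(v) == b)   # frequency data for that bin only
    cum = before
    for v in sorted(freq):
        cum += freq[v]
        if cum >= rank:
            return v, b, rank

vals = [7, 1, 3, 3, 9, 2, 8, 5, 4, 6]
print(two_pass_quantile(vals, 0.5))   # (4, 3, 5)
```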
-