-
Publication No.: US20190129919A1
Publication date: 2019-05-02
Application No.: US16140931
Filing date: 2018-09-25
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu, Xiangqian Hu, Tao Wang, Xunlei Wu
IPC: G06F17/18
Abstract: A computing device computes a quantile value. A maximum value and a minimum value are computed for unsorted variable values to compute an upper bin value and a lower bin value for each bin of a plurality of bins. A frequency counter is computed for each bin by reading the unsorted variable values a second time. A bin number and a cumulative rank value are computed for a quantile. When an estimated memory usage value exceeds a predefined memory size constraint value, a subset of the plurality of bins is split into a plurality of bins, the frequency counter is recomputed for each bin, and the bin number and the cumulative rank value are recomputed. Frequency data is computed using the frequency counters. The quantile value is computed using the frequency data and the cumulative rank value for the quantile, and is then output.
-
Publication No.: US10311128B2
Publication date: 2019-06-04
Application No.: US16140931
Filing date: 2018-09-25
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu, Xiangqian Hu, Tao Wang, Xunlei Wu
IPC: G06F17/18
Abstract: A computing device computes a quantile value. A maximum value and a minimum value are computed for unsorted variable values to compute an upper bin value and a lower bin value for each bin of a plurality of bins. A frequency counter is computed for each bin by reading the unsorted variable values a second time. A bin number and a cumulative rank value are computed for a quantile. When an estimated memory usage value exceeds a predefined memory size constraint value, a subset of the plurality of bins is split into a plurality of bins, the frequency counter is recomputed for each bin, and the bin number and the cumulative rank value are recomputed. Frequency data is computed using the frequency counters. The quantile value is computed using the frequency data and the cumulative rank value for the quantile, and is then output.
-
Publication No.: US10127192B1
Publication date: 2018-11-13
Application No.: US15961373
Filing date: 2018-04-24
Applicant: SAS Institute Inc.
Inventor: Xiangqian Hu, Xinmin Wu, Tao Wang, Xunlei Wu
Abstract: A computing device computes a quantile value. A maximum value and a minimum value are computed for unsorted variable values. An upper bin value and a lower bin value are computed for each bin of a plurality of bins using the maximum and minimum values. A frequency counter is computed for each bin by reading the unsorted variable values a second time. Each frequency counter is a count of the variable values within a respective bin. A bin number and a cumulative rank value are computed for a quantile. The bin number identifies a specific bin within which a quantile value associated with the quantile is located. The cumulative rank value identifies a cumulative rank for the quantile value associated with the quantile. Frequency data is computed using the frequency counters. The quantile value is computed using the frequency data and the cumulative rank value for the quantile, and is then output.
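The binning scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `binned_quantile`, the equal-width bins, the default of 16 bins, and the rank definition ceil(q * n) are all assumptions made for illustration, and the sketch re-reads the values a third time to build the frequency data for the selected bin.

```python
import math
from collections import Counter


def binned_quantile(values, q, num_bins=16):
    """Return the value at 1-based rank ceil(q * n) without sorting all values.

    Hypothetical helper: equal-width bins and this rank definition are
    assumptions, not details taken from the abstract. Expects 0 < q <= 1.
    """
    n = len(values)

    # Pass 1: minimum and maximum of the unsorted values.
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / num_bins

    def bin_of(v):
        # Clamp so that the maximum value falls into the last bin.
        return min(int((v - lo) / width), num_bins - 1)

    # Pass 2: one frequency counter per bin.
    counts = [0] * num_bins
    for v in values:
        counts[bin_of(v)] += 1

    # Locate the bin number holding the target rank, and the cumulative
    # rank contributed by all earlier bins.
    rank = max(1, math.ceil(q * n))
    cumulative = 0
    bin_number = num_bins - 1
    for b, c in enumerate(counts):
        if cumulative + c >= rank:
            bin_number = b
            break
        cumulative += c

    # Frequency data: distinct values inside the selected bin only.
    freq = Counter(v for v in values if bin_of(v) == bin_number)
    for v in sorted(freq):
        cumulative += freq[v]
        if cumulative >= rank:
            return v
    return hi


data = [7, 1, 5, 3, 9, 3, 8, 2, 6, 4]
median = binned_quantile(data, 0.5)  # rank 5 of 10 values, which is 4
```

Only the values in one bin are ever sorted, which is what keeps memory use bounded when the bins are narrow enough.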
-
Publication No.: US20190258697A1
Publication date: 2019-08-22
Application No.: US16398690
Filing date: 2019-04-30
Applicant: SAS Institute Inc.
Inventor: Xinmin Wu, Tao Wang, Scott Russell Pope
IPC: G06F17/18
Abstract: A computing device computes a quantile value for a variable value extracted from an event block object by computing a bin number for the variable value. If the computed bin number is between a before bin number and an after bin number computed for a quantile, the quantile is identified. Frequency data is updated to include the extracted variable value as a key value. A frequency value associated with the key value indicates a number of occurrences of the variable value in previously processed data. A cumulative rank value of the identified quantile is updated. A quantile adjustment value is computed based on a comparison between the variable value and a current quantile value of the identified quantile. An updated quantile value associated with the identified quantile is computed using the updated frequency data, the computed quantile adjustment value, and the updated cumulative rank value of the identified quantile.
-
Publication No.: US20180053071A1
Publication date: 2018-02-22
Application No.: US15686863
Filing date: 2017-08-25
Applicant: SAS Institute Inc.
CPC classification number: G06K9/6264, G06F9/48, G06K9/6259, G06K9/627, G06K9/66, G06N3/0454, G06N5/003, G06N99/005
Abstract: A computing device predicts occurrence of an event or classifies an object using distributed unlabeled data. Supervised data that includes a labeled subset of a plurality of observation vectors is identified. A total number of threads that will perform labeling of an unlabeled subset of the plurality of observation vectors is determined. The identified supervised data is uploaded to each thread of the total number of threads. Unlabeled observation vectors are randomly selected from the unlabeled subset of the plurality of observation vectors to allocate to each thread of the total number of threads. The randomly selected, unlabeled observation vectors are uploaded to each thread of the total number of threads based on the allocation. The value of the target variable for each observation vector of the unlabeled subset of the plurality of observation vectors is determined based on a converged classification matrix and output to a labeled dataset.
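The random allocation step in this abstract might look like the following minimal Python sketch. The abstract does not say whether the per-thread selections overlap; this sketch assumes a disjoint, roughly even split, and the name `allocate_to_threads` and its `seed` parameter are hypothetical.

```python
import random


def allocate_to_threads(unlabeled_ids, num_threads, seed=None):
    """Randomly partition unlabeled observation ids across threads.

    Assumption: each observation goes to exactly one thread. The labeled
    (supervised) data would be sent to every thread separately.
    """
    rng = random.Random(seed)
    shuffled = list(unlabeled_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled ids round-robin so thread sizes differ by at most one.
    return [shuffled[t::num_threads] for t in range(num_threads)]


allocation = allocate_to_threads(range(10), num_threads=3, seed=0)
```

Each inner list in `allocation` holds the ids one thread would label.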
-
Publication No.: US09792562B1
Publication date: 2017-10-17
Application No.: US15335530
Filing date: 2016-10-27
Applicant: SAS Institute Inc.
CPC classification number: G06N99/005, G06N5/003, G06N7/005
Abstract: A computing device predicts occurrence of an event or classifies an object using semi-supervised data. A label set defines permissible values for a target variable. A value of the permissible values is defined for a subset of observation vectors. A predefined number of times, a distance matrix is computed that defines a distance value between pairs of observation vectors using a distance function and a converged classification matrix; a number of observation vectors is selected that have minimum values for the distance value; a label is requested and a response is received for each of the selected observation vectors; the value of the target variable is updated for each of the selected observation vectors with the received response; and the value of the target variable is determined again by recomputing the converged classification matrix. The value of the target variable for each observation vector is output to a second dataset.
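The abstract does not say how the converged classification matrix is computed. As an illustration only, the Python sketch below swaps in label spreading, a well-known semi-supervised technique that iterates a classification matrix to convergence. The function name `label_spreading`, the Gaussian affinity, and the parameters `alpha`, `sigma`, and `tol` are assumptions, and the label-request loop from the abstract is not modeled.

```python
import math


def label_spreading(points, labels, num_classes, alpha=0.8, sigma=1.0,
                    tol=1e-9, max_iter=1000):
    """Assign a class to every point; labels[i] is a class index or None.

    Illustrative stand-in, not the method claimed in the patent.
    """
    n = len(points)

    # Gaussian affinity between distinct points (zero on the diagonal).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))

    # Symmetric normalization: S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]

    # One-hot rows for labeled points, all-zero rows for unlabeled ones.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)]
         for i in range(n)]

    # Iterate F <- alpha * S * F + (1 - alpha) * Y until F stops changing.
    f = [row[:] for row in y]
    for _ in range(max_iter):
        new_f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
                  + (1 - alpha) * y[i][c]
                  for c in range(num_classes)] for i in range(n)]
        delta = max(abs(new_f[i][c] - f[i][c])
                    for i in range(n) for c in range(num_classes))
        f = new_f
        if delta < tol:
            break

    # The predicted class is the largest entry in each row of F.
    return [max(range(num_classes), key=lambda c: f[i][c]) for i in range(n)]


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (4.9, 5.2)]
known = [0, None, None, 1, None, None]
predicted = label_spreading(points, known, num_classes=2)
```

With two well-separated clusters and one labeled point in each, every unlabeled point takes the label of its own cluster.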
-
Publication No.: US10127477B2
Publication date: 2018-11-13
Application No.: US15686863
Filing date: 2017-08-25
Applicant: SAS Institute Inc.
Abstract: A computing device predicts occurrence of an event or classifies an object using distributed unlabeled data. Supervised data that includes a labeled subset of a plurality of observation vectors is identified. A total number of threads that will perform labeling of an unlabeled subset of the plurality of observation vectors is determined. The identified supervised data is uploaded to each thread of the total number of threads. Unlabeled observation vectors are randomly selected from the unlabeled subset of the plurality of observation vectors to allocate to each thread of the total number of threads. The randomly selected, unlabeled observation vectors are uploaded to each thread of the total number of threads based on the allocation. The value of the target variable for each observation vector of the unlabeled subset of the plurality of observation vectors is determined based on a converged classification matrix and output to a labeled dataset.
-
Publication No.: US20170308810A1
Publication date: 2017-10-26
Application No.: US15335530
Filing date: 2016-10-27
Applicant: SAS Institute Inc.
CPC classification number: G06N99/005, G06N5/003, G06N7/005
Abstract: A computing device predicts occurrence of an event or classifies an object using semi-supervised data. A label set defines permissible values for a target variable. A value of the permissible values is defined for a subset of observation vectors. A predefined number of times, a distance matrix is computed that defines a distance value between pairs of observation vectors using a distance function and a converged classification matrix; a number of observation vectors is selected that have minimum values for the distance value; a label is requested and a response is received for each of the selected observation vectors; the value of the target variable is updated for each of the selected observation vectors with the received response; and the value of the target variable is determined again by recomputing the converged classification matrix. The value of the target variable for each observation vector is output to a second dataset.