-
Publication number: US10127477B2
Publication date: 2018-11-13
Application number: US15686863
Application date: 2017-08-25
Applicant: SAS Institute Inc.
Abstract: A computing device predicts occurrence of an event or classifies an object using distributed unlabeled data. Supervised data that includes a labeled subset of a plurality of observation vectors is identified. A total number of threads that will perform labeling of an unlabeled subset of the plurality of observation vectors is determined. The identified supervised data is uploaded to each thread of the total number of threads. Unlabeled observation vectors are randomly selected from the unlabeled subset of the plurality of observation vectors to allocate to each thread of the total number of threads. The randomly selected, unlabeled observation vectors are uploaded to each thread of the total number of threads based on the allocation. The value of the target variable for each observation vector of the unlabeled subset of the plurality of observation vectors is determined based on a converged classification matrix and output to a labeled dataset.
-
Publication number: US20180300650A1
Publication date: 2018-10-18
Application number: US15876723
Application date: 2018-01-22
Applicant: SAS Institute Inc.
Inventor: Biruk Gebremariam
Abstract: A computing system provides analysis of data and grouping of variables in support of analytics. From a plurality of observation vectors read from a dataset, a number of observations having a non-missing value and a cardinality value are computed for each variable of the variables. For each variable of the variables, the cardinality ratio value is compared to a first policy parameter value, and the respective variable is identified as a nominal variable type or as an interval variable type based on the comparison. For each variable of the variables identified as the nominal variable type, the cardinality value of the respective variable is compared to a second policy parameter value, and the respective variable is identified as the high-cardinality nominal variable type or as a non-high-cardinality nominal variable type based on the comparison with the cardinality value. The identified variable type is output for each variable of the variables.
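The two-stage comparison described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the cardinality ratio or the direction of either comparison, so the sketch assumes the ratio is the number of distinct values divided by the number of non-missing observations, that a low ratio indicates a nominal variable, and that a nominal variable whose cardinality exceeds the second threshold is high-cardinality. The function name, parameter names, and default thresholds are hypothetical.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_variables(rows, ratio_threshold=0.25, high_cardinality_threshold=3):
    """Assign a variable type to every variable found in rows.

    rows is a list of dicts mapping variable name -> value.
    Both thresholds stand in for the "policy parameter values" of the
    abstract; the defaults are arbitrary illustration values.
    """
    names = sorted({name for row in rows for name in row})
    types = {}
    for name in names:
        present = [row.get(name) for row in rows if not _is_missing(row.get(name))]
        n_nonmissing = len(present)
        cardinality = len(set(present))
        # Assumed definition: distinct values per non-missing observation.
        ratio = cardinality / n_nonmissing if n_nonmissing else 0.0
        if ratio > ratio_threshold:
            types[name] = "interval"
        elif cardinality > high_cardinality_threshold:
            types[name] = "high-cardinality nominal"
        else:
            types[name] = "nominal"
    return types


rows = [{"flag": i % 2, "zone": i % 5, "amount": i * 1.5} for i in range(20)]
variable_types = classify_variables(rows)
```

With these sample rows, `flag` has 2 distinct values, `zone` has 5, and `amount` has 20, so each of the three outcomes is exercised once.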
-
Publication number: US10095693B2
Publication date: 2018-10-09
Application number: US14709601
Application date: 2015-05-12
Applicant: SAS Institute Inc.
Inventor: James Edward Georges, Dan Kelly, Jin-Whan Jung, John Clarke Brocklebank, Adheesha Sanjaya Arangala, Julius Alton King
Abstract: An apparatus includes a communications component to receive a specified variable and one or more specified criteria to select a final clustered representation of a network, the specified criteria including a maximum degree of loss of information for the specified variable for the final clustered representation; and an iterative collapse component to perform iteration(s) of deriving the final clustered representation. Each iteration includes calculating the degree of loss from each possible combination of two linked nodes of a current clustered representation to generate a next clustered representation; selecting the possible combination associated with a smallest degree of loss; determining whether to cease iterations based on whether the smallest degree associated with the selected combination exceeds the maximum degree; effecting the selected combination if the smallest degree doesn't exceed the maximum degree; and selecting the current clustered representation as the final clustered representation if the smallest degree exceeds the maximum degree.
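The iteration described in this abstract is a greedy merge loop, which can be sketched as below. The abstract does not say how the degree of loss is measured, so the sketch assumes a loss equal to the increase in within-cluster sum of squared deviations of the specified variable when two linked clusters are combined. The function name, the loss definition, and the sample network are all hypothetical.

```python
def collapse_network(values, edges, max_loss):
    """Greedily merge linked clusters until the cheapest merge exceeds max_loss.

    values: {node_id: numeric value of the specified variable}
    edges:  iterable of (node_id, node_id) links
    Returns {cluster_id: list of member values}.
    """
    clusters = {node: [value] for node, value in values.items()}
    links = {frozenset(edge) for edge in edges if edge[0] != edge[1]}

    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    def loss(a, b):
        # Assumed loss: extra squared deviation introduced by the merge.
        return sse(clusters[a] + clusters[b]) - sse(clusters[a]) - sse(clusters[b])

    while links:
        # Evaluate every possible combination of two linked clusters.
        best = min(links, key=lambda link: (loss(*sorted(link)), sorted(link)))
        a, b = sorted(best)
        if loss(a, b) > max_loss:
            break  # the current representation becomes the final one
        # Effect the selected combination: fold b into a and relabel links.
        clusters[a] = clusters[a] + clusters.pop(b)
        links = {frozenset(a if n == b else n for n in link) for link in links}
        links = {link for link in links if len(link) == 2}
    return clusters


final = collapse_network(
    values={"a": 1.0, "b": 1.1, "c": 5.0, "d": 5.2},
    edges=[("a", "b"), ("b", "c"), ("c", "d")],
    max_loss=0.1,
)
```

In this small chain, the two cheap merges (a with b, then c with d) are applied, and the loop stops because joining the two resulting clusters would exceed the maximum loss.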
-
Publication number: US20180239740A1
Publication date: 2018-08-23
Application number: US15894002
Application date: 2018-02-12
Applicant: SAS Institute Inc.
Inventor: Wei Xiao, Jorge Manuel Gomes da Silva, Saba Emrani, Arin Chaudhuri
CPC classification number: G06K9/00771, G06F9/30036, G06F17/16, G06F17/18, G06K9/481, G06K9/623, G06K9/6232, G06K9/6247, G06K9/6249, G06K2009/3291
Abstract: A computing device detects an abnormal observation vector using a principal components decomposition. The principal components decomposition includes a sparse noise vector $s_t$ computed for the observation vector that includes a plurality of values, wherein each value is associated with a variable to define a plurality of variables. The sparse noise vector $s_t$ has a dimension equal to $m$, a number of the plurality of variables. A zero counter time series value $\hat{c}_t$ is computed using $\hat{c}_t = \sum_{i=1}^{m} s_t[i]$. A probability value for $\hat{c}_t$ is computed using $p = \sum_{i=\hat{c}_t+1}^{m+1} H_c[i] \,/\, \sum_{i=0}^{m+1} H_c[i]$, where $H_c[i]$ includes a count of a number of times each value of $\hat{c}_t$ occurred for previous observation vectors. The probability value is compared with a predefined abnormal observation probability value. An abnormal observation indicator is set when the probability value indicates the observation vector is abnormal. The observation vector is output when the probability value indicates the observation vector is abnormal.
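The probability value in this abstract is a tail sum over a histogram of previously observed counter values, divided by the histogram total. A minimal sketch of that one step follows. It takes the counter value as an input rather than deriving it from the sparse noise vector, and it leaves out the final comparison because the abstract does not state which direction marks an observation as abnormal. The function name and the sample histogram are hypothetical.

```python
def tail_probability(histogram, counter_value):
    """Fraction of past observations whose counter exceeded counter_value.

    histogram[i] is the number of previous observation vectors whose counter
    value was i, for i = 0 .. m + 1 (so len(histogram) == m + 2).
    """
    if not 0 <= counter_value < len(histogram):
        raise ValueError("counter_value is outside the histogram range")
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram holds no previous observations")
    # Numerator: counts at indices counter_value + 1 .. m + 1.
    return sum(histogram[counter_value + 1:]) / total


# Hypothetical history for m = 2 variables (indices 0 .. 3).
history = [1, 2, 3, 4]
p = tail_probability(history, counter_value=1)  # (3 + 4) / 10
```

The result would then be compared with the predefined abnormal observation probability value.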
-
Publication number: US10044505B2
Publication date: 2018-08-07
Application number: US15677683
Application date: 2017-08-15
Applicant: SAS Institute Inc.
Inventor: Gang Meng
Abstract: A node in a distributed computing environment can generate key-value pairs. The node can categorize the key-value pairs into bins, with each key-value pair being categorized into a bin spanning a range of hashed keys that includes a hashed key of the key-value pair. The node can determine nodes in the distributed computing environment that are mapped to the bins. The node can distribute each key-value pair to a node corresponding to a bin into which the key-value pair was categorized. The node can then sort any of the key-value pairs maintained on the node by hashed key or key to generate sorted key-value pairs. The node can assign index values to the sorted key-value pairs. The indexed key-value pairs may be the same each time the above process is run, regardless of the underlying topology of the distributed computing environment. This can result in stable data-processing.
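The binning and indexing steps in this abstract can be simulated in a single process to show why the resulting index values need not depend on the number of nodes. The sketch below rests on several assumptions that the abstract does not state: keys are hashed with SHA-256 truncated to 32 bits, bins are equal-width ranges of the hash space, bins are mapped to nodes round-robin, and index values are assigned by walking the bins in hash order. All function and constant names are hypothetical.

```python
import hashlib

HASH_BITS = 32


def hashed(key):
    """Deterministic 32-bit hash of a string key (assumed hash function)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def bin_of(hash_value, num_bins):
    """Bin whose hash range contains hash_value (equal-width ranges)."""
    return (hash_value * num_bins) >> HASH_BITS


def index_pairs(pairs, num_nodes, num_bins=8):
    """Return {key: (index, value)} after simulated distribution and sorting."""
    # Each simulated node holds {bin: [(hashed key, key, value), ...]}.
    nodes = [dict() for _ in range(num_nodes)]
    for key, value in pairs:
        h = hashed(key)
        b = bin_of(h, num_bins)
        owner = b % num_nodes  # assumed bin-to-node mapping
        nodes[owner].setdefault(b, []).append((h, key, value))
    # Every node sorts the pairs it holds by hashed key, then by key.
    for node in nodes:
        for items in node.values():
            items.sort(key=lambda item: item[:2])
    # Index values follow bin order, so they track global hash order.
    indexed = {}
    next_index = 0
    for b in range(num_bins):
        for _, key, value in nodes[b % num_nodes].get(b, []):
            indexed[key] = (next_index, value)
            next_index += 1
    return indexed


pairs = [(f"k{i}", i) for i in range(50)]
on_two_nodes = index_pairs(pairs, num_nodes=2)
on_five_nodes = index_pairs(pairs, num_nodes=5)
```

Because the bin boundaries are fixed and each bin is sorted by hashed key, the two results are identical even though the pairs were routed to different numbers of nodes.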
-
Publication number: US10025753B2
Publication date: 2018-07-17
Application number: US15890013
Application date: 2018-02-06
Applicant: SAS Institute Inc.
Inventor: Michael James Leonard, Edward Tilden Blair, Jerzy Michal Brzezicki, Udo V. Sglavo, Sujatha Pothireddy
Abstract: Systems and methods are provided for analyzing unstructured time stamped data. A distribution of time-stamped data is analyzed to identify a plurality of potential time series data hierarchies for structuring the data. An analysis of a potential time series data hierarchy may be performed. The analysis of the potential time series data hierarchies may include determining an optimal time series frequency and a data sufficiency metric for each of the potential time series data hierarchies. One of the potential time series data hierarchies may be selected based on a comparison of the data sufficiency metrics. Multiple time series may be derived in a single-read pass according to the selected time series data hierarchy. A time series forecast corresponding to at least one of the derived time series may be generated.
-
Publication number: US20180181541A1
Publication date: 2018-06-28
Application number: US15849870
Application date: 2017-12-21
Applicant: SAS Institute Inc.
Inventor: Yonggang Yao
Abstract: A computing device computes a plurality of quantile regression solvers for a dataset at a plurality of quantile levels. Each observation vector includes an explanatory vector of a plurality of explanatory variable values and a response variable value. The read dataset is recursively divided into subsets of the plurality of observation vectors, a lower counterweight vector and an upper counterweight vector are computed for each of the subsets, and a quantile regression solver is fit to each of the subsets using the associated, computed lower counterweight vector and the associated, computed upper counterweight vector to describe a quantile function of the response variable values for a selected quantile level of the identified plurality of quantile level values. For each selected quantile level, a parameter estimate vector and a dual solution vector that describe the quantile function are output in association with the selected quantile level.
-
Publication number: US20180181445A1
Publication date: 2018-06-28
Application number: US15896727
Application date: 2018-02-14
Applicant: SAS Institute Inc.
Inventor: Henry Gabriel Victor Bequet, Huina Chen
CPC classification number: G06F9/5083, G06F17/30949, G06F17/30985
Abstract: An apparatus includes a processor to: receive, from a first remote device, a request to perform at least one iteration of a first job flow at least partly within a first federated area, wherein access to the first federated area is granted to the first remote device and not a second remote device, access to a second federated area is granted to the second remote device and not the first remote device, and a transfer area is maintained to transfer an object between the first and second federated areas; perform the at least one iteration of the first job flow; and analyze an output object generated in each iteration to determine whether a condition has been met to transfer an object from the first federated area to the transfer area to enable its transfer to the second federated area to enable its use in a second job flow.
-
Publication number: US10002146B1
Publication date: 2018-06-19
Application number: US15838175
Application date: 2017-12-11
Applicant: SAS Institute Inc.
Inventor: Brian Payton Bowman, Gordon Lyle Keener, Steven E. Krueger
CPC classification number: G06F16/2228, G06F7/02, G06F7/08, G06F7/20, G06F9/5027, G06F9/5072, G06F16/21, G06F16/2246, G06F16/2255, G06F16/23, G06F16/2365, G06F16/245, G06F16/27, G06F16/381, G06F16/9014, G06F16/9027
Abstract: An apparatus including a processor to receive search criteria including a data value for a search within a data field; in response to the receipt of the query instructions, and for each data cell within a super cell, perform the specified search by comparing the data value to ranges of values indicated in a corresponding cell index to determine whether the data cell includes a data record meeting the search criteria, and in response to a determination that the data cell includes such a data record, use a unique values index in the cell index to search the data records of the data cell to identify one or more data records meeting the search criteria; and in response to identifying at least one data record meeting the search criteria, provide an indication that at least the data cell includes at least one data record meeting the search criteria.
-
Publication number: US20180157620A1
Publication date: 2018-06-07
Application number: US15890019
Application date: 2018-02-06
Applicant: SAS Institute Inc.
Inventor: Michael James Leonard, Edward Tilden Blair, Jerzy Michal Brzezicki, Udo V. Sglavo, Sujatha Pothireddy
CPC classification number: G06F17/10, G06F17/18, G06F17/30716, G06Q10/04, G06Q30/02
Abstract: Systems and methods are provided for analyzing unstructured time stamped data. A distribution of time-stamped data is analyzed to identify a plurality of potential time series data hierarchies for structuring the data. An analysis of a potential time series data hierarchy may be performed. The analysis of the potential time series data hierarchies may include determining an optimal time series frequency and a data sufficiency metric for each of the potential time series data hierarchies. One of the potential time series data hierarchies may be selected based on a comparison of the data sufficiency metrics. Multiple time series may be derived in a single-read pass according to the selected time series data hierarchy. A time series forecast corresponding to at least one of the derived time series may be generated.