-
Publication number: US20250053615A1
Publication date: 2025-02-13
Application number: US18905480
Application date: 2024-10-03
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Tao Huang, Jan Chvosta
IPC: G06F17/18
Abstract: A computing device learns a directed acyclic graph (DAG). (A) A target variable is defined from variables based on a topological order vector and a first index. (B) Input variables are defined from the variables based on the topological order vector and a second index. (C) A machine learning model is trained with observation vectors using the target variable and the input variables. (D) The machine learning model is executed to compute a loss value. (E) The second index is incremented. (F) (B) through (E) are repeated a first plurality of times. (G) The first index is incremented. (H) (A) through (G) are repeated a second plurality of times. A parent set is determined for each variable based on a comparison between the loss value computed each repetition of (D). The parent set is output for each variable to describe the DAG that defines a hierarchical relationship between the variables.
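The nested loop in steps (A) through (H) can be illustrated with a small sketch. It is not the patented method: the abstract does not say how the second index selects the input variables or how the loss values are compared, so the sketch assumes that the second index picks one predecessor to leave out, that the model is an ordinary least-squares fit, and that a variable counts as a parent when leaving it out raises the mean squared error by more than a hypothetical `rel_tol`. The names `mse`, `learn_parents`, and the toy chain `x0 -> x1 -> x2` are illustrative only.

```python
import random

def mse(y, inputs):
    """In-sample mean squared error of a least-squares fit of y on inputs plus an intercept."""
    def center(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def remove(v, b):
        # Subtract the projection of v onto b.
        c = dot(v, b) / dot(b, b)
        return [a - c * q for a, q in zip(v, b)]

    basis = []  # orthogonalized, centered input columns (Gram-Schmidt)
    for x in inputs:
        v = center(x)
        for b in basis:
            v = remove(v, b)
        if dot(v, v) > 1e-12:  # skip columns that add no new direction
            basis.append(v)
    r = center(y)
    for b in basis:
        r = remove(r, b)
    return dot(r, r) / len(r)

def learn_parents(data, order, rel_tol=0.10):
    """Assumed reading: drop one predecessor at a time and compare losses."""
    parents = {}
    for i, target in enumerate(order):  # first index selects the target
        preds = order[:i]
        full = mse(data[target], [data[p] for p in preds])
        parents[target] = []
        for j in range(i):  # second index selects the predecessor to leave out
            inputs = [data[p] for k, p in enumerate(preds) if k != j]
            loss = mse(data[target], inputs)
            if loss > full * (1 + rel_tol):  # loss comparison decides parenthood
                parents[target].append(preds[j])
    return parents

# Toy data from a known chain x0 -> x1 -> x2 with unit-variance noise.
rng = random.Random(0)
x0 = [rng.gauss(0, 1) for _ in range(400)]
x1 = [2 * a + rng.gauss(0, 1) for a in x0]
x2 = [2 * b + rng.gauss(0, 1) for b in x1]
data = {"x0": x0, "x1": x1, "x2": x2}
parents = learn_parents(data, ["x0", "x1", "x2"])
print(parents)
```

On this toy chain the sketch should keep `x1` as the only parent of `x2`, because removing `x0` from the inputs barely changes the loss once `x1` is present.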
-
Publication number: US11120032B1
Publication date: 2021-09-14
Application number: US17209752
Application date: 2021-03-23
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Xunlei Wu, Jan Chvosta
IPC: G06F16/2458, G06F16/2453
Abstract: Computing resources consumed in performing computerized sequence-mining can be reduced by implementing some examples of the present disclosure. In one example, a system can determine weights for data entries in a data set and then select a group of data entries from the data set based on the weights. Next, the system can determine a group of k-length sequences present in the selected group of data entries by applying a shuffling algorithm. The system can then determine frequencies corresponding to the group of k-length sequences and select candidate sequences from among the group of k-length sequences based on the frequencies thereof. Next, the system can determine support values corresponding to the candidate sequences and then select output sequences from among the candidate sequences based on the support values thereof. The system may then transmit an output signal indicating the selected output sequences to an electronic device.
-
Publication number: US20250045611A1
Publication date: 2025-02-06
Application number: US18751509
Application date: 2024-06-24
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Tao Huang, Jan Chvosta
Abstract: A computing device learns a directed acyclic graph for a plurality of variables. (A) A target variable and zero or more input variables are defined based on a predefined topological order vector and a first index. (B) A machine learning model is trained with observation vectors using the target variable and the input variables. (C) The machine learning model is executed using the observation vectors with the target variable and the input variables to compute a residual vector. (D) The first index is incremented. (E) (A) through (D) are repeated a first plurality of times. A parent set is determined for each variable by comparing the residual vector computed each repetition of (C) to other residual vectors computed on other repetitions of (C). The parent set is output for each variable to describe a directed acyclic graph that defines a hierarchical relationship between the variables.
-
Publication number: US20250045355A1
Publication date: 2025-02-06
Application number: US18751584
Application date: 2024-06-24
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Tao Huang, Jan Chvosta
IPC: G06F17/18
Abstract: A computing device learns a directed acyclic graph (DAG). (A) A target variable is defined from variables based on a topological order vector and a first index. (B) Input variables are defined from the variables based on the topological order vector and a second index. (C) A machine learning model is trained with observation vectors using the target variable and the input variables. (D) The machine learning model is executed to compute a loss value. (E) The second index is incremented. (F) (B) through (E) are repeated a first plurality of times. (G) The first index is incremented. (H) (A) through (G) are repeated a second plurality of times. A parent set is determined for each variable based on a comparison between the loss value computed each repetition of (D). The parent set is output for each variable to describe the DAG that defines a hierarchical relationship between the variables.
-
Publication number: US09928320B2
Publication date: 2018-03-27
Application number: US15485577
Application date: 2017-04-12
Applicant: SAS Institute Inc.
Inventor: Mahesh V. Joshi, Richard Potter, Jan Chvosta, Mark Roland Little
CPC classification number: G06F17/18, G06F17/5009, G06F2217/10, G06Q40/08
Abstract: Techniques for estimated compound probability distribution are described herein. Embodiments may include receiving, at a master node of a distributed system, a compound model specification comprising frequency models, severity models, and one or more adjustment functions, wherein at least one model of the frequency models and the severity models depends on one or more regressors, and distributing the compound model specification to worker nodes of the distributed system, each of the worker nodes to at least generate a portion of samples for use in predicting compound distribution model estimates. Embodiments may also include predicting the compound distribution model estimates based on the sample portions of aggregate values and adjusted aggregate values.
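A compound distribution of this kind can be simulated in a few lines. The sketch below is only an illustration under stated assumptions, not the patented system: the frequency model is taken to be Poisson with a log-linear dependence on a single regressor, the severity model is exponential with mean 100, the adjustment function is a per-loss deductible, and the "worker nodes" are plain function calls with separate seeds inside one process. All coefficients and the names `worker_samples` and `estimate` are hypothetical.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method for a Poisson count with mean lam."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def worker_samples(seed, n_samples, regressor, deductible):
    """One worker's portion: raw and adjusted aggregate losses."""
    rng = random.Random(seed)
    lam = math.exp(0.2 + 0.5 * regressor)  # assumed frequency regression
    raw, adjusted = [], []
    for _ in range(n_samples):
        n_losses = sample_poisson(rng, lam)
        losses = [rng.expovariate(1 / 100.0) for _ in range(n_losses)]
        raw.append(sum(losses))
        # Assumed adjustment function: each loss is reduced by a deductible.
        adjusted.append(sum(max(x - deductible, 0.0) for x in losses))
    return raw, adjusted

def summary(values):
    """Mean and empirical 95th percentile of the pooled samples."""
    ordered = sorted(values)
    return {"mean": sum(ordered) / len(ordered),
            "q95": ordered[int(0.95 * len(ordered))]}

def estimate(n_workers=4, per_worker=5000, regressor=1.0, deductible=50.0):
    """Stand-in for the master node: hand out work, pool the portions."""
    raw, adjusted = [], []
    for w in range(n_workers):
        r, a = worker_samples(w, per_worker, regressor, deductible)
        raw.extend(r)
        adjusted.extend(a)
    return summary(raw), summary(adjusted)

raw, adj = estimate()
print(raw, adj)
```

With these assumed parameters the expected raw aggregate loss is exp(0.7) × 100, roughly 201, so the simulated mean should land close to that value, and the adjusted mean should be lower because of the deductible.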
-
Publication number: US11010451B2
Publication date: 2021-05-18
Application number: US14210259
Application date: 2014-03-13
Applicant: SAS Institute Inc.
Inventor: Christian Macaro, Jan Chvosta, Mark Roland Little
Abstract: Techniques for automated Bayesian posterior sampling using Markov Chain Monte Carlo and related schemes are described. In an embodiment, one or more values in a stationarity phase for a system configured for Bayesian sampling may be initialized. Sampling may be performed in the stationarity phase based upon the one or more values to generate a plurality of samples. The plurality of samples may be evaluated based upon one or more stationarity criteria. The stationarity phase may be exited when the plurality of samples meets the one or more stationarity criteria. Other embodiments are described and claimed.
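The initialize, sample, evaluate, and exit cycle of a stationarity phase can be sketched as follows. This is a minimal illustration under assumptions of my own, not the patented scheme: the sampler is a random-walk Metropolis chain on a standard normal target, the stationarity criterion is a simple comparison of the means of the two halves of a batch (in the spirit of a Geweke check), and a failed check doubles the batch size. The names `metropolis`, `looks_stationary`, `stationarity_phase`, and the tolerance `tol` are hypothetical.

```python
import math
import random
import statistics

def metropolis(rng, log_density, start, n, step=1.0):
    """Random-walk Metropolis sampler returning n draws."""
    x, lp = start, log_density(start)
    draws = []
    for _ in range(n):
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_density(cand)
        # Accept with probability min(1, density ratio).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        draws.append(x)
    return draws

def looks_stationary(samples, tol=0.2):
    """Assumed criterion: the two half-batch means differ by less than tol standard deviations."""
    half = len(samples) // 2
    gap = abs(statistics.fmean(samples[:half]) - statistics.fmean(samples[half:]))
    return gap < tol * statistics.pstdev(samples)

def stationarity_phase(log_density, start=5.0, n_init=1000, max_rounds=8, seed=0):
    """Sample in batches until the criterion is met or the rounds run out."""
    rng = random.Random(seed)
    x, n = start, n_init  # values initialized for the phase
    samples = []
    for round_number in range(1, max_rounds + 1):
        samples = metropolis(rng, log_density, x, n)
        if looks_stationary(samples):
            return samples, round_number  # exit the phase
        x, n = samples[-1], 2 * n  # continue the chain with a larger batch
    return samples, None

def std_normal_log_density(x):
    return -0.5 * x * x

samples, rounds = stationarity_phase(std_normal_log_density)
print(rounds, statistics.fmean(samples))
```

Returning the round count alongside the samples makes it easy to see how long the phase ran; `None` signals that the criterion was never met within `max_rounds`.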
-
Publication number: US10146741B2
Publication date: 2018-12-04
Application number: US14217858
Application date: 2014-03-18
Applicant: SAS Institute Inc.
Inventor: Christian Macaro, Jan Chvosta, Mark Roland Little
Abstract: Various embodiments are directed to techniques for deriving a sample representation from a random sample. A computer-program product includes instructions to cause a first computing device to fit an empirical distribution function to a marginal probability distribution of a variable within a first sample portion of a random sample to derive a partial marginal probability distribution approximation, wherein the random sample is divided into multiple sample portions distributed among multiple computing devices; fit a first portion of a copula function to a multivariate probability distribution of the first sample portion, wherein the copula function is divided into multiple portions; and transmit an indication of a first likelihood contribution of the first sample portion to a coordinating device to cause a second computing device to fit a second portion of the copula function to a multivariate probability distribution of a second sample portion. Other embodiments are described and claimed.
-
Publication number: US09665669B2
Publication date: 2017-05-30
Application number: US15197691
Application date: 2016-06-29
Applicant: SAS Institute Inc.
Inventor: Mahesh V. Joshi, Richard Potter, Jan Chvosta, Mark Roland Little
CPC classification number: G06F17/18, G06F17/5009, G06F2217/10, G06Q40/08
Abstract: Techniques for estimated compound probability distribution are described. An apparatus comprises a configuration component, a perturbation component, a sample generation controller, an aggregation component, a distribution fitting component, and a statistics generation component. The configuration component is operative to receive a compound model specification and a candidate distribution definition. The perturbation component is operative to generate a plurality of models from the compound model specification. The sample generation controller is operative to initiate the generation of a plurality of compound model samples from each of the plurality of models. The distribution fitting component generates parameter values for the candidate distribution definition based on the compound model samples. The statistics generation component generates approximated aggregate statistics.
-
Publication number: US20250045263A1
Publication date: 2025-02-06
Application number: US18538066
Application date: 2023-12-13
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Tao Huang, Jan Chvosta
Abstract: A computing device learns a best topological order vector for a plurality of variables. (A) A topological order vector is defined. (B) A target variable and zero or more input variables are defined based on the topological order vector. (C) A machine learning model is trained with observation vectors using values of the target variable and the zero or more input variables. (D) The machine learning model is executed with second observation vectors using the values of the target variable and the zero or more input variables to compute a loss value. (E) (A) through (D) are repeated a plurality of times. Each topological order vector defined in (A) is unique in comparison to other topological order vectors defined in (A). The best topological order vector is determined based on a comparison between the loss values computed for each topological order vector in (D).
-
Publication number: US12056207B1
Publication date: 2024-08-06
Application number: US18538070
Application date: 2023-12-13
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Tao Huang, Jan Chvosta
IPC: G06F17/18
CPC classification number: G06F17/18
Abstract: A computing device learns a best topological order vector of a plurality of variables. A target variable and zero or more input variables are defined. (A) A machine learning model is trained with observation vectors using the target variable and the zero or more input variables. (B) The machine learning model is executed to compute an equation loss value. (C) The equation loss value is stored with the identifier. (D) The identifier is incremented. (E) (A) through (D) are repeated a plurality of times. (F) A topological order vector is defined. (G) A loss value is computed from a subset of the stored equation loss values based on the topological order vector. (F) through (G) are repeated for each unique permutation of the topological order vector. A best topological order vector is determined based on a comparison between the loss value computed for each topological order vector in (G).
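The two-stage idea in this abstract, storing one equation loss per trained model and then scoring each topological order from the stored values, can be sketched as below. This is an illustration under assumptions, not the patented procedure: the model is an ordinary least-squares fit, the equation loss is in-sample mean squared error, the order loss is the plain sum of equation losses, and the identifier is a `(target, input set)` key rather than an incremented counter. The names `equation_loss` and `best_order` and the toy chain are hypothetical.

```python
import itertools
import random

def equation_loss(y, inputs):
    """In-sample mean squared error of a least-squares fit of y on inputs plus an intercept."""
    def center(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def remove(v, b):
        # Subtract the projection of v onto b.
        c = dot(v, b) / dot(b, b)
        return [a - c * q for a, q in zip(v, b)]

    basis = []  # orthogonalized, centered input columns (Gram-Schmidt)
    for x in inputs:
        v = center(x)
        for b in basis:
            v = remove(v, b)
        if dot(v, v) > 1e-12:
            basis.append(v)
    r = center(y)
    for b in basis:
        r = remove(r, b)
    return dot(r, r) / len(r)

def best_order(data):
    names = list(data)
    # Stage 1: train each (target, input set) model once and store its loss.
    stored = {}
    for target in names:
        others = [v for v in names if v != target]
        for size in range(len(others) + 1):
            for inputs in itertools.combinations(others, size):
                ident = (target, frozenset(inputs))
                stored[ident] = equation_loss(data[target], [data[v] for v in inputs])
    # Stage 2: score every permutation from stored losses only, with no retraining.
    scores = {}
    for order in itertools.permutations(names):
        scores[order] = sum(stored[(v, frozenset(order[:i]))]
                            for i, v in enumerate(order))
    return min(scores, key=scores.get), scores

# Toy data from a known chain x0 -> x1 -> x2 with unit-variance noise.
rng = random.Random(0)
x0 = [rng.gauss(0, 1) for _ in range(400)]
x1 = [2 * a + rng.gauss(0, 1) for a in x0]
x2 = [2 * b + rng.gauss(0, 1) for b in x1]
best, scores = best_order({"x0": x0, "x1": x1, "x2": x2})
print(best)
```

Caching pays off because the number of distinct equations grows far more slowly than the number of permutations: three variables need 12 fitted models for 6 orders, and each order is then scored by dictionary lookups alone.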