-
Publication No.: US20250156467A1
Publication Date: 2025-05-15
Application No.: US18812637
Application Date: 2024-08-22
Applicant: SAS INSTITUTE INC.
Inventor: Teresa S. Jade, Julia Moreno, Ashley Mary Beck
IPC: G06F16/35, G06F16/34, G06F40/284
Abstract: A computer-implemented system, computer-implemented method, and computer-program product includes obtaining a text document that includes text describing an action; extracting one or more action tokens from the text document; executing a plurality of linguistic pattern searches that search the text document for one or more likelihood tokens associated with the one or more action tokens; classifying the action to a likelihood category associated with a respective linguistic pattern search of the plurality of linguistic pattern searches that identified the one or more likelihood tokens; classifying the text document to a respective domain; computing a priority value of the action described in the text document based on an input of the likelihood category and the respective domain; and generating a priority summary artifact that visually prioritizes the text document over one or more other text documents when the priority value of the action satisfies a predefined maximum priority threshold value.
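A minimal Python sketch of the scoring flow in this abstract, under loudly stated assumptions: the likelihood patterns, action vocabulary, domain keywords, and numeric weights below are all hypothetical, and the priority value is assumed to be a simple product of a likelihood weight and a domain weight. The abstract does not disclose the actual patterns or scoring function.

```python
import re

# Hypothetical likelihood patterns, checked from most to least certain.
# Each pattern captures the word that follows the likelihood cue.
LIKELIHOOD_PATTERNS = [
    ("certain", re.compile(r"\b(will|must|shall)\s+(\w+)", re.IGNORECASE)),
    ("probable", re.compile(r"\b(should|likely to|plans? to)\s+(\w+)", re.IGNORECASE)),
    ("possible", re.compile(r"\b(may|might|could)\s+(\w+)", re.IGNORECASE)),
]

# Hypothetical vocabularies and weights, for illustration only.
ACTION_TOKENS = {"recall", "halt", "delay", "cancel", "suspend"}
DOMAIN_KEYWORDS = {
    "safety": {"injury", "hazard", "recall"},
    "finance": {"invoice", "payment", "revenue"},
}
LIKELIHOOD_WEIGHT = {"certain": 1.0, "probable": 0.7, "possible": 0.4, "unknown": 0.1}
DOMAIN_WEIGHT = {"safety": 1.0, "finance": 0.8, "general": 0.5}


def classify_likelihood(text, actions):
    """Return the category of the first pattern whose captured word is an action token."""
    for category, pattern in LIKELIHOOD_PATTERNS:
        for match in pattern.finditer(text):
            if match.group(2).lower() in actions:
                return category
    return "unknown"


def classify_domain(tokens):
    """Pick the domain whose keyword set overlaps the document tokens the most."""
    best, best_hits = "general", 0
    for domain, keywords in DOMAIN_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best, best_hits = domain, hits
    return best


def priority_summary(docs, threshold=0.8):
    """Score each document; documents at or above the threshold are listed first."""
    rows = []
    for doc_id, text in docs.items():
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        actions = tokens & ACTION_TOKENS  # extracted action tokens
        category = classify_likelihood(text, actions)
        domain = classify_domain(tokens)
        priority = LIKELIHOOD_WEIGHT[category] * DOMAIN_WEIGHT[domain]
        rows.append((doc_id, category, domain, priority, priority >= threshold))
    rows.sort(key=lambda row: (not row[4], -row[3]))
    return rows
```

For example, `priority_summary({"a": "The vendor may delay the invoice payment.", "b": "We will recall the product due to an injury hazard."})` ranks document `b` first as a "certain" safety action, which is the only one that meets the 0.8 threshold.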
-
Publication No.: US12298963B1
Publication Date: 2025-05-13
Application No.: US18932008
Application Date: 2024-10-30
Applicant: SAS Institute Inc.
Inventor: Hongtao Hu, Mahesh V Joshi
IPC: G06F16/23, G06F7/08, G06F16/22, G06F16/2455
Abstract: A new value is written from a dataset to a data structure comprising a set of sorted values. The new value replaces an oldest value and is inserted in a sorted position. The data structure is modified by subtracting a median value from each value of the set of sorted values to obtain sorted signed deviation values. The sorted signed deviation values are segmented to obtain data substructures comprising subsets of sorted absolute deviation values. A binary search is performed on the data substructures to identify a median absolute deviation value. A difference is computed between a particular value and the median value, and based on whether the difference is less than a threshold value computed from the median absolute deviation value, an outlier decision output is generated indicative of whether the particular value comprises an outlier value.
-
Publication No.: US12277409B1
Publication Date: 2025-04-15
Application No.: US18895119
Application Date: 2024-09-24
Applicant: SAS INSTITUTE INC.
Inventor: Samuel Paul Leeman-Munk, Xiaozhuo Cheng, Xiaolong Li
IPC: G06F11/36, G06F8/35, G06F11/3604
Abstract: A system, method, and computer-program product includes identifying a plurality of code synthesis items for a target programming language, generating a code synthesis prompt based on a first sampling of the plurality of code synthesis items, synthesizing, via a large language model, a plurality of raw code segments using the code synthesis prompt, executing the plurality of raw code segments with a code interpreter associated with the target programming language, determining one or more valid code segments of the plurality of raw code segments that the code interpreter successfully executed, aggregating, via a second sampling, the one or more valid code segments into one or more validated code synthesis training samples, and training a code generation model using the one or more validated code synthesis training samples. User interfaces may be provided to allow target coding tasks to be specified via text or speech.
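A minimal Python sketch of the execute-and-filter loop in this abstract, assuming Python is the target language and the interpreter is the built-in `exec`. The large language model call is out of scope here, so raw segments are supplied directly; the prompt wording and the function names `build_prompt`, `validate_segments`, and `build_training_samples` are hypothetical. Running generated code with `exec` is unsafe outside a sandbox.

```python
import random


def build_prompt(task_items, count, seed=0):
    """First sampling: pick task items and format them into a code synthesis prompt."""
    rng = random.Random(seed)
    chosen = rng.sample(task_items, min(count, len(task_items)))
    return "Write a Python snippet for each task:\n" + "\n".join(f"- {t}" for t in chosen)


def validate_segments(raw_segments):
    """Keep only the segments that the Python interpreter executes without raising."""
    valid = []
    for source in raw_segments:
        try:
            # Fresh globals per segment so one snippet cannot affect another.
            # WARNING: exec runs arbitrary code; a real pipeline would sandbox this step.
            exec(compile(source, "<synthesized>", "exec"), {})
        except Exception:
            continue  # syntax or runtime failure: discard the segment
        valid.append(source)
    return valid


def build_training_samples(valid_segments, num_samples, per_sample, seed=0):
    """Second sampling: aggregate validated segments into training samples."""
    rng = random.Random(seed)
    k = min(per_sample, len(valid_segments))
    return ["\n\n".join(rng.sample(valid_segments, k)) for _ in range(num_samples)]
```

Given the raw segments `"x = 1 + 1"`, `"def f(:\n    pass"`, `"1 / 0"`, and `"y = [i * i for i in range(3)]"`, only the first and last survive validation, since the second fails to compile and the third raises `ZeroDivisionError`.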
-
Publication No.: US12277144B2
Publication Date: 2025-04-15
Application No.: US18221684
Application Date: 2023-07-13
Applicant: SAS INSTITUTE INC.
Inventor: Nancy Anne Rausch, Ruth Oluwadamilola Akintunde, Brant Nathan Kay
IPC: G06F40/284, G06F16/242, G06F16/28
Abstract: A computer-implemented system includes identifying a target hierarchical taxonomy comprising a plurality of distinct hierarchical taxonomy categories; extracting a plurality of distinct taxonomy tokens from the plurality of distinct hierarchical taxonomy categories; computing a taxonomy vector corpus based on the plurality of distinct taxonomy tokens; computing a plurality of distinct taxonomy clusters based on an input of the taxonomy vector corpus; constructing a hierarchical taxonomy classifier based on the plurality of distinct taxonomy clusters; converting a volume of unlabeled structured datasets to a plurality of distinct corpora of taxonomy-labeled structured datasets based on the hierarchical taxonomy classifier; and outputting at least one corpus of taxonomy-labeled structured datasets of the plurality of distinct corpora of taxonomy-labeled structured datasets based on an input of a data classification query.
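A heavily simplified Python sketch of the taxonomy-labeling flow, for orientation only. It assumes bag-of-words token counts as the "taxonomy vectors", collapses the clustering step by treating each leaf category as its own cluster, and classifies an unlabeled structured dataset by the cosine similarity between its column-name tokens and each category vector. The taxonomy paths, dataset names, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; underscores and punctuation act as separators."""
    return re.findall(r"[a-z]+", text.lower())


def build_taxonomy_vectors(category_paths):
    """One token-count vector per leaf category (each leaf stands in for one cluster)."""
    return {path: Counter(tokenize(path)) for path in category_paths}


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def label_datasets(vectors, datasets):
    """Assign each unlabeled dataset (name -> column names) to its most similar category."""
    labels = {}
    for name, columns in datasets.items():
        vec = Counter(tokenize(" ".join(columns)))
        labels[name] = max(vectors, key=lambda path: cosine(vec, vectors[path]))
    return labels


def query(labels, category_prefix):
    """Return the names of datasets whose label falls under the given taxonomy branch."""
    return sorted(name for name, path in labels.items() if path.startswith(category_prefix))
```

For instance, a dataset with columns `email_address` and `customer_id` lands under a "Customer > Contact > Email Address" category, and `query(labels, "Customer")` then returns it as a response to a data classification query.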
-
Publication No.: US20250117664A1
Publication Date: 2025-04-10
Application No.: US18765014
Application Date: 2024-07-05
Applicant: SAS INSTITUTE INC.
IPC: G06N3/0895, G06N3/042
Abstract: A system, method, and computer-program product includes obtaining a decisioning dataset comprising a plurality of favorable decisioning records and at least one unfavorable decisioning record; detecting, via a machine learning algorithm, a favorable decisioning record of the plurality of favorable decisioning records that has a vector value closest to a vector value of the unfavorable decisioning record; executing a counterfactual assessment between the favorable decisioning record and the unfavorable decisioning record; generating an explainability artifact based on one or more bias intensity metrics to explain a bias in a machine learning-based decisioning model; and in response to generating the explainability artifact, displaying the explainability artifact in a user interface.
-
Publication No.: US20250117632A1
Publication Date: 2025-04-10
Application No.: US18764967
Application Date: 2024-07-05
Applicant: SAS INSTITUTE INC.
IPC: G06N3/0475, G06F17/16, G06N3/08
Abstract: A system, method, and computer-program product includes obtaining a decisioning dataset comprising a plurality of favorable decisioning records and at least one unfavorable decisioning record; detecting, via a machine learning algorithm, a favorable decisioning record of the plurality of favorable decisioning records that has a vector value closest to a vector value of the unfavorable decisioning record; executing a counterfactual assessment between the favorable decisioning record and the unfavorable decisioning record; generating an explainability artifact based on one or more bias intensity metrics to explain a bias in a machine learning-based decisioning model; and in response to generating the explainability artifact, displaying the explainability artifact in a user interface.
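The two abstracts above (US20250117664A1 and US20250117632A1) share the same nearest-favorable-record counterfactual step, which can be sketched in Python under these assumptions: Euclidean distance on raw numeric features stands in for the machine learning nearest-neighbor search (a real system would likely scale features first), `group` is a hypothetical protected attribute, and "bias intensity" is defined here, purely for illustration, as the share of the total counterfactual change that falls on protected features.

```python
import math

# Hypothetical schema: numeric features, with "group" standing in for a protected attribute.
FEATURES = ["income", "debt", "group"]
PROTECTED = {"group"}


def _vector(record):
    return [record[f] for f in FEATURES]


def explainability_artifact(favorable_records, unfavorable):
    """Nearest favorable record, per-feature counterfactual changes, and a bias score."""
    target = _vector(unfavorable)
    nearest = min(favorable_records, key=lambda r: math.dist(_vector(r), target))
    changes = {f: nearest[f] - unfavorable[f] for f in FEATURES if nearest[f] != unfavorable[f]}
    total = sum(abs(d) for d in changes.values())
    protected = sum(abs(d) for f, d in changes.items() if f in PROTECTED)
    return {
        "unfavorable_id": unfavorable["id"],
        "nearest_favorable_id": nearest["id"],
        "feature_changes": changes,
        # Hypothetical bias intensity: share of the counterfactual change on protected features.
        "bias_intensity": protected / total if total else 0.0,
    }
```

If the closest favorable record differs from an unfavorable one only in `group`, the artifact reports a bias intensity of 1.0, meaning the decision flipped on the protected attribute alone; this dictionary is what a user interface would then display.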
-
Publication No.: US20250068658A1
Publication Date: 2025-02-27
Application No.: US18941263
Application Date: 2024-11-08
Applicant: SAS Institute Inc.
Inventor: Kai Xu, Georgi Valentinov Ganev, Emile Isak Joubert, Rees Stephen Davison, Olivier Rene Maurice Van Acker, Luke Anthony William Robinson, Sofiane Mahiou
IPC: G06F16/28
Abstract: Embodiments described herein relate to the efficient generation of synthetic datasets that represent many-to-many relationships. In particular, certain embodiments implement a particular factorization for many-to-many generative models, which leads to a scalable generation framework by combining random graph theory and representation learning. Further embodiments extend the framework to establish the notion of differential privacy within the synthetically generated data. The embodiments described herein are therefore able to generate synthetic datasets efficiently while preserving information within and across many-to-many datasets with improved accuracy.
-
Publication No.: US20250045611A1
Publication Date: 2025-02-06
Application No.: US18751509
Application Date: 2024-06-24
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Tao Huang, Jan Chvosta
Abstract: A computing device learns a directed acyclic graph for a plurality of variables. (A) A target variable and zero or more input variables are defined based on a predefined topological order vector and a first index. (B) A machine learning model is trained with observation vectors using the target variable and the input variables. (C) The machine learning model is executed using the observation vectors with the target variable and the input variables to compute a residual vector. (D) The first index is incremented. (E) (A) through (D) are repeated a first plurality of times. A parent set is determined for each variable by comparing the residual vector computed each repetition of (C) to other residual vectors computed on other repetitions of (C). The parent set is output for each variable to describe a directed acyclic graph that defines a hierarchical relationship between the variables.
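The abstract does not say how residual vectors are compared, so the following Python sketch fills that gap with an explicit assumption. Each variable is regressed (ordinary least squares with an intercept, standing in for the machine learning model) on all variables that precede it in the given topological order, and a preceding variable is kept as a parent when leaving it out inflates the residual sum of squares by more than a relative tolerance. The function names and the `rel_tol` rule are hypothetical.

```python
def ols_residuals(columns, y):
    """Least-squares fit of y on the given columns plus an intercept; returns the residual vector."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations (X^T X) beta = X^T y, solved by Gaussian elimination with partial pivoting.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        b[c], b[pivot] = b[pivot], b[c]
        for r in range(c + 1, p):
            factor = A[r][c] / A[c][c]  # assumes the inputs are not collinear
            for k in range(c, p):
                A[r][k] -= factor * A[c][k]
            b[r] -= factor * b[c]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][k] * beta[k] for k in range(i + 1, p))) / A[i][i]
    return [y[r] - sum(X[r][i] * beta[i] for i in range(p)) for r in range(n)]


def rss(residuals):
    return sum(e * e for e in residuals)


def learn_parents(data, order, rel_tol=0.05):
    """data maps variable name -> observations; order is the (given) topological order."""
    parents = {}
    for index, target in enumerate(order):
        inputs = order[:index]  # zero or more earlier variables
        full = rss(ols_residuals([data[v] for v in inputs], data[target]))
        chosen = []
        for candidate in inputs:
            reduced_inputs = [data[v] for v in inputs if v != candidate]
            reduced = rss(ols_residuals(reduced_inputs, data[target]))
            # Hypothetical comparison rule: keep the candidate if dropping it inflates the residuals.
            if reduced > (1 + rel_tol) * full + 1e-12:
                chosen.append(candidate)
        parents[target] = chosen
    return parents
```

On a chain x → y → z where z depends on x only through y, dropping x from the regression of z barely changes the residuals while dropping y inflates them, so the sketch recovers x as the only parent of y and y as the only parent of z.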
-
Publication No.: US20250045355A1
Publication Date: 2025-02-06
Application No.: US18751584
Application Date: 2024-06-24
Applicant: SAS Institute Inc.
Inventor: Xilong Chen, Tao Huang, Jan Chvosta
IPC: G06F17/18
Abstract: A computing device learns a directed acyclic graph (DAG). (A) A target variable is defined from variables based on a topological order vector and a first index. (B) Input variables are defined from the variables based on the topological order vector and a second index. (C) A machine learning model is trained with observation vectors using the target variable and the input variables. (D) The machine learning model is executed to compute a loss value. (E) The second index is incremented. (F) (B) through (E) are repeated a first plurality of times. (G) The first index is incremented. (H) (A) through (G) are repeated a second plurality of times. A parent set is determined for each variable based on a comparison between the loss value computed each repetition of (D). The parent set is output for each variable to describe the DAG that defines a hierarchical relationship between the variables.
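The nested loops (A) through (H) can be sketched as a model-agnostic Python skeleton. Under the assumptions of this sketch, which are not taken from the claims, the second index selects which earlier variable is left out of the inputs, `fit_loss` is a hypothetical callback that trains any model and returns its loss value, and a variable joins the parent set when omitting it raises the loss by more than a relative tolerance.

```python
def learn_parent_sets(order, fit_loss, rel_tol=0.05):
    """Nested-index parent search over a known topological order.

    order:    topological order of the variable names (assumed given)
    fit_loss: hypothetical callable(target, inputs) -> loss of a model trained on those inputs
    """
    parent_sets = {}
    for first_index, target in enumerate(order):               # (A), (G), (H)
        candidates = order[:first_index]
        baseline = fit_loss(target, candidates)                 # all earlier variables as inputs
        losses = {}
        for second_index, excluded in enumerate(candidates):    # (B), (E), (F)
            inputs = candidates[:second_index] + candidates[second_index + 1:]
            losses[excluded] = fit_loss(target, inputs)         # (C) train, (D) loss value
        # Hypothetical rule: a candidate is a parent if leaving it out raises the loss noticeably.
        parent_sets[target] = [v for v in candidates if losses[v] > (1 + rel_tol) * baseline]
    return parent_sets
```

Any model can be plugged in as `fit_loss`, for example one that returns the mean squared error of a regression fit; the skeleton itself only fixes the iteration order and the loss comparison.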
-
Publication No.: US12175374B1
Publication Date: 2024-12-24
Application No.: US18635410
Application Date: 2024-04-15
Applicant: SAS Institute Inc.
Inventor: Yingjian Wang, Xinmin Wu
Abstract: A computing system trains a classification model using distributed training data. A first worker index and a second worker index are received from a controller device and together uniquely identify a segment of a lower triangular matrix. The first and second worker indices have values from one to a predefined block size value. In response to receipt of a first computation request from the controller device, a first kernel matrix block is computed at each computing device based on the first worker index and the second worker index. In response to receipt of a second computation request from the controller device, an objective function value is computed for each observation vector included in an accessed training data subset. The computed objective function value is sent to the controller device. Model parameters for a trained classification model are output.
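The worker-index scheme in this abstract can be sketched in Python as follows: every pair (i, j) with 1 ≤ j ≤ i ≤ B names exactly one block of the lower triangle of the kernel matrix, so B(B + 1)/2 workers cover it, and symmetry supplies the upper triangle. The sketch assumes an RBF kernel with a hypothetical `gamma`, contiguous data partitions, and sequential execution in place of separate computing devices; the objective-function step is not shown.

```python
import math


def worker_indices(block_size):
    """All (first, second) worker index pairs covering the lower triangle, 1-based."""
    return [(i, j) for i in range(1, block_size + 1) for j in range(1, i + 1)]


def rbf(a, b, gamma=0.5):
    """Hypothetical kernel choice: Gaussian RBF on Euclidean distance."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def partition(data, block_size):
    """Split observations into block_size contiguous partitions (sizes differ by at most one)."""
    n = len(data)
    bounds = [n * k // block_size for k in range(block_size + 1)]
    return [data[bounds[k]:bounds[k + 1]] for k in range(block_size)]


def kernel_block(parts, i, j, gamma=0.5):
    """Kernel block for row partition i and column partition j (1-based, j <= i)."""
    return [[rbf(a, b, gamma) for b in parts[j - 1]] for a in parts[i - 1]]


def assemble_kernel(data, block_size, gamma=0.5):
    """Controller-side view: gather the lower-triangular blocks and mirror them."""
    parts = partition(data, block_size)
    offsets = [0]
    for p in parts:
        offsets.append(offsets[-1] + len(p))
    n = len(data)
    K = [[0.0] * n for _ in range(n)]
    for i, j in worker_indices(block_size):
        block = kernel_block(parts, i, j, gamma)  # one worker's computation
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                K[offsets[i - 1] + r][offsets[j - 1] + c] = value
                K[offsets[j - 1] + c][offsets[i - 1] + r] = value  # symmetry fills the upper triangle
    return K
```

With a block size of 2, the workers are (1, 1), (2, 1), and (2, 2); the off-diagonal block (2, 1) is computed once and mirrored, which is where the lower-triangular layout saves work.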
-