Abstract:
Systems and methods are provided for performing data mining and statistical learning techniques on a big data set. More specifically, systems and methods are provided for linear regression using safe screening techniques. Techniques may include receiving a plurality of time series included in a prediction hierarchy for performing statistical learning to develop an improved prediction hierarchy. The techniques may also include pre-processing data associated with each of the plurality of time series, wherein the pre-processing includes tasks performed in parallel using a grid-enabled computing environment. For each time series, the system may determine a classification for the individual time series, a pattern group for the individual time series, and a level of the prediction hierarchy at which each individual time series comprises a need output amount greater than a threshold amount. The computing system may generate an additional prediction hierarchy using the first prediction hierarchy, the classification, the pattern group, and the level.
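The level-selection step described above can be illustrated with a minimal sketch. It assumes a hypothetical three-level hierarchy (store, region, total), an average-per-period measure of the output amount, and made-up data; none of these names or rules come from the abstract, which does not specify how the amount is measured.

```python
# Hypothetical hierarchy, ordered from the most detailed level to the most aggregated.
LEVELS = ["store", "region", "total"]

def level_above_threshold(series, leaf, threshold):
    """Return the first level (walking up from the leaf) whose aggregated
    average amount exceeds `threshold`, or None if no level does.

    `series` maps a leaf key such as ("s1", "east") to a list of amounts.
    """
    _, region = leaf
    # Membership test for each level: which leaf series are aggregated together.
    members = {
        "store": lambda key: key == leaf,
        "region": lambda key: key[1] == region,
        "total": lambda key: True,
    }
    for level in LEVELS:
        selected = [values for key, values in series.items() if members[level](key)]
        # Sum the member series period by period, then average over time.
        totals = [sum(period) for period in zip(*selected)]
        if sum(totals) / len(totals) > threshold:
            return level
    return None

# Illustrative data only (an assumption, not taken from the source).
series = {
    ("s1", "east"): [1, 0, 2, 1],
    ("s2", "east"): [3, 4, 3, 2],
    ("s3", "west"): [10, 12, 9, 11],
}
levels = {leaf: level_above_threshold(series, leaf, threshold=3.5) for leaf in series}
```

In this sketch the two sparse "east" stores only clear the threshold once they are pooled at the region level, while the busier "west" store clears it on its own.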
Abstract:
A pipeline system for time-series data forecasting using a distributed computing environment is disclosed herein. In one example, a pipeline for forecasting time series is generated. The pipeline represents a sequence of operations for processing the time series to produce modeling results such as forecasts of the time series. The pipeline includes a segmentation operation for categorizing the time series into multiple demand classes based on demand characteristics of the time series. The pipeline also includes multiple sub-pipelines corresponding to the multiple demand classes. Each of the sub-pipelines applies a model strategy to the time series in the corresponding demand class. The model strategy is selected from multiple candidate model strategies based on predetermined relationships between the demand classes and the candidate model strategies. The pipeline is executed to determine the modeling results for the time series.
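The pipeline structure described above (a segmentation operation followed by one sub-pipeline per demand class) might be sketched as follows. The class names, the zero-share rule, and the two toy model strategies are assumptions chosen for illustration; the abstract does not name specific demand classes or models.

```python
from statistics import mean

def segment(series):
    """Assign a demand class from a simple demand characteristic (assumed rule)."""
    zero_share = sum(1 for value in series if value == 0) / len(series)
    return "intermittent" if zero_share >= 0.5 else "smooth"

def naive_forecast(series, horizon):
    """Toy strategy: repeat the last observed value."""
    return [series[-1]] * horizon

def mean_forecast(series, horizon):
    """Toy strategy: repeat the historical mean."""
    return [mean(series)] * horizon

# Predetermined relationship between demand classes and model strategies (hypothetical).
STRATEGY_BY_CLASS = {"smooth": naive_forecast, "intermittent": mean_forecast}

def run_pipeline(all_series, horizon=2):
    """Segment each series, then apply the strategy mapped to its demand class."""
    results = {}
    for name, series in all_series.items():
        demand_class = segment(series)
        strategy = STRATEGY_BY_CLASS[demand_class]
        results[name] = (demand_class, strategy(series, horizon))
    return results

results = run_pipeline({"a": [5, 6, 5, 7], "b": [0, 0, 4, 0]})
```

Keeping the class-to-strategy mapping in a plain dictionary makes the "predetermined relationships" explicit and easy to change without touching the pipeline logic.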
Abstract:
A hierarchical structure (e.g., a hierarchy) for use in hierarchical analysis (e.g., hierarchical forecasting) of timestamped data can be automatically generated. This automated approach to determining a hierarchical structure involves identifying attributes of the timestamped data, clustering the timestamped data to select attributes for the hierarchy, ordering the attributes to achieve a recommended hierarchical order, and optionally modifying the hierarchical order based on user input. Through the approach disclosed herein, a hierarchy can be generated that is designed to perform well under hierarchical models. This recommended hierarchy for use in hierarchical analysis may be agnostic to any planned hierarchy provided by or used by a user to otherwise interpret the timestamped data.
Abstract:
Computer-implemented systems and methods are provided for predicting outputs. Global output fractions associated with an object are approximated. Outputs for a group are predicted based upon a cyclical aspect component and a movement prediction. An output prediction is calculated based upon the predicted outputs for a related object group and the approximated global output fraction for a particular object.
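One way to read the calculation above is as a group-level prediction multiplied by an object's share of the group. The sketch below assumes that fractions are approximated from historical totals and that the cyclical component and movement prediction combine multiplicatively; both choices, and all names and numbers, are illustrative assumptions rather than details from the abstract.

```python
def approximate_fractions(history):
    """Approximate each object's global output fraction as its share of past output."""
    total = sum(history.values())
    return {obj: amount / total for obj, amount in history.items()}

def predict_group(movement_level, cyclical_index):
    """Combine a movement (trend) prediction with a cyclical factor.

    Multiplicative combination is an assumption made for this sketch.
    """
    return movement_level * cyclical_index

# Illustrative inputs only.
fractions = approximate_fractions({"x": 30, "y": 50, "z": 20})
group_prediction = predict_group(movement_level=200.0, cyclical_index=1.5)

# Output prediction for object "x": its fraction of the predicted group output.
prediction_x = group_prediction * fractions["x"]
```

With these inputs the group prediction is 300 and object "x" receives roughly 30 percent of it.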
Abstract:
In some examples, a processing device can receive prediction data representing a prediction. The processing device can also receive files defining abnormal data-point patterns to be identified in the prediction data. The processing device can identify at least one abnormal data-point pattern in the prediction data by executing customizable program-code in the files. The processing device can determine an override process that corresponds to the at least one abnormal data-point pattern in response to identifying the at least one abnormal data-point pattern in the prediction data. The processing device can execute the override process to generate a corrected version of the prediction data. The processing device can then adjust one or more computer parameters based on the corrected version of the prediction data.
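The detect-then-override flow described above could look like the following sketch. Here the pattern definitions are ordinary in-memory functions standing in for the customizable program code that the abstract says is loaded from files; the negative-value pattern and the clamp-to-zero override are hypothetical examples.

```python
def negative_values(prediction):
    """Detector (hypothetical): indices where the predicted value is negative."""
    return [i for i, value in enumerate(prediction) if value < 0]

def clamp_to_zero(prediction, indices):
    """Override process (hypothetical): replace flagged points with zero."""
    flagged = set(indices)
    return [0.0 if i in flagged else value for i, value in enumerate(prediction)]

# Each rule pairs an abnormal-pattern detector with its override process.
RULES = [(negative_values, clamp_to_zero)]

def correct(prediction):
    """Run every detector and apply the matching override when a pattern is found."""
    corrected = list(prediction)
    for detect, override in RULES:
        hits = detect(corrected)
        if hits:
            corrected = override(corrected, hits)
    return corrected

corrected = correct([4.0, -1.5, 3.0, -0.2])
```

New patterns can be supported by appending another detector and override pair to `RULES`, which mirrors the customizable nature of the pattern files.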
Abstract:
Disclosed are methods, systems, and computer program products useful for generating summary statistics for data predictions based on the aggregation of data from past time intervals. Summary statistics such as prediction standard errors, variances, confidence limits, and other statistical measures may be generated in a way that preserves the basic distributional properties of the original data sets. This allows, for example, a reduction of the multiple data sets through the aggregation process, which may be useful for a prediction process, while determining statistical information for the predicted data.
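As a rough illustration of producing summary statistics for an aggregated prediction, the sketch below sums per-interval predictions and combines their standard errors. It assumes uncorrelated, approximately normal prediction errors, which is a simplification introduced here and not necessarily the method the abstract discloses; the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def aggregate_prediction(means, std_errors, confidence=0.95):
    """Summarize the sum of several interval predictions.

    Assumes uncorrelated errors, so the variance of the sum is the sum of
    the variances, and a normal approximation for the confidence limits.
    """
    total = sum(means)
    variance = sum(se ** 2 for se in std_errors)
    std_error = sqrt(variance)
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "prediction": total,
        "variance": variance,
        "std_error": std_error,
        "lower": total - z * std_error,
        "upper": total + z * std_error,
    }

# Illustrative inputs only.
summary = aggregate_prediction([10.0, 12.0, 11.0], [3.0, 4.0, 0.0])
```

If the interval errors are correlated, the covariance terms would need to be added to the variance, so the limits from this sketch should be treated as a starting point only.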