Abstract:
An apparatus includes a processor component caused to: retrieve metadata describing the organization of data within a data set, and map data describing the organization of data blocks within a data file; receive indications of which node devices are available to perform a processing task with a data set portion; and, in response to the data set including partitioned data, compare the quantity of available node devices to the quantity of node devices last involved in storing the data set. In response to a match, for each map entry of the map data: retrieve a hashed identifier for a data sub-block, and a size for each of the data sub-blocks within the corresponding data block; divide the hashed identifier by the quantity of available node devices to obtain a modulo value; compare the modulo value to a designation assigned to each of the available node devices; and provide a pointer to the available node device assigned the matching designation.
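The modulo-based routing step can be illustrated with a minimal sketch. It assumes that each map entry is a (hashed identifier, size) pair, that sub-blocks are laid out back to back, and that a node's designation is simply its index in the list of available nodes. The function name and data layout are hypothetical and are not taken from the abstract.

```python
def assign_sub_blocks(map_entries, available_nodes):
    """Route each data sub-block to a node by hashed identifier modulo node count.

    map_entries: iterable of (hashed_identifier, size) pairs (hypothetical layout).
    available_nodes: list of node names; the list index serves as the designation.
    """
    node_count = len(available_nodes)
    assignments = {node: [] for node in available_nodes}
    offset = 0  # running position of the sub-block (assumes contiguous layout)
    for hashed_id, size in map_entries:
        designation = hashed_id % node_count   # the modulo value
        node = available_nodes[designation]    # node with the matching designation
        assignments[node].append({"offset": offset, "size": size})  # the "pointer"
        offset += size
    return assignments


entries = [(17, 100), (42, 250), (9, 50)]
nodes = ["node-0", "node-1", "node-2"]
result = assign_sub_blocks(entries, nodes)
```

Because the designation depends only on the hashed identifier and the node count, the same sub-block lands on the same designation whenever the node count matches the count used when the data set was stored.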
Abstract:
A computer program causes a computing device to: perform an association measurement between a target variable and each non-target variable of a data set; select non-target variables for inclusion in a visualization based on the degree of association; perform correspondence analysis between target values of the target variable and non-target values of each selected non-target variable; order target value markers within a target row based on the degrees of closeness; order non-target value markers within each non-target row based on the degrees of closeness; determine a width of each target value marker based on a frequency of occurrence of its target value in the data set; determine a width of each non-target value marker based on a frequency of occurrence of its non-target value in the data set; and cause generation of the visualization with connection markers emanating from the target value markers and extending among the non-target value markers.
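The width determination can be sketched as follows. The abstract only says that a marker's width is based on the frequency of its value; the strictly proportional rule, the function name, and the `total_width` parameter below are assumptions made for illustration.

```python
from collections import Counter


def marker_widths(values, total_width):
    """Give each distinct value a width proportional to how often it occurs.

    Proportional scaling is an assumption; the abstract only says the width
    is "based on" the frequency of occurrence.
    """
    counts = Counter(values)
    n = len(values)
    return {value: total_width * count / n for value, count in counts.items()}


widths = marker_widths(["yes", "no", "yes", "yes"], 400.0)
```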
Abstract:
Exemplary embodiments are generally directed to methods, mediums, and systems for correcting censored or constrained historical data with various possible types of computing devices, including cloud-based devices, personal computing devices, and edge-based devices. The corrected data may be used in forecasting, for example to forecast demand for a limited resource. In some embodiments, the data is modeled at a higher level of granularity than an individual record. The aggregated demand may then be pro-rated over a group of categories or users, so that a category of users that is small or nonexistent over a certain time frame may be better accommodated. Moreover, it may be easier or more efficient to make assumptions and employ computing resources at the aggregate level.
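The pro-rating step can be illustrated with a small sketch. It assumes that each category has a non-negative share weight (for example, a long-run historical share); the function name, the weights, and the category labels are hypothetical.

```python
def prorate(aggregate_demand, shares):
    """Split an aggregate demand estimate across categories by share weight.

    shares: dict of category -> non-negative weight (an assumed input; the
    abstract does not say how the shares are obtained).
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one category needs a positive share")
    return {cat: aggregate_demand * w / total for cat, w in shares.items()}


# A category with a zero share in this window still appears in the output.
allocation = prorate(120.0, {"business": 1, "leisure": 3, "group": 0})
```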
Abstract:
Disclosed are methods, systems, and computer program products useful for generating summary statistics for data predictions based on the aggregation of data from past time intervals. Summary statistics, such as prediction standard errors, variances, confidence limits, and other statistical measures, may be generated in a way that preserves the basic distributional properties of the original data sets. This allows, for example, a reduction of the multiple data sets through the aggregation process, which may be useful for a prediction process, while determining statistical information for the predicted data.
Abstract:
Information related to a time series can be predicted. For example, a repetitive characteristic of the time series can be determined by analyzing the time series for a pattern that repeats over a predetermined time period. An adjusted time series can be generated by removing the repetitive characteristic from the time series. An effect of a moving event on the adjusted time series can be determined. The moving event can occur on different dates for two or more consecutive years. A residual time series can be generated by removing the effect of the moving event from the adjusted time series. A base forecast that is independent of the repetitive characteristic and the effect of the moving event can be generated using the residual time series. A predictive forecast can be generated by including the repetitive characteristic and the effect of the moving event into the base forecast.
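The sequence of steps above can be sketched in plain Python. This is a deliberately simple illustration, not the method of the abstract: it assumes the repetitive characteristic is a per-position average over a fixed period, the moving event has a single additive effect, and the base forecast is a flat mean of the residual series. All function and parameter names are hypothetical.

```python
from statistics import mean


def forecast(series, period, event_idx, horizon, future_event_idx):
    """Forecast by stripping a repeating pattern and a moving-event effect,
    forecasting the remainder, then adding both components back."""
    n = len(series)
    overall = mean(series)

    # 1. Repetitive characteristic: average deviation per position in the period.
    seasonal = [mean(series[i] for i in range(p, n, period)) - overall
                for p in range(period)]
    adjusted = [series[i] - seasonal[i % period] for i in range(n)]

    # 2. Moving-event effect: mean gap between event and non-event points.
    events = set(event_idx)
    event_vals = [adjusted[i] for i in events]
    normal_vals = [adjusted[i] for i in range(n) if i not in events]
    effect = mean(event_vals) - mean(normal_vals) if event_vals else 0.0
    residual = [adjusted[i] - (effect if i in events else 0.0) for i in range(n)]

    # 3. Base forecast, independent of both components (flat mean: an assumption).
    base = mean(residual)

    # 4. Predictive forecast: add the components back for each future step.
    future_events = set(future_event_idx)
    return [base + seasonal[(n + h) % period]
            + (effect if (n + h) in future_events else 0.0)
            for h in range(horizon)]


history = [10, 20, 10, 20, 10, 30, 10, 20]  # index 5 coincides with the event
with_event = forecast(history, period=2, event_idx=[5], horizon=2,
                      future_event_idx=[8])
without_event = forecast(history, period=2, event_idx=[5], horizon=2,
                         future_event_idx=[])
```

In this sketch the event points still contribute to the seasonal averages, so the two components are not cleanly separated; a fuller treatment would estimate them jointly.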
Abstract:
Various embodiments include a system having interfaces, storage devices, memory, and processing circuitry. The system may include logic to render a portion of a first layer and a portion of a second layer for presentation, and determine parameters of tokens for the second layer based on a result of the rendering of the second layer, the parameters to include at least one of token width values, token offset values, line height values, and line top values. The system is also to align the first layer and the second layer based on the parameters of the tokens for the second layer, and present the first layer and the second layer on a display, the first layer to present tokens and the second layer to receive events.
Abstract:
An apparatus includes a renaming component to homogenize query instructions for retrieving data items from a data set organized using index labels by identifying a declaration instruction associating an object thereof with an index label, replacing the name provided to the object with an archetypal name based on the index label, and generating change data associating the name with the archetypal name; a hashing component to take an instruction hash of the homogenized instructions; a cache control routine to find a matching instruction hash corresponding to results of earlier database queries in a results cache; and a reversal routine to, in response to finding a matching instruction hash, retrieve a cached result from the results cache associated with the matching instruction hash, and replace a name of a different object therein based on the change data and the query instructions to generate a new result of the new database query.
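The rename, hash, look up, and reverse flow can be sketched as below. The toy declaration syntax `let <name> = index("<label>")`, the `__idx_<label>` archetypal naming scheme, and the choice to store cached results in archetypal form are all assumptions made for illustration; none of them comes from the abstract.

```python
import hashlib
import re

# Hypothetical declaration syntax: let <name> = index("<label>")
DECL = re.compile(r'let\s+(\w+)\s*=\s*index\("(\w+)"\)')


def homogenize(query):
    """Replace user-chosen object names with archetypal names.

    Returns the homogenized text and the change data (name -> archetypal name).
    """
    change_data = {name: f"__idx_{label}" for name, label in DECL.findall(query)}
    homogenized = query
    for name, archetype in change_data.items():
        homogenized = re.sub(rf"\b{re.escape(name)}\b", archetype, homogenized)
    return homogenized, change_data


def instruction_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ResultsCache:
    def __init__(self):
        self._store = {}  # instruction hash -> result in archetypal form

    def run(self, query, execute):
        homogenized, change_data = homogenize(query)
        key = instruction_hash(homogenized)
        if key not in self._store:              # cache miss: run the query once
            self._store[key] = execute(homogenized)
        result = self._store[key]
        for name, archetype in change_data.items():  # reversal step
            result = result.replace(archetype, name)
        return result


calls = []

def fake_execute(homogenized_query):
    # Stand-in for a real database call; records how often it is invoked.
    calls.append(homogenized_query)
    return "column __idx_region: 3 rows"

cache = ResultsCache()
first = cache.run('let a = index("region"); select a', fake_execute)
second = cache.run('let b = index("region"); select b', fake_execute)
```

The two queries differ only in the object name, so they homogenize to the same text, share one instruction hash, and the second call is served from the cache with the name `b` restored.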
Abstract:
A computing device predicts a probability of a transformer failure. An analysis type indicator defined by a user is received. A worth value for each of a plurality of variables is computed. Highest worth variables from the plurality of variables are selected based on the computed worth values. A number of variables of the highest worth variables is limited to a predetermined number based on the received analysis type indicator. A first model and a second model are also selected based on the received analysis type indicator. Historical electrical system data is partitioned into a training dataset and a validation dataset that are used to train and validate, respectively, the first model and the second model. A probability of failure model is selected as the first model or the second model based on a comparison of the fit of each model.
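The variable-limiting and model-selection steps can be sketched as follows. The analysis type names, the limits attached to them, the variable names, and the use of a validation error as the measure of fit are hypothetical choices for illustration.

```python
# Hypothetical mapping from analysis type indicator to a variable limit.
VARIABLE_LIMITS = {"quick": 2, "detailed": 5}


def select_variables(worth, analysis_type):
    """Keep the highest-worth variables, capped by the analysis type's limit."""
    limit = VARIABLE_LIMITS[analysis_type]
    ranked = sorted(worth, key=worth.get, reverse=True)
    return ranked[:limit]


def choose_model(validation_errors):
    """Pick the model with the better fit (lower validation error, by assumption)."""
    return min(validation_errors, key=validation_errors.get)


worth = {"age": 0.9, "load": 0.7, "oil_temp": 0.8}
selected = select_variables(worth, "quick")
best = choose_model({"first_model": 0.21, "second_model": 0.17})
```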
Abstract:
Electronic communications can be normalized using a neural network. For example, a noncanonical communication that includes multiple terms can be received. The noncanonical communication can be preprocessed by (I) generating a vector including multiple characters from a term of the multiple terms; and (II) repeating a substring of the term in the vector such that a last character of the substring is positioned in a last position in the vector. The vector can be transmitted to a neural network configured to receive the vector and generate multiple probabilities based on the vector. A normalized version of the noncanonical communication can be determined using one or more of the multiple probabilities generated by the neural network. Whether the normalized version of the noncanonical communication should be outputted can also be determined using at least one of the multiple probabilities generated by the neural network.
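One possible reading of the preprocessing step is sketched below: the term's characters fill a fixed-length vector from the front, and the term's trailing substring is repeated at the end so that its last character occupies the last position. The vector length, suffix length, and pad character are assumed parameters, and the function name is hypothetical.

```python
def build_vector(term, length=10, suffix_len=3, pad="_"):
    """Build a fixed-length character vector for one term.

    The head of the term is written from position 0; the term's trailing
    substring is then repeated at the end of the vector, so the term's last
    character always sits in the last position.
    """
    if suffix_len > length:
        raise ValueError("suffix_len must not exceed length")
    head = list(term[:length])
    vector = head + [pad] * (length - len(head))
    tail = list(term[-suffix_len:]) if term else []
    vector[length - len(tail):] = tail  # overwrite the end with the suffix
    return vector


short = "".join(build_vector("tmrw"))
long_term = "".join(build_vector("congratulations"))
```

Anchoring the suffix at a fixed position gives a downstream network a consistent place to read a term's ending, whatever the term's length.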
Abstract:
An apparatus includes a processor and storage to store instructions that cause the processor to identify at least one correlation between a diagnosis group and a medication class for each patient of a first set of patients to derive a set of models for each diagnosis group that correlates the diagnosis group to at least one medication class based on the at least one identified correlation; and for each patient of a second set of patients, employ each model of each set of models to make at least one prediction of at least one diagnosis group as indicated in the corresponding diagnosis group record based on at least one medication class indicated in the corresponding medication class record, and compare the at least one prediction to the corresponding diagnosis group record to derive a tally of at least one of true positives or false positives for each prediction.
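The evaluation step on the second set of patients can be sketched as below. The model is reduced to a lookup from medication class to predicted diagnosis groups, which is a simplification; the record layout and the example class and group names are hypothetical.

```python
def tally_predictions(model, patients):
    """Count true and false positives for predicted diagnosis groups.

    model: dict of medication class -> set of predicted diagnosis groups
           (a stand-in for the derived models).
    patients: list of dicts with "medication_classes" and "diagnosis_groups" sets.
    """
    tally = {"true_positive": 0, "false_positive": 0}
    for patient in patients:
        predicted = set()
        for med_class in patient["medication_classes"]:
            predicted |= model.get(med_class, set())
        for group in predicted:
            if group in patient["diagnosis_groups"]:
                tally["true_positive"] += 1    # prediction matches the record
            else:
                tally["false_positive"] += 1   # predicted but not recorded
    return tally


model = {"insulin": {"diabetes"}, "statin": {"hyperlipidemia"}}
patients = [
    {"medication_classes": {"insulin", "statin"}, "diagnosis_groups": {"diabetes"}},
    {"medication_classes": {"statin"}, "diagnosis_groups": {"hyperlipidemia"}},
]
counts = tally_predictions(model, patients)
```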