PROBABILISTIC DATA MINING MODEL COMPARISON ENGINE
    1.
    Invention application
    PROBABILISTIC DATA MINING MODEL COMPARISON ENGINE (under examination, published)

    Publication number: WO2012045496A3

    Publication date: 2012-09-07

    Application number: PCT/EP2011062076

    Filing date: 2011-07-14

    CPC classification number: G06F17/18 G06K9/62

    Abstract: A comparison engine for comparing a first data mining model and a second data mining model is disclosed. A first data mining model M1 represents results of a first data mining task on a first data set D1 and provides a set of first prediction values. A second data mining model M2 represents results of a second data mining task on a second data set D2 and provides a set of second prediction values. A relation R is determined between said sets of prediction values. For at least a first record of an input data set, a first and a second probability distribution are created based on the first and second data mining models applied to the first record, said probability distributions associating probabilities with said sets of prediction values. A distance measure d is calculated for said first record using the first and second probability distributions and the relation. At least one region of interest is determined based on said distance measure d.
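The comparison described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the relation R is represented, which distance measure is used, or how a region of interest is delimited, so everything concrete here is an assumption: R is a plain dictionary mapping each prediction value of M1 to one prediction value of M2, d is the total variation distance, a region of interest is simply the set of records whose d exceeds a hypothetical `threshold`, and `model1` and `model2` are toy stand-ins for the two data mining models.

```python
from collections import defaultdict


def map_distribution(p1, relation):
    """Push a distribution over M1's prediction values through relation R."""
    mapped = defaultdict(float)
    for value, prob in p1.items():
        mapped[relation[value]] += prob
    return dict(mapped)


def distance(p1, p2, relation):
    """Total variation distance between M1's mapped distribution and M2's.

    Assumption: the abstract only says that d uses both distributions
    and the relation; total variation is one possible choice.
    """
    q1 = map_distribution(p1, relation)
    keys = set(q1) | set(p2)
    return 0.5 * sum(abs(q1.get(k, 0.0) - p2.get(k, 0.0)) for k in keys)


def regions_of_interest(records, model1, model2, relation, threshold):
    """Return (record, d) pairs for records whose distance exceeds threshold."""
    flagged = []
    for record in records:
        d = distance(model1(record), model2(record), relation)
        if d > threshold:
            flagged.append((record, d))
    return flagged


# Hypothetical toy models: each returns a probability distribution
# over its own set of prediction values for a given record.
def model1(record):
    if record["age"] < 30:
        return {"low": 0.7, "medium": 0.2, "high": 0.1}
    return {"low": 0.1, "medium": 0.2, "high": 0.7}


def model2(record):
    return {"no": 0.8, "yes": 0.2}


# Assumed relation R between the two sets of prediction values.
relation = {"low": "no", "medium": "no", "high": "yes"}

records = [{"age": 25}, {"age": 50}]
flagged = regions_of_interest(records, model1, model2, relation, threshold=0.3)
for record, d in flagged:
    print(record, round(d, 3))
```

In this sketch the two models agree closely on the younger record and disagree on the older one, so only the latter is flagged. A fuller implementation would presumably group flagged records into regions of the input space rather than list them individually.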


    METHOD AND SYSTEM FOR PREDICTIVE MODELING
    2.
    Invention application
    METHOD AND SYSTEM FOR PREDICTIVE MODELING (under examination, published)

    Publication number: WO2012084320A3

    Publication date: 2013-01-10

    Application number: PCT/EP2011069333

    Filing date: 2011-11-03

    CPC classification number: G06N7/005 G06F17/18 G06K9/6256 G06K9/6277

    Abstract: A method (100) for carrying out a predictive analysis is provided which generates a predictive model (Padj (Y | X)) based on two separate pieces of information, namely - a set of original training data (Dorig), and - a "true" distribution of indicators (Ptrue(X)). The method (100) begins by generating a base model distribution (Pgen(Y | X)) from the original training data set (Dorig) containing tuples (x, y) of indicators (x) and corresponding labels (y) (step 120). Using the "true" distribution (Ptrue(X)) of indicators, a random data set (D') of indicator records (x) is generated reflecting this "true" distribution (Ptrue(X)) (step 140). Subsequently, the base model (Pgen(Y | X)) is applied to said random data set (D'), thus assigning a label (y) or a distribution of labels to each indicator record (x) in said random data set (D') and generating an adjusted training set (Dadj) (step 150). Finally, an adjusted predictive model (Padj (Y | X)) is trained based on said adjusted training set (Dadj) (step 160).
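The four steps named in this abstract (120, 140, 150, 160) can be sketched as a small Python pipeline. This is only an illustration of the data flow under stated assumptions: indicators are a single discrete variable, both the base model and the adjusted model are simple conditional frequency tables, labels are drawn from the base model's distribution, and every indicator value in the "true" distribution also occurs in the original training data. The function names, the toy data, and the `size` and `seed` parameters are hypothetical and not taken from the patent.

```python
import random
from collections import Counter, defaultdict


def train_conditional(data):
    """Estimate P(Y | X) as relative label frequencies per indicator value."""
    counts = defaultdict(Counter)
    for x, y in data:
        counts[x][y] += 1
    return {
        x: {y: n / sum(c.values()) for y, n in c.items()}
        for x, c in counts.items()
    }


def sample_indicators(p_true, size, rng):
    """Draw indicator records x according to the 'true' distribution."""
    values = list(p_true)
    weights = [p_true[v] for v in values]
    return rng.choices(values, weights=weights, k=size)


def label_records(model, records, rng):
    """Assign each record a label drawn from the base model's P(Y | x)."""
    labelled = []
    for x in records:
        labels = list(model[x])
        weights = [model[x][y] for y in labels]
        labelled.append((x, rng.choices(labels, weights=weights, k=1)[0]))
    return labelled


def adjusted_model(d_orig, p_true, size=1000, seed=0):
    rng = random.Random(seed)
    p_gen = train_conditional(d_orig)                # step 120: base model
    d_prime = sample_indicators(p_true, size, rng)   # step 140: random data set D'
    d_adj = label_records(p_gen, d_prime, rng)       # step 150: adjusted training set
    p_adj = train_conditional(d_adj)                 # step 160: adjusted model
    return p_adj, d_adj


# Hypothetical original training data: tuples (x, y) of indicator and label.
d_orig = (
    [("a", "yes")] * 8 + [("a", "no")] * 2
    + [("b", "no")] * 9 + [("b", "yes")] * 1
)
# Hypothetical "true" distribution of the indicator.
p_true = {"a": 0.2, "b": 0.8}

p_adj, d_adj = adjusted_model(d_orig, p_true)
print(p_adj)
```

The original data set contains "a" and "b" equally often, while the adjusted training set follows the assumed true distribution, in which "b" dominates. With a fully flexible frequency-table model the adjusted conditional probabilities stay close to the base model's; the sketch is meant to show the sequence of steps, not to demonstrate a benefit.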


    Estimating computing resources for the execution of data mining services

    Publication number: DE112016001902T5

    Publication date: 2018-01-04

    Application number: DE112016001902

    Filing date: 2016-05-19

    Applicant: IBM

    Abstract: The computing resources for executing a data mining task over a distributed computing system are estimated. The data set on which the data mining task is performed, and/or data descriptors that describe or bound those features of the data set that are relevant, are received. One or more control values for the data mining task and, in addition, one or more task parameters that specify the data mining task are received. The computing resources for performing the data mining task over the distributed computing system are estimated based on the received data set or the received data descriptors, the one or more control values, and the one or more task parameters.

    Method and system for predictive modeling

    Publication number: GB2515056A

    Publication date: 2014-12-17

    Application number: GB201310453

    Filing date: 2011-11-03

    Applicant: IBM

    Abstract: A method (100) for carrying out a predictive analysis is provided which generates a predictive model (Padj (Y | X)) based on two separate pieces of information, namely - a set of original training data (Dorig), and - a "true" distribution of indicators (Ptrue(X)). The method (100) begins by generating a base model distribution (Pgen(Y | X)) from the original training data set (Dorig) containing tuples (x, y) of indicators (x) and corresponding labels (y) (step 120). Using the "true" distribution (Ptrue(X)) of indicators, a random data set (D') of indicator records (x) is generated reflecting this "true" distribution (Ptrue(X)) (step 140). Subsequently, the base model (Pgen(Y | X)) is applied to said random data set (D'), thus assigning a label (y) or a distribution of labels to each indicator record (x) in said random data set (D') and generating an adjusted training set (Dadj) (step 150). Finally, an adjusted predictive model (Padj (Y | X)) is trained based on said adjusted training set (Dadj) (step 160).
