-
Publication number: WO2012045496A3
Publication date: 2012-09-07
Application number: PCT/EP2011062076
Filing date: 2011-07-14
Applicant: IBM , LINGENFELDER CHRISTOPH , WURST MICHAEL , POMPEY PASCAL
Inventor: LINGENFELDER CHRISTOPH , WURST MICHAEL , POMPEY PASCAL
Abstract: A comparison engine for comparing a first data mining model and a second data mining model is disclosed. A first data mining model M1 represents results of a first data mining task on a first data set D1 and provides a set of first prediction values. A second data mining model M2 represents results of a second data mining task on a second data set D2 and provides a set of second prediction values. A relation R is determined between said sets of prediction values. For at least a first record of an input data set, first and second probability distributions are created based on the first and second data mining models applied to the first record, said probability distributions associating probabilities with said sets of prediction values. A distance measure d is calculated for said first record using the first and second probability distributions and the relation R. At least one region of interest is determined based on said distance measure d.
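Illustrative sketch: the abstract does not say how the distance measure d is computed or how R is represented, so the following is only one possible reading. It assumes R is a many-to-one mapping from M1's prediction values to M2's, takes d to be the total variation distance after pushing M1's distribution through R, and treats records whose d exceeds a threshold as the region of interest. All function names, the threshold rule, and the choice of total variation distance are assumptions made for illustration and are not taken from the publication.

```python
from typing import Callable, Dict, List

Dist = Dict[str, float]  # prediction value -> probability


def map_distribution(p1: Dist, relation: Dict[str, str]) -> Dist:
    """Push a distribution over M1's prediction values through R onto M2's values."""
    mapped: Dist = {}
    for value, prob in p1.items():
        target = relation[value]  # assumed: R maps every M1 value to one M2 value
        mapped[target] = mapped.get(target, 0.0) + prob
    return mapped


def distance(p1: Dist, p2: Dist, relation: Dict[str, str]) -> float:
    """Total variation distance between R(p1) and p2 (an assumed choice of d)."""
    q1 = map_distribution(p1, relation)
    keys = set(q1) | set(p2)
    return 0.5 * sum(abs(q1.get(k, 0.0) - p2.get(k, 0.0)) for k in keys)


def regions_of_interest(
    records: List[dict],
    model1: Callable[[dict], Dist],
    model2: Callable[[dict], Dist],
    relation: Dict[str, str],
    threshold: float,
) -> List[dict]:
    """Return the records on which the two models disagree by more than threshold."""
    return [
        r for r in records
        if distance(model1(r), model2(r), relation) > threshold
    ]
```

With a relation such as {"low": "no", "medium": "no", "high": "yes"}, two models that assign probability 0.8 in total to the "no" side and 0.2 to the "yes" side yield d = 0, while models that put all mass on opposite sides yield d = 1.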
-
Publication number: WO2012084320A3
Publication date: 2013-01-10
Application number: PCT/EP2011069333
Filing date: 2011-11-03
Applicant: IBM , LINGENFELDER CHRISTOPH , WURST DR MICHAEL , POMPEY PASCAL
Inventor: LINGENFELDER CHRISTOPH , WURST DR MICHAEL , POMPEY PASCAL
CPC classification number: G06N7/005 , G06F17/18 , G06K9/6256 , G06K9/6277
Abstract: A method (100) for carrying out a predictive analysis is provided which generates a predictive model (Padj(Y | X)) based on two separate pieces of information, namely a set of original training data (Dorig) and a "true" distribution of indicators (Ptrue(X)). The method (100) begins by generating a base model distribution (Pgen(Y | X)) from the original training data set (Dorig) containing tuples (x, y) of indicators (x) and corresponding labels (y) (step 120). Using the "true" distribution (Ptrue(X)) of indicators, a random data set (D') of indicator records (x) is generated reflecting this "true" distribution (Ptrue(X)) (step 140). Subsequently, the base model (Pgen(Y | X)) is applied to said random data set (D'), thus assigning a label (y) or a distribution of labels to each indicator record (x) in said random data set (D') and generating an adjusted training set (Dadj) (step 150). Finally, an adjusted predictive model (Padj(Y | X)) is trained based on said adjusted training set (Dadj) (step 160).
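Illustrative sketch: steps 120 to 160 of the abstract can be traced with a toy pipeline. The abstract does not name a learning algorithm, so this sketch assumes discrete indicators, uses a relative-frequency table as both the base and the adjusted model, and assigns each generated record its single most probable label. It also assumes every indicator value with nonzero probability under Ptrue(X) occurs in Dorig. The function names and the parameters n and seed are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_conditional(data):
    """Estimate P(Y | X) by relative frequencies from (x, y) tuples."""
    counts = defaultdict(Counter)
    for x, y in data:
        counts[x][y] += 1
    return {
        x: {y: n / sum(c.values()) for y, n in c.items()}
        for x, c in counts.items()
    }


def sample_indicators(p_true, n, rng):
    """Step 140: draw n indicator records from the 'true' distribution Ptrue(X)."""
    xs = list(p_true)
    return rng.choices(xs, weights=[p_true[x] for x in xs], k=n)


def label_records(base_model, xs):
    """Step 150: give each record the most probable label under the base model."""
    return [(x, max(base_model[x], key=base_model[x].get)) for x in xs]


def adjusted_model(d_orig, p_true, n=1000, seed=0):
    """Run steps 120-160 and return (Padj, Dadj)."""
    rng = random.Random(seed)
    p_gen = fit_conditional(d_orig)               # step 120: base model Pgen(Y | X)
    d_prime = sample_indicators(p_true, n, rng)   # step 140: random data set D'
    d_adj = label_records(p_gen, d_prime)         # step 150: adjusted training set
    return fit_conditional(d_adj), d_adj          # step 160: adjusted model


if __name__ == "__main__":
    d_orig = [("a", "pos")] * 3 + [("a", "neg")] + [("b", "neg")] * 4
    p_adj, d_adj = adjusted_model(d_orig, {"a": 0.9, "b": 0.1})
    print(Counter(y for _, y in d_adj))  # label mix now follows Ptrue(X)
```

With a frequency table the per-indicator predictions barely change. The effect of the adjustment is in the mix of records in Dadj, which follows Ptrue(X) rather than the distribution found in Dorig, and this matters most for learners whose fit depends on how often each indicator value appears.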
-
Publication number: DE112016001902T5
Publication date: 2018-01-04
Application number: DE112016001902
Filing date: 2016-05-19
Applicant: IBM
Inventor: MARECEK JAKUB , MAVROEIDIS DIMITRIOS , WURST MICHAEL , POMPEY PASCAL
IPC: G06F17/30
Abstract: The computing resources for executing a data mining task over a distributed computing system are estimated. The data set on which the data mining task is performed, and/or data descriptors that describe or bound the relevant features of the data set, are received. One or more control values for the data mining task and, in addition, one or more task parameters that specify the data mining task are received. The computing resources for performing the data mining task over the distributed computing system are estimated based on the received data set or the received data descriptors, the one or more control values, and the one or more task parameters.
-
Publication number: GB2515056A
Publication date: 2014-12-17
Application number: GB201310453
Filing date: 2011-11-03
Applicant: IBM
Inventor: LINGENFELDER CHRISTOPH , WURST MICHAEL , POMPEY PASCAL
Abstract: A method (100) for carrying out a predictive analysis is provided which generates a predictive model (Padj(Y | X)) based on two separate pieces of information, namely a set of original training data (Dorig) and a "true" distribution of indicators (Ptrue(X)). The method (100) begins by generating a base model distribution (Pgen(Y | X)) from the original training data set (Dorig) containing tuples (x, y) of indicators (x) and corresponding labels (y) (step 120). Using the "true" distribution (Ptrue(X)) of indicators, a random data set (D') of indicator records (x) is generated reflecting this "true" distribution (Ptrue(X)) (step 140). Subsequently, the base model (Pgen(Y | X)) is applied to said random data set (D'), thus assigning a label (y) or a distribution of labels to each indicator record (x) in said random data set (D') and generating an adjusted training set (Dadj) (step 150). Finally, an adjusted predictive model (Padj(Y | X)) is trained based on said adjusted training set (Dadj) (step 160).
-
Publication number: DE112011104487T5
Publication date: 2013-10-17
Application number: DE112011104487
Filing date: 2011-11-03
Applicant: IBM
Inventor: WURST MICHAEL , LINGENFELDER CHRISTOPH , POMPEY PASCAL
Abstract: A method (100) for carrying out a predictive analysis is provided, by which a predictive model