Abstract:
PROBLEM TO BE SOLVED: To provide a technique for comparing decision trees in detail, independently of differences in their tree structures. SOLUTION: A data set storage section stores a plurality of data sets, each of which is a set of instances having the same kind of target attribute. A decision tree information storage section stores a plurality of decision trees, each generated from a different data set. For each node of a decision tree, a target attribute determination section determines, as the label of the node, the target attribute value held by the largest number of instances classified into that node in the process of generating the decision tree. For each node, a basic frequency calculation section calculates the frequency at which instances having the same target attribute value as the label of the node are classified into the node in the process of generating the decision tree. An application frequency calculation section makes a decision tree classify the instances from which another decision tree was generated, and calculates, for each node of the decision tree, the frequency at which instances having the same target attribute value as the label of the node are classified into the node. An output section outputs the result of comparing the two frequencies as the comparison result of the decision trees. COPYRIGHT: (C)2010,JPO&INPIT
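The node-wise comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the hand-written tree `TREE`, the attribute names, and the helpers `route`, `node_targets`, and `compare` are all hypothetical, and "frequency" is assumed here to mean the fraction of instances reaching a node whose target value equals the node's label.

```python
from collections import Counter

# Hypothetical tree generated from the first data set: one split on "age".
TREE = {
    "id": "root", "attr": "age", "threshold": 30,
    "left": {"id": "young"},   # instances with age <= 30
    "right": {"id": "old"},    # instances with age > 30
}

def route(tree, instance):
    """Return the ids of every node the instance passes through."""
    path, node = [], tree
    while True:
        path.append(node["id"])
        if "attr" not in node:  # leaf reached
            return path
        go_left = instance[node["attr"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]

def node_targets(tree, data, target):
    """Collect, per node, the target values of the instances reaching it."""
    reached = {}
    for instance in data:
        for node_id in route(tree, instance):
            reached.setdefault(node_id, []).append(instance[target])
    return reached

def compare(tree, train, other, target):
    """Per node: (label, basic frequency, application frequency)."""
    base = node_targets(tree, train, target)
    applied = node_targets(tree, other, target)
    result = {}
    for node_id, targets in base.items():
        # Node label = majority target value among the generating instances.
        label = Counter(targets).most_common(1)[0][0]
        basic = targets.count(label) / len(targets)
        seen = applied.get(node_id, [])
        application = seen.count(label) / len(seen) if seen else 0.0
        result[node_id] = (label, basic, application)
    return result
```

A large gap between the two frequencies at a node suggests that the second data set behaves differently in that region of the tree, regardless of how a tree generated from the second data set would itself be shaped.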
Abstract:
PROBLEM TO BE SOLVED: To provide a spatial data mining method that, without determining a distance and an azimuth in advance, finds the distance and the azimuth themselves that optimize an objective required by many analytical operations, and derives a spatial correlation rule. SOLUTION: The spatial data mining device for calculating an optimum distance from a database including spatial information such as addresses is provided with an input means for inputting an objective function necessary for distance optimization, an intermediate table preparation part 30 for generating an intermediate table by calculating a distance between a start point and a question point on the basis of start point set data and question point set data stored in the database, and an optimum distance calculation part 39 for calculating the distance that optimizes the value of the objective function inputted by the input means on the basis of the intermediate table generated by the preparation part 30.
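The two-stage flow in this abstract (build an intermediate distance table, then search it for the distance that optimizes the objective function) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the device itself: it assumes planar coordinates with Euclidean distance, computes distances for all start/question pairs, tries only distances that occur in the table as candidates, and ignores azimuth. The function names and the example objective `coverage_minus_cost` are hypothetical.

```python
import math

def build_intermediate_table(starts, queries):
    """One row (start id, question id, distance) per start/question pair."""
    return [
        (s_id, q_id, math.dist(s_xy, q_xy))
        for s_id, s_xy in starts.items()
        for q_id, q_xy in queries.items()
    ]

def optimum_distance(table, objective):
    """Evaluate the objective for each candidate radius and keep the best.

    Candidates are the distinct distances present in the table; the objective
    receives the rows within the radius and the radius itself.
    """
    candidates = sorted({d for _, _, d in table})
    return max(
        candidates,
        key=lambda r: objective([row for row in table if row[2] <= r], r),
    )

def coverage_minus_cost(rows, radius):
    """Hypothetical objective: question points covered, minus an area-like cost."""
    covered = {q_id for _, q_id, _ in rows}
    return len(covered) - 0.1 * radius ** 2
```

Because the objective function is passed in as a parameter, the same table can be reused for different analytical purposes without recomputing distances.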
Abstract:
PROBLEM TO BE SOLVED: To make it possible to identify the source of the genetic information in a DNA having specific genetic information by embedding information into the base sequence of the DNA. SOLUTION: This method comprises the steps of associating a base sequence having a pattern that does not usually appear in DNA with identification information for identifying the source of the specific genetic information which the DNA has, and embedding the base sequence associated with the identification information into the DNA so as not to affect the genetic information of the DNA.
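As a toy illustration of the association and embedding steps, the Python sketch below treats DNA as a plain string. Everything in it is an assumption made for illustration: the `MARKER` sequence merely stands in for "a pattern that does not usually appear in DNA", the two-bits-per-base encoding is one arbitrary way to associate an identifier with a base sequence, and choosing an insertion position that does not affect the genetic information is left to the caller.

```python
BASES = "ACGT"
MARKER = "ACGTACGTTGCA"  # hypothetical marker, assumed absent from the host DNA

def encode_id(number, length=8):
    """Encode a non-negative integer as `length` bases, two bits per base."""
    if not 0 <= number < 4 ** length:
        raise ValueError("identifier does not fit in the requested length")
    digits = []
    for _ in range(length):
        number, remainder = divmod(number, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))

def decode_id(bases):
    """Inverse of encode_id."""
    number = 0
    for base in bases:
        number = number * 4 + BASES.index(base)
    return number

def embed(dna, number, position):
    """Insert marker + encoded identifier at `position`.

    The caller is assumed to pick a position where the insertion does not
    affect the genetic information (for example, a non-coding region).
    """
    if MARKER in dna:
        raise ValueError("marker already occurs in the host sequence")
    return dna[:position] + MARKER + encode_id(number) + dna[position:]

def extract(dna, length=8):
    """Locate the marker and decode the identifier that follows it."""
    start = dna.index(MARKER) + len(MARKER)
    return decode_id(dna[start:start + length])
```

The marker is what makes the identifier recoverable: a reader scans for the unusual pattern and decodes the bases that follow it.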
Abstract:
PROBLEM TO BE SOLVED: To provide a method for efficiently solving the change analysis problem. SOLUTION: Different virtual labels, for example +1 and -1, are assigned to two data sets. The change analysis problem for the two data sets is reduced to a supervised learning problem by using the virtual labels. Specifically, a classifier such as logistic regression, a decision tree, or an SVM is prepared and trained on the data set obtained by merging the two data sets to which the virtual labels have been assigned. The feature selection function of the resulting classifier is used to rank and output each attribute contributing to the classification together with its contribution rate. COPYRIGHT: (C)2009,JPO&INPIT
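The reduction to supervised learning can be sketched in a few lines of Python. This is a minimal sketch, not the method as claimed: in place of logistic regression, a full decision tree, or an SVM, it fits a one-threshold decision stump per attribute and uses the stump's training accuracy as a stand-in for the contribution rate. The function names are hypothetical.

```python
def best_stump_accuracy(values, labels):
    """Best training accuracy of a single-threshold rule on one attribute."""
    n = len(values)
    best = 0.0
    for threshold in sorted(set(values)):
        # Rule: predict +1 when value <= threshold, otherwise -1.
        correct = sum(
            1 for v, y in zip(values, labels)
            if (1 if v <= threshold else -1) == y
        )
        # The flipped rule gets exactly the remaining instances right.
        best = max(best, correct / n, (n - correct) / n)
    return best

def rank_changed_attributes(data_a, data_b, names):
    """Rank attributes by how well they separate the two data sets."""
    # Assign virtual labels: +1 to rows of data_a, -1 to rows of data_b.
    merged = [(row, +1) for row in data_a] + [(row, -1) for row in data_b]
    labels = [y for _, y in merged]
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row, _ in merged]
        scores.append((name, best_stump_accuracy(column, labels)))
    # Attributes that separate the data sets best are listed first.
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

An attribute whose score is close to 1.0 distinguishes the two data sets almost perfectly, which marks it as a likely location of the change; a score near 0.5 on balanced data sets means the attribute looks the same in both.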