Abstract:
PROBLEM TO BE SOLVED: To provide a selecting device, a selecting method, a program, and a recording medium that can efficiently obtain a set of elements strongly related to a predetermined reference element from among a plurality of elements stored in a database or the like, thereby addressing the difficulty of selecting only the desired data when a large amount of data has been digitized with the advancement and spread of computers. SOLUTION: A selecting device 10 selects, from among the plurality of elements, an adjacent element set whose strength of relation to the predetermined reference element satisfies a predetermined condition. The device comprises an adjacent element set candidate selecting part 110 that selects a plurality of mutually different adjacent element set candidates, each being a set of elements determined to be more strongly related to the reference element, and an adjacent element set selecting part 130 that selects, from the union of the adjacent element set candidates, the set of elements whose strength of relation to the reference element satisfies the predetermined condition as the adjacent element set. COPYRIGHT: (C)2006,JPO&NCIPI
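The two-stage selection described in this abstract (several candidate sets, then a final selection from their union) can be sketched as follows. This is a minimal illustration and not the claimed implementation: the function name `select_adjacent_set`, the `strength` callback, and the choice of "the k strongest elements" as the predetermined condition are all assumptions made for the example.

```python
def select_adjacent_set(reference, candidates, strength, k):
    """Pick the k elements most strongly related to `reference`.

    candidates: iterable of sets, each an adjacent element set candidate
                (how they are produced is outside this sketch).
    strength:   hypothetical callback; larger value = stronger relation.
    k:          assumed form of the "predetermined condition" (top k).
    """
    # Union of all adjacent element set candidates.
    union = set().union(*candidates)
    # Rank by descending strength; break ties by element value so the
    # result is deterministic.
    ranked = sorted(union, key=lambda e: (-strength(reference, e), e))
    return ranked[:k]


# Example: integers as elements, closeness on the number line as strength.
closeness = lambda ref, e: -abs(ref - e)
result = select_adjacent_set(10, [{8, 9, 15}, {11, 9, 20}], closeness, k=3)
print(result)
```

Evaluating the exact strength only on the union keeps the expensive comparison limited to elements that at least one candidate set already judged promising.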
Abstract:
PROBLEM TO BE SOLVED: To select, as a cluster, member elements that are strongly associated with one another from among a plurality of elements stored in a database or the like. SOLUTION: An evaluating apparatus, which calculates a self-confidence value of cluster selection for a cluster generated by selecting some of the plurality of elements, comprises: an evaluated object cluster selection part that selects, as an evaluated cluster, an adjacent element set, that is, a set of the reference numbers of elements whose association with a reference element is stronger; an adjacent element set selecting part that selects, for each member element included in the cluster, an adjacent element set, that is, a set of the reference numbers of elements whose association with that member element is stronger; and a confidence value calculating part that calculates, for each pair of member elements selectable from the cluster, the number of elements common to the adjacent element set of one member of the pair and the adjacent element set of the other, and outputs the average ratio over all combinations of member elements as the self-confidence value. COPYRIGHT: (C)2006,JPO&NCIPI
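A small sketch of the pairwise calculation performed by the confidence value calculating part is given below. The abstract does not state what the common-element count is divided by, so the denominator used here (the size of the larger of the two adjacent element sets) is an assumption, as are the function name `self_confidence` and the dictionary layout of `neighbors`.

```python
from itertools import combinations


def self_confidence(cluster, neighbors):
    """Average neighbor-set overlap ratio over all pairs in `cluster`.

    neighbors: dict mapping each member element to its adjacent element
               set (a Python set of element reference numbers).
    """
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a cluster with fewer than two members has no pairs
    total = 0.0
    for a, b in pairs:
        na, nb = neighbors[a], neighbors[b]
        common = len(na & nb)            # elements shared by both sets
        denom = max(len(na), len(nb))    # ASSUMED normalization
        total += common / denom if denom else 0.0
    return total / len(pairs)


neighbors = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 4}}
score = self_confidence([1, 2, 3], neighbors)
print(round(score, 4))
```

Under this assumed normalization the value lies between 0 and 1, and it reaches 1 only when every member has the same adjacent element set.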
Abstract:
PROBLEM TO BE SOLVED: To provide a system and a method for information retrieval and data mining of a large-scale text database, based on the identification of clusters of elements (e.g. documents) that exhibit a high degree of mutual similarity relative to a background. SOLUTION: The computer system comprises a neighborhood patch generation part 34 that generates a data structure for the information retrieval of documents stored in the database and generates node groups having prescribed similarities in a hierarchy structure. The neighborhood patch generation part 34 comprises a hierarchy generation part 36, which generates the hierarchy structure over document-keyword vectors, and a patch definition part 26. The computer system further comprises a cluster estimation part 28 that generates cluster data of the document-keyword vectors by using the similarities of patches. COPYRIGHT: (C)2004,JPO&NCIPI
Abstract:
PROBLEM TO BE SOLVED: To provide a computer-executable method for reducing dimensions, a program for making a computer execute the method, a dimension reduction device, and a search engine using the device. SOLUTION: The dimension reduction device which reduces the dimension of a numerical matrix using a computer to obtain a dimension-reduced matrix for providing information includes a processing part 32 which creates and stores a dimension-reduced matrix or index data for reducing dimensions, using a random average matrix RAV. The processing part 32 includes a shuffle vector creating part 44 for creating shuffle vectors for use as shuffle information, and a non-normal basic vector creating part 46 for creating a non-normal basic vector out of the numerical elements of data vectors designated by the shuffle vectors and for storing the non-normal basic vector. COPYRIGHT: (C)2005,JPO&NCIPI
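The abstract does not define how the random average matrix RAV is built. One possible reading, sketched below purely as an assumption, is that each shuffle vector is a random selection of data-vector indices, each non-normal basic vector is the unnormalized average of the data vectors so designated, and the reduced matrix is the projection of the data onto those vectors. The function names, the `sample_size` parameter, and the fixed seed are all hypothetical.

```python
import random


def random_average_basis(data, n_basis, sample_size, seed=0):
    """Build n_basis vectors, each the average of randomly chosen rows."""
    rng = random.Random(seed)
    dim = len(data[0])
    basis = []
    for _ in range(n_basis):
        # "Shuffle vector": indices of the data vectors to average (assumed).
        shuffle_vector = rng.sample(range(len(data)), sample_size)
        # Averaged but not normalized to unit length.
        avg = [sum(data[i][j] for i in shuffle_vector) / sample_size
               for j in range(dim)]
        basis.append(avg)
    return basis


def reduce_dimension(data, basis):
    """Project every data vector onto each basis vector (dot products)."""
    return [[sum(x * b for x, b in zip(row, vec)) for vec in basis]
            for row in data]


data = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
basis = random_average_basis(data, n_basis=2, sample_size=4)
reduced = reduce_dimension(data, basis)  # 4 rows, 2 columns
```

In this toy call `sample_size` equals the number of rows, so every basis vector is simply the column mean; a smaller `sample_size` would make the basis vectors differ from one another.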
Abstract:
PROBLEM TO BE SOLVED: To provide a method for retrieving information from a large-scale database (containing several million data items) in real time, while controlling the trade-off between the precision of the retrieved results and the response time to the user. SOLUTION: This method is applicable, for example, to a database in which the contents of documents are modeled with a clearly defined distance that can be calculated between any two points, such that a pair of documents close to each other is more similar than a pair of documents far apart. The method can be combined with similarity ranking and/or other methods to enhance scalability in information retrieval, detection, ranking, and tracking. COPYRIGHT: (C)2003,JPO