Abstract:
PROBLEM TO BE SOLVED: To improve cluster quality when the data evolves substantially over time. SOLUTION: In a technique for clustering a data stream, online statistics are first generated from the data stream. Offline processing of the online statistics is then performed whenever it is needed or desired. The online statistics can be generated by receiving data points from the data stream and by forming and updating groups of data points. The offline processing can be performed by re-clustering the groups of data points around sampled data points, and the newly formed clusters are reported. COPYRIGHT: (C)2005,JPO&NCIPI
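The online/offline split described above can be illustrated with a minimal Python sketch. It is not the patented method: the additive summary (count, linear sum, squared sum), the `radius` threshold for starting a new group, and the weighted k-means used for the offline step are all assumptions chosen for illustration, as are the names `MicroCluster`, `online_update` and `offline_recluster`.

```python
import math
import random


class MicroCluster:
    """Additive summary of a group of points (hypothetical structure)."""

    def __init__(self, point):
        self.n = 1                           # number of points absorbed
        self.ls = list(point)                # per-dimension linear sum
        self.ss = [x * x for x in point]     # per-dimension squared sum

    def centroid(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x


def online_update(groups, point, radius):
    """Online step: add the point to the nearest group, or start a new one."""
    best = min(groups, key=lambda g: math.dist(g.centroid(), point), default=None)
    if best is not None and math.dist(best.centroid(), point) <= radius:
        best.absorb(point)
    else:
        groups.append(MicroCluster(point))


def offline_recluster(groups, k, iterations=10, seed=0):
    """Offline step: re-cluster the groups around k sampled group centroids."""
    rng = random.Random(seed)
    centers = [g.centroid() for g in rng.sample(groups, k)]
    dims = len(centers[0])
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        weights = [0] * k
        for g in groups:
            c = g.centroid()
            j = min(range(k), key=lambda i: math.dist(centers[i], c))
            weights[j] += g.n                # weight each group by its size
            for d, s in enumerate(g.ls):
                sums[j][d] += s
        centers = [
            [s / weights[j] for s in sums[j]] if weights[j] else centers[j]
            for j in range(k)
        ]
    return centers


# Usage: feed a small stream, then re-cluster on demand.
stream = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
groups = []
for p in stream:
    online_update(groups, p, radius=1.0)
centers = offline_recluster(groups, k=2)
```

Because the summaries are additive, the offline step never needs the raw points, which is what allows it to be deferred until it is needed.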
Abstract:
The present invention derives product characterizations for products offered at an e-commerce site from the text descriptions of the products provided at the site. A customer characterization is generated (410) for any customer browsing the e-commerce site. Each customer characterization includes an aggregation of the derived product characterizations associated with products bought and/or browsed by that customer. A peer group is formed (420) by clustering customers having similar customer characterizations. Recommendations are then made to a customer based on the processed characterization and peer group data.
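The pipeline above (text-derived product characterizations, aggregated customer characterizations, peer groups, recommendations) might be sketched as follows. This is an illustration under assumptions: word counts stand in for the derived product characterization, a cosine-similarity threshold (`min_similarity`) stands in for the clustering step, and all function names and sample data are hypothetical.

```python
import math
from collections import Counter


def characterize_product(description):
    """Product characterization: word counts of the text description."""
    return Counter(description.lower().split())


def characterize_customer(product_ids, catalog):
    """Customer characterization: aggregate of the products bought or browsed."""
    profile = Counter()
    for pid in product_ids:
        profile.update(characterize_product(catalog[pid]))
    return profile


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(customer, histories, catalog, min_similarity=0.3):
    """Recommend products that peers touched and the customer has not."""
    profiles = {c: characterize_customer(h, catalog) for c, h in histories.items()}
    # Assumption: a similarity threshold replaces a real clustering algorithm.
    peers = [
        c
        for c in histories
        if c != customer
        and cosine(profiles[customer], profiles[c]) >= min_similarity
    ]
    seen = set(histories[customer])
    scores = Counter(
        pid for peer in peers for pid in histories[peer] if pid not in seen
    )
    return [pid for pid, _ in scores.most_common()]


catalog = {
    "p1": "red running shoes",
    "p2": "blue running shorts",
    "p3": "trail running shoes",
    "p4": "cast iron skillet",
    "p5": "steel frying pan",
}
histories = {"ann": ["p1", "p2"], "bob": ["p1", "p3"], "cyd": ["p4", "p5"]}
print(recommend("ann", histories, catalog))  # prints ['p3']
```

Here "ann" and "bob" share vocabulary through their product descriptions, so "bob" becomes a peer and his unseen product is recommended; "cyd" shares no words with either and has no peers.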
Abstract:
In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.
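A rough Python sketch of the index-and-query steps follows. The abstract does not say how word-chains are generated, so the sketch assumes they are already available as named sets of related words; the `chains` data, the counting-based weights, the cosine ranking and all function names are hypothetical.

```python
import math
from collections import Counter


def to_concepts(text, chains):
    """Map text to chain space: weight = occurrences of the chain's words."""
    words = Counter(text.lower().split())
    vec = {}
    for name, chain in chains.items():
        weight = sum(words[w] for w in chain)
        if weight:
            vec[name] = weight
    return vec


def build_index(documents, chains):
    """Conceptual index: one chain-space vector per document."""
    return {doc_id: to_concepts(text, chains) for doc_id, text in documents.items()}


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, index, documents, chains, top_k=1):
    """Return (document id, matching chains, matching topical words) tuples."""
    q = to_concepts(query, chains)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)[:top_k]
    results = []
    for doc_id in ranked:
        matching = sorted(q.keys() & index[doc_id].keys())
        doc_words = set(documents[doc_id].lower().split())
        topical = sorted({w for c in matching for w in chains[c] if w in doc_words})
        results.append((doc_id, matching, topical))
    return results


# Assumed, hand-written word-chains; a real system would derive them from the corpus.
chains = {
    "vehicles": {"car", "truck", "engine"},
    "cooking": {"oven", "recipe", "bake"},
}
documents = {
    "d1": "the truck engine failed",
    "d2": "bake the recipe in the oven",
}
index = build_index(documents, chains)
print(search("car engine repair", index, documents, chains))
# prints [('d1', ['vehicles'], ['engine', 'truck'])]
```

The query word "car" never appears in document d1, yet d1 is returned because both map to the same chain, which is the sense in which the search is conceptual rather than purely lexical.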
Abstract:
At least a portion of a decision tree structure is generated from one or more multidimensional data objects by representing data associated with one or more of the data objects as a node, determining a condition for dividing the data at the node into at least two subsequent nodes based on a discriminant measure which maximises the separation between classes associated with the data, and dividing the data according to the condition. The multidimensional objects may be data records including feature variables and class variables. The method comprises splitting the decision tree recursively such that the greatest separation among the class values of the data is achieved. The discriminant measure is preferably determined in accordance with Fisher's discriminant technique. The data is divided at a split plane that is perpendicular to the direction determined by that technique and positioned where an entropy measure, determined in accordance with a Gini index, is substantially optimised.
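A single node split of this kind can be sketched in Python for the simplest case: two classes and two feature variables. The sketch computes the Fisher direction as the inverse within-class scatter matrix applied to the difference of class means, projects the records onto it, and picks the threshold along that direction with the lowest weighted Gini index. The restriction to two dimensions, the exhaustive threshold scan and the function names are assumptions made to keep the example short.

```python
def mean(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def scatter(rows, m):
    """2x2 scatter matrix of the rows around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - m[0], r[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """w = Sw^-1 (m1 - m0), with Sw the summed within-class scatter."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dm = (m1[0] - m0[0], m1[1] - m0[1])
    return [(d * dm[0] - b * dm[1]) / det, (-c * dm[0] + a * dm[1]) / det]


def gini(labels):
    """Gini index of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(class0, class1):
    """Return (w, threshold, impurity) for the plane w.x = threshold."""
    w = fisher_direction(class0, class1)
    projected = sorted(
        (w[0] * x + w[1] * y, label)
        for label, rows in ((0, class0), (1, class1))
        for x, y in rows
    )
    n = len(projected)
    best = None
    for i in range(1, n):
        if projected[i][0] == projected[i - 1][0]:
            continue  # no plane fits between identical projections
        left = [label for _, label in projected[:i]]
        right = [label for _, label in projected[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, (projected[i - 1][0] + projected[i][0]) / 2)
    if best is None:
        raise ValueError("all records project to the same value")
    return w, best[1], best[0]


class0 = [(1, 1), (2, 1), (1, 2), (2, 2)]
class1 = [(5, 5), (6, 5), (5, 6), (6, 6)]
w, threshold, impurity = best_split(class0, class1)
```

Building a tree would apply `best_split` recursively to the records on each side of the plane until the nodes are sufficiently pure; that recursion is omitted here.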
Abstract:
In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.
Abstract:
A method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.