Method and apparatus for clustering data stream in progress through online and offline components
    Invention patent
    Method and apparatus for clustering data stream in progress through online and offline components (legal status: in force)

    Publication No.: JP2005100363A

    Publication date: 2005-04-14

    Application No.: JP2004234267

    Filing date: 2004-08-11

    Abstract: PROBLEM TO BE SOLVED: To improve cluster quality when the data evolves substantially over time.
    SOLUTION: In a technique for clustering a data stream, online statistics are first generated from the data stream. Offline processing of these online statistics is then performed whenever it is needed or desired. The online statistics can be generated by receiving data points from the data stream and by forming and updating groups of data points. The offline processing can be performed by re-clustering the groups of data points around sampled data points, and the newly formed clusters are reported.
    COPYRIGHT: (C)2005,JPO&NCIPI
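The online/offline split described in the abstract can be illustrated with a minimal Python sketch. It assumes CluStream-style micro-cluster summaries (point count plus per-dimension sums and sums of squares), a fixed absorption distance `max_dist`, eviction of the least recently updated micro-cluster once `max_clusters` is exceeded, and weighted k-means over micro-cluster centroids, seeded from sampled micro-clusters, for the offline step. The abstract specifies none of these details; every function and parameter name here is hypothetical.

```python
import math
import random


class MicroCluster:
    """Additive summary of nearby stream points (hypothetical CluStream-style statistics)."""

    def __init__(self, point, t):
        self.n = 1                        # number of absorbed points
        self.ls = list(point)             # per-dimension linear sum
        self.ss = [x * x for x in point]  # per-dimension sum of squares (for spread; unused below)
        self.last_t = t                   # timestamp of the most recent update

    def absorb(self, point, t):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x
        self.last_t = t

    def centroid(self):
        return [s / self.n for s in self.ls]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def online_update(clusters, point, t, max_dist=1.0, max_clusters=50):
    """Online component: fold one stream point into the micro-cluster statistics."""
    if clusters:
        nearest = min(clusters, key=lambda c: dist(c.centroid(), point))
        if dist(nearest.centroid(), point) <= max_dist:
            nearest.absorb(point, t)
            return
    clusters.append(MicroCluster(point, t))
    if len(clusters) > max_clusters:
        # Simplification: drop the least recently updated micro-cluster.
        clusters.remove(min(clusters, key=lambda c: c.last_t))


def offline_cluster(clusters, k, iters=10, seed=0):
    """Offline component: weighted k-means over micro-cluster centroids,
    seeded with k randomly sampled micro-clusters (requires k <= len(clusters))."""
    rng = random.Random(seed)
    centers = [c.centroid() for c in rng.sample(clusters, k)]
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist(c.centroid(), centers[j]))
                  for c in clusters]
        for j in range(k):
            members = [c for c, a in zip(clusters, assign) if a == j]
            total = sum(c.n for c in members)
            if total:
                # Weighted mean of the underlying points: summed linear sums / total count
                centers[j] = [sum(c.ls[d] for c in members) / total
                              for d in range(len(centers[j]))]
    return centers, assign


clusters = []
stream = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2), (5.1, 5.0)]
for t, p in enumerate(stream):
    online_update(clusters, p, t)
centers, assign = offline_cluster(clusters, k=2)
```

Because the offline step only reads the compact micro-cluster statistics, it can be rerun on demand (with a different `k`, for instance) without revisiting the raw stream.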

    Providing product recommendations in an electronic commerce system

    Publication No.: GB2345559A

    Publication date: 2000-07-12

    Application No.: GB9923225

    Filing date: 1999-10-04

    Applicant: IBM

    Abstract: The present invention derives product characterizations for products offered at an e-commerce site from the text descriptions of those products provided at the site. A customer characterization is generated (410) for each customer browsing the e-commerce site. The customer characterization includes an aggregation of the derived product characterizations associated with products bought and/or browsed by that customer. A peer group is formed (420) by clustering customers with similar customer characterizations. Recommendations are then made to a customer based on the processed characterization and peer group data.
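A minimal Python sketch of this pipeline, assuming products are characterized by simple bag-of-words term counts, a customer by the summed term counts of the products in their history, and peer groups by single-pass leader clustering under a cosine-similarity threshold. The abstract does not say which text features or clustering algorithm the patent uses; all names, the threshold, and the sample catalog are hypothetical.

```python
import math
from collections import Counter


def product_profile(description):
    """Characterize a product by term counts from its text description."""
    return Counter(description.lower().split())


def customer_profile(product_ids, products):
    """Aggregate the characterizations of products a customer bought or browsed."""
    total = Counter()
    for pid in product_ids:
        total.update(product_profile(products[pid]))
    return total


def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def peer_groups(profiles, threshold):
    """Leader clustering: join the first group whose leader is similar enough,
    otherwise start a new group led by this customer."""
    groups = []  # list of (leader profile, member ids)
    for cid, prof in profiles.items():
        for leader, members in groups:
            if cosine(leader, prof) >= threshold:
                members.append(cid)
                break
        else:
            groups.append((prof, [cid]))
    return [members for _, members in groups]


def recommend(target, histories, products, threshold=0.5, top_n=3):
    """Recommend items seen by the target's peer group but not by the target."""
    profiles = {c: customer_profile(h, products) for c, h in histories.items()}
    group = next(g for g in peer_groups(profiles, threshold) if target in g)
    seen = set(histories[target])
    votes = Counter(pid for c in group if c != target
                    for pid in histories[c] if pid not in seen)
    return [pid for pid, _ in votes.most_common(top_n)]


products = {
    "p1": "wireless optical mouse",
    "p2": "mechanical gaming keyboard",
    "p3": "wireless gaming mouse",
    "p4": "organic green tea",
    "p5": "herbal chamomile tea",
}
histories = {
    "alice": ["p1", "p2"],
    "bob": ["p1", "p2", "p3"],
    "carol": ["p4"],
    "dave": ["p4", "p5"],
}
print(recommend("alice", histories, products))  # ['p3']
```

Here alice and bob share computer-accessory vocabulary and fall into one peer group, so alice is offered the item only bob has seen; the tea buyers form a separate group.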

    METHODS AND APPARATUS FOR SIMILARITY TEXT SEARCH BASED ON CONCEPTUAL INDEXING

    Publication No.: CA2329558C

    Publication date: 2006-09-19

    Application No.: CA2329558

    Filing date: 2000-12-22

    Applicant: IBM

    Abstract: In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.
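The three steps can be sketched in Python under loudly stated assumptions: word-chains are supplied up front as named sets of related words (the abstract does not describe how the patent generates them), a document is associated with every chain that shares at least one word with it, and documents are ranked by the number of word-chains they share with the query. Function names and sample data are hypothetical.

```python
from collections import defaultdict


def chains_for(words, chains):
    """Ids of the word-chains that share at least one word with `words`."""
    return {cid for cid, chain in chains.items() if chain & words}


def build_index(docs, chains):
    """Conceptual index: word-chain id -> ids of documents associated with it."""
    index = defaultdict(set)
    for did, text in docs.items():
        for cid in chains_for(set(text.lower().split()), chains):
            index[cid].add(did)
    return index


def query(text, docs, chains, index, top_n=2):
    """Return (doc id, matching chains, matching topical words) for the closest docs."""
    q_chains = chains_for(set(text.lower().split()), chains)
    scores = defaultdict(int)
    for cid in q_chains:
        for did in index.get(cid, ()):
            scores[did] += 1  # one point per shared word-chain
    ranked = sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
    results = []
    for did in ranked:
        d_words = set(docs[did].lower().split())
        matched = {cid for cid in q_chains if did in index[cid]}
        topical = set().union(*(chains[cid] & d_words for cid in matched))
        results.append((did, sorted(matched), sorted(topical)))
    return results


chains = {  # hypothetical precomputed word-chains
    "vehicles": {"car", "automobile", "truck", "engine"},
    "finance": {"bank", "loan", "interest", "credit"},
}
docs = {
    "d1": "The automobile engine overheated",
    "d2": "Bank loan interest rates",
    "d3": "Truck and car sales rise on cheap credit",
}
index = build_index(docs, chains)
print(query("car repair", docs, chains, index))
# [('d1', ['vehicles'], ['automobile', 'engine']), ('d3', ['vehicles'], ['car', 'truck'])]
```

Document `d1` is returned even though it never contains the word "car", because it shares the "vehicles" chain with the query; this is the kind of concept-level match a plain keyword index would miss.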

    Generating decision trees with discriminants and employing the same in data classification

    Publication No.: GB2369697A

    Publication date: 2002-06-05

    Application No.: GB0109736

    Filing date: 2001-04-20

    Applicant: IBM

    Abstract: At least a portion of a decision tree structure is generated from one or more multidimensional data objects by representing data associated with one or more of the data objects as a node, determining a condition for dividing the data at the node into at least two subsequent nodes based on a discriminant measure that maximises the separation between the classes associated with the data, and dividing the data according to that condition. The multidimensional objects may be data records comprising feature variables and class variables, and the method splits the decision tree recursively so that the greatest separation among the class values of the data is achieved. The discriminant measure is preferably determined using Fisher's discriminant technique, and the data is divided at a split plane perpendicular to the direction obtained by that technique, positioned where an entropy measure, determined in accordance with a gini index, is substantially optimised.
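For the two-class case, a single node split of this kind can be sketched in pure Python: compute Fisher's direction w = S_w^-1 (m1 - m0) from the class means and the pooled within-class scatter matrix S_w, project each record onto w, and pick the threshold on the projection that minimizes the weighted gini index. The split plane is then the set of points x with w·x = threshold, which is perpendicular to w. The restriction to two classes, the small ridge term added to S_w, and all names are assumptions of this sketch, not details taken from the patent.

```python
def mean(rows):
    """Per-dimension mean of a list of equal-length rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def fisher_split(X, y, ridge=1e-6):
    """Return (w, threshold, weighted gini) for a two-class node."""
    c0, c1 = sorted(set(y))  # assumes exactly two classes at this node
    g0 = [x for x, t in zip(X, y) if t == c0]
    g1 = [x for x, t in zip(X, y) if t == c1]
    m0, m1 = mean(g0), mean(g1)
    d = len(X[0])
    # Pooled within-class scatter S_w, plus a small ridge to keep it invertible
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((g0, m0), (g1, m1)):
        for r in rows:
            diff = [r[k] - mu[k] for k in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [b - a for a, b in zip(m0, m1)])  # w = S_w^-1 (m1 - m0)
    proj = sorted((sum(wi * xi for wi, xi in zip(w, x)), t) for x, t in zip(X, y))
    best = None
    for i in range(1, len(proj)):
        if proj[i - 1][0] == proj[i][0]:
            continue  # no threshold separates equal projections
        left = [t for _, t in proj[:i]]
        right = [t for _, t in proj[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(proj)
        if best is None or score < best[0]:
            best = (score, (proj[i - 1][0] + proj[i][0]) / 2)
    return w, best[1], best[0]


X = [(1, 2), (2, 3), (3, 3), (6, 5), (7, 8), (8, 7)]
y = [0, 0, 0, 1, 1, 1]
w, threshold, score = fisher_split(X, y)
print(score)  # 0.0
```

Growing a full tree would apply `fisher_split` recursively to the records on each side of the plane until a node is pure or too small; that recursion is omitted here to keep the sketch short.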

    Methods and apparatus for similarity text search based on conceptual indexing

    Publication No.: GB2365569B

    Publication date: 2004-04-07

    Application No.: GB0100851

    Filing date: 2001-01-12

    Applicant: IBM

    Abstract: In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.

    Similarity text search based on conceptual indexing

    Publication No.: GB2365569A

    Publication date: 2002-02-20

    Application No.: GB0100851

    Filing date: 2001-01-12

    Applicant: IBM

    Abstract: A method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.

    METHODS AND APPARATUS FOR SIMILARITY TEXT SEARCH BASED ON CONCEPTUAL INDEXING

    Publication No.: CA2329558A1

    Publication date: 2001-07-28

    Application No.: CA2329558

    Filing date: 2000-12-22

    Applicant: IBM

    Abstract: In one aspect of the invention, a method of performing a conceptual similarity search comprises the steps of: generating one or more conceptual word-chains from one or more documents to be used in the conceptual similarity search; building a conceptual index of documents with the one or more word-chains; and evaluating a similarity query using the conceptual index. The evaluating step preferably returns one or more of the closest documents resulting from the search; one or more matching word-chains in the one or more documents; and one or more matching topical words of the one or more documents.
