Invention Grant
US07711668B2 Online document clustering using TFIDF and predefined time windows (Expired)

Online document clustering using TFIDF and predefined time windows
Abstract:
Documents from a data stream are clustered by first generating a feature vector for each document. A set of cluster centroids (e.g., feature vectors of their corresponding clusters) is retrieved from a memory based on the feature vector of the document and a relative age of each of the cluster centroids. The centroids may be retrieved by retrieving a set of cluster identifiers from a cluster table, the cluster identifiers each indicative of a respective cluster centroid, and retrieving the cluster centroids corresponding to the retrieved cluster identifiers from a memory. A list of cluster identifiers in the cluster table may be maintained based on the relative age of cluster centroids corresponding to the cluster identifiers. Cluster identifiers that correspond to cluster centroids with a relative age exceeding a predetermined threshold are periodically removed from the list of cluster identifiers.
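The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the TF-IDF weighting, the similarity measure, the join threshold, or the layout of the cluster table, so the sketch assumes a running TF-IDF variant, cosine similarity, a fixed threshold, and a cluster table indexed by term. All names (OnlineClusterer, cluster_table, last_seen, max_age) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class OnlineClusterer:
    """Illustrative single-pass clusterer; every design choice is an assumption."""

    def __init__(self, threshold=0.3, max_age=2):
        self.threshold = threshold   # assumed minimum similarity to join a cluster
        self.max_age = max_age       # assumed number of idle time windows allowed
        self.doc_freq = Counter()    # term -> number of documents containing it
        self.num_docs = 0
        self.centroids = {}          # "memory": cluster id -> centroid vector
        self.sizes = {}              # cluster id -> number of member documents
        self.cluster_table = defaultdict(set)  # term -> cluster ids using the term
        self.last_seen = {}          # active id list: cluster id -> last window
        self.next_id = 0

    def featurize(self, text):
        """Build a TF-IDF feature vector from running document frequencies."""
        terms = text.lower().split()
        self.num_docs += 1
        self.doc_freq.update(set(terms))
        tf = Counter(terms)
        return {t: c * math.log(1 + self.num_docs / self.doc_freq[t])
                for t, c in tf.items()}

    def expire(self, window):
        """Drop ids whose centroid has been idle longer than max_age windows."""
        stale = [cid for cid, seen in self.last_seen.items()
                 if window - seen > self.max_age]
        for cid in stale:
            del self.last_seen[cid]  # only the id is removed; the centroid stays

    def add(self, text, window):
        """Assign one document to a cluster and return the cluster id."""
        vec = self.featurize(text)
        self.expire(window)
        # Look up candidate ids by the document's terms, keeping only active ids.
        candidates = set()
        for term in vec:
            candidates.update(self.cluster_table.get(term, ()))
        candidates = {cid for cid in candidates if cid in self.last_seen}
        best_id, best_sim = None, self.threshold
        for cid in sorted(candidates):
            sim = cosine(vec, self.centroids[cid])
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            # No sufficiently similar active cluster: start a new one.
            best_id = self.next_id
            self.next_id += 1
            self.centroids[best_id] = dict(vec)
            self.sizes[best_id] = 1
        else:
            # Update the centroid as the running mean of its member vectors.
            n, old = self.sizes[best_id], self.centroids[best_id]
            self.centroids[best_id] = {
                t: (old.get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
                for t in set(old) | set(vec)
            }
            self.sizes[best_id] = n + 1
        for term in vec:
            self.cluster_table[term].add(best_id)
        self.last_seen[best_id] = window
        return best_id
```

In this sketch, two documents that share most of their terms and arrive in the same time window fall into the same cluster, while a document with the same wording that arrives after the cluster's identifier has aged out of the active list starts a new cluster. Expiry removes only the identifier, so the old centroid remains in memory but is no longer retrieved for comparison.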