Invention Grant
US07711668B2 Online document clustering using TFIDF and predefined time windows
Expired
- Patent Title: Online document clustering using TFIDF and predefined time windows
- Application No.: US12072254
- Application Date: 2008-02-25
- Publication No.: US07711668B2
- Publication Date: 2010-05-04
- Inventor: Klaus Brinker, Fabian Moerchen, Bernhard Glomann, Claus Neubauer
- Applicant: Klaus Brinker, Fabian Moerchen, Bernhard Glomann, Claus Neubauer
- Applicant Address: US NJ Iselin
- Assignee: Siemens Corporation
- Current Assignee: Siemens Corporation
- Current Assignee Address: US NJ Iselin
- Main IPC: G06N5/00
- IPC: G06N5/00

Abstract:
Documents from a data stream are clustered by first generating a feature vector for each document. A set of cluster centroids (e.g., feature vectors of their corresponding clusters) is retrieved from a memory based on the feature vector of the document and a relative age of each of the cluster centroids. The centroids may be retrieved by retrieving a set of cluster identifiers from a cluster table, the cluster identifiers each indicative of a respective cluster centroid, and retrieving the cluster centroids corresponding to the retrieved cluster identifiers from a memory. A list of cluster identifiers in the cluster table may be maintained based on the relative age of cluster centroids corresponding to the cluster identifiers. Cluster identifiers that correspond to cluster centroids with a relative age exceeding a predetermined threshold are periodically removed from the list of cluster identifiers.
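The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `OnlineClusterer`, the incremental TF-IDF weighting, the cosine-similarity threshold, the use of the last update time as the "relative age", and the choice to expire clusters on every insertion are all assumptions made for illustration. The sketch also assumes timestamps arrive in non-decreasing order.

```python
import math
from collections import Counter, OrderedDict


class OnlineClusterer:
    """Illustrative sketch only; names, thresholds and the age rule are assumptions."""

    def __init__(self, sim_threshold=0.3, max_age=3600.0):
        self.sim_threshold = sim_threshold  # assumed minimum cosine similarity to join a cluster
        self.max_age = max_age              # assumed time window (same unit as timestamps)
        self.doc_freq = Counter()           # term -> number of documents containing it so far
        self.n_docs = 0
        self.cluster_table = OrderedDict()  # cluster id -> last update time, oldest first
        self.centroids = {}                 # cluster id -> sum of member feature vectors
        self.next_id = 0

    def _tfidf(self, tokens):
        # Update document frequencies incrementally, then weight and L2-normalise.
        self.n_docs += 1
        tf = Counter(tokens)
        self.doc_freq.update(tf.keys())
        vec = {t: c * math.log(1 + self.n_docs / self.doc_freq[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def _expire(self, now):
        # Drop identifiers (and their centroids) whose age exceeds the window.
        while self.cluster_table:
            cid, last = next(iter(self.cluster_table.items()))
            if now - last <= self.max_age:
                break
            del self.cluster_table[cid]
            del self.centroids[cid]

    def add(self, tokens, now):
        """Assign one tokenised document to a cluster and return the cluster id."""
        self._expire(now)
        vec = self._tfidf(tokens)
        best_id, best_sim = None, self.sim_threshold
        # Only centroids still listed in the cluster table are candidates.
        for cid in self.cluster_table:
            sim = self._cosine(vec, self.centroids[cid])
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            # No sufficiently similar recent cluster: start a new one.
            best_id = self.next_id
            self.next_id += 1
            self.centroids[best_id] = dict(vec)
        else:
            # Fold the document into the centroid (cosine ignores the scale of the sum).
            centroid = self.centroids[best_id]
            for t, w in vec.items():
                centroid[t] = centroid.get(t, 0.0) + w
        self.cluster_table[best_id] = now
        self.cluster_table.move_to_end(best_id)  # keep the table ordered by recency
        return best_id
```

In this sketch, calling `add(tokens, now)` for each incoming document returns a cluster identifier. A document that arrives after every existing cluster has aged out of the window starts a new cluster even if its text matches an older one, which is the intended effect of the time window.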
Public/Granted literature
- US20080205775A1 Online document clustering (Public/Granted day: 2008-08-28)
Information query
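IPC classification:
G | Physics |
G06 | Computing; calculating or counting |
G06N | Computer systems based on specific computational models |
G06N5/00 | Computer systems using knowledge-based models |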