Invention Grant
- Patent Title: Classification of clustered documents based on similarity scores
- Application No.: US13479188
- Application Date: 2012-05-23
- Publication No.: US08543576B1
- Publication Date: 2013-09-24
- Inventor: Kirill Buryak, Jun Peng, Glenn M. Lewis, Nadav Benbarak, Aner Ben-Artzi
- Applicant: Kirill Buryak, Jun Peng, Glenn M. Lewis, Nadav Benbarak, Aner Ben-Artzi
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Remarck Law Group PLC
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Among other disclosed subject matter, a computer-implemented method includes receiving a set of clusters of documents and calculating a similarity score for each cluster, wherein the similarity score is based at least in part on features included in the documents in the cluster and indicates a measure of similarity of the documents in the cluster. Each cluster whose similarity score is greater than a first threshold is identified as satisfying a quality assurance requirement. Each cluster whose similarity score is less than a second threshold is identified as failing the quality assurance requirement. For each cluster whose similarity score is less than or equal to the first threshold and greater than or equal to the second threshold, at least a subset of the documents in the cluster is reviewed to determine whether the cluster satisfies the quality assurance requirement.
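
The three-way decision described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the similarity score is computed, so the mean pairwise Jaccard overlap of per-document feature sets used here is purely an assumption, as are the function names (`jaccard`, `cluster_similarity`, `triage`), the labels `"pass"`, `"fail"`, and `"review"`, and the example threshold values.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_similarity(docs: List[Set[str]]) -> float:
    """Mean pairwise Jaccard overlap of the documents in one cluster.

    Assumption: each document is represented as a set of features.
    A cluster with fewer than two documents is treated as fully similar.
    """
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def triage(score: float, first_threshold: float, second_threshold: float) -> str:
    """Apply the three-way rule from the abstract to one cluster score."""
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if score > first_threshold:
        return "pass"    # satisfies the quality assurance requirement
    if score < second_threshold:
        return "fail"    # fails the quality assurance requirement
    return "review"      # between the thresholds, inclusive: review a subset


if __name__ == "__main__":
    # Hypothetical clusters and thresholds, for illustration only.
    clusters = {
        "tight": [{"refund", "order"}, {"refund", "order"}],
        "loose": [{"refund"}, {"password"}],
    }
    for name, docs in clusters.items():
        score = cluster_similarity(docs)
        print(name, round(score, 2), triage(score, 0.8, 0.3))
```

Both boundaries fall into the review band because the abstract uses strict inequalities for the pass and fail cases, so a score exactly equal to either threshold is sent to review.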