Invention Grant
- Patent Title: Efficient indexing of documents with similar content
- Application No.: US13571316
- Application Date: 2012-08-09
- Publication No.: US08554561B2
- Publication Date: 2013-10-08
- Inventor: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
- Applicant: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Morgan, Lewis & Bockius LLP
- Main IPC: G10L15/06
- IPC: G10L15/06

Abstract:
A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
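The abstract does not say how documents are clustered, at what granularity duplicate data is detected, or what the index looks like. The following is only a minimal sketch under loudly stated assumptions: clustering uses a caller-supplied key function (a hypothetical stand-in for a real similarity measure), duplicate data is detected at whole-line granularity within a cluster, and the index is a plain term-to-document inverted index. The names `cluster_documents` and `index_clusters` are illustrative and do not come from the patent. Each distinct line in a cluster is tokenized once; when a later document repeats a line, that line is not re-indexed and the document is simply attached to the existing chunk.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def cluster_documents(docs: Dict[str, str],
                      key: Callable[[str], str]) -> List[List[str]]:
    """Group document ids into clusters.

    `key` is a hypothetical stand-in for whatever similarity-based
    clustering a real system would use; documents with equal keys
    land in the same cluster.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id in sorted(docs):  # deterministic cluster order
        groups[key(docs[doc_id])].append(doc_id)
    return list(groups.values())


def index_clusters(docs: Dict[str, str],
                   clusters: List[List[str]]
                   ) -> Tuple[Dict[str, Set[str]], int]:
    """Build term -> doc-id postings, skipping duplicate data.

    Assumption: "duplicate data" means a line that already appeared in an
    earlier document of the same cluster. Such a line is not tokenized
    again. Returns the index and the number of chunks actually tokenized.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    tokenized = 0
    for cluster in clusters:
        chunk_ids: Dict[str, int] = {}    # line text -> chunk id in this cluster
        chunk_terms: List[Set[str]] = []  # terms of each unique chunk
        chunk_docs: List[Set[str]] = []   # documents containing each chunk
        for doc_id in cluster:
            for line in docs[doc_id].splitlines():
                cid = chunk_ids.get(line)
                if cid is None:
                    # First occurrence in this cluster: tokenize and keep it.
                    cid = len(chunk_terms)
                    chunk_ids[line] = cid
                    chunk_terms.append(set(line.lower().split()))
                    chunk_docs.append(set())
                    tokenized += 1
                # In both cases record that this document contains the chunk;
                # for a duplicate line this set insertion is the only work done.
                chunk_docs[cid].add(doc_id)
        # Emit postings once per unique chunk, attributed to every owner.
        for terms, owners in zip(chunk_terms, chunk_docs):
            for term in terms:
                index[term] |= owners
    return dict(index), tokenized


if __name__ == "__main__":
    docs = {
        "a": "efficient indexing\nshared footer text",
        "b": "similar content\nshared footer text",
        "c": "unrelated page",
    }
    # Hypothetical clustering rule: cluster by last line.
    clusters = cluster_documents(docs, lambda t: t.splitlines()[-1])
    index, tokenized = index_clusters(docs, clusters)
    print(clusters)            # [['a', 'b'], ['c']]
    print(tokenized)           # 4 (five lines in total, one duplicate)
    print(sorted(index["footer"]))  # ['a', 'b']
```

In this toy run, documents "a" and "b" share the line "shared footer text", so it is tokenized once but still answers queries for both documents. A production system would more likely detect duplicates at a finer or content-defined granularity and store postings per cluster rather than per document; those choices are outside what the abstract states.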
Published/Granted Literature:
- US20120303622A1, Efficient Indexing of Documents with Similar Content, publication date: 2012-11-29