Invention Grant
- Patent Title: Representative document selection for a set of duplicate documents
- Patent Title (Chinese, translated): Representative document selection for a set of duplicate documents
- Application No.: US13599707
- Application Date: 2012-08-30
- Publication No.: US08868559B2
- Publication Date: 2014-10-21
- Inventor: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean
- Applicant: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat, Jeffrey A. Dean
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Morgan, Lewis & Bockius LLP
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index.
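The selection described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Document` class, the `fingerprint` function (a hash of normalized text), the rule of picking the highest-scoring duplicate, and the ordering of the index by score are all assumptions made here for illustration. The abstract states only that a document is selected on the basis of a query-independent score and that it alone is indexed for its duplicate set.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    content: str
    score: float  # query-independent score; the values used below are made up


def fingerprint(content: str) -> str:
    # Hypothetical fingerprint: SHA-256 of lowercased, whitespace-normalized text.
    # Documents with equal fingerprints are treated as duplicates.
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def select_representatives(docs):
    # Group documents by fingerprint, then keep one document per group.
    # Assumption: the representative is the duplicate with the highest score.
    groups = defaultdict(list)
    for doc in docs:
        groups[fingerprint(doc.content)].append(doc)
    return {fp: max(group, key=lambda d: d.score) for fp, group in groups.items()}


def build_index(docs):
    # Only representatives enter the index. Ordering by descending score is
    # one possible reading of "indexing in accordance with" the score.
    representatives = select_representatives(docs).values()
    return sorted(representatives, key=lambda d: d.score, reverse=True)


docs = [
    Document("http://a.example/x", "Hello  World", 0.2),
    Document("http://b.example/x", "hello world", 0.9),
    Document("http://c.example/y", "Other text", 0.5),
]
index = build_index(docs)
```

In this example the first two documents share a fingerprint, so only the higher-scoring one (`http://b.example/x`) is indexed alongside the unique third document.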
Public/Granted literature
- US20120323896A1 REPRESENTATIVE DOCUMENT SELECTION FOR A SET OF DUPLICATE DOCUMENTS Public/Granted day: 2012-12-20