Invention Grant
- Patent Title: Detecting query-specific duplicate documents
- Patent Title (中): 检测特定于查询的重复文档
- Application No.: US12839164
- Application Date: 2010-07-19
- Publication No.: US08214359B1
- Publication Date: 2012-07-03
- Inventor: Benedict A. Gomes, Benjamin T. Smith
- Applicant: Benedict A. Gomes, Benjamin T. Smith
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Fish & Richardson P.C.
- Main IPC: G06F7/00
- IPC: G06F7/00

Abstract:
An improved duplicate detection technique that uses query-relevant information to limit the portion(s) of documents to be compared for similarity is described. Before comparing two documents for similarity, the content of these documents may be condensed based on the query. In one embodiment, query-relevant information or text (also referred to as “snippets”) is extracted from the documents and only the extracted snippets, rather than the entire documents, are compared for purposes of determining similarity.
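The idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the fixed word window around each query term, the Jaccard measure, and the 0.8 threshold are all assumptions chosen for illustration. The sketch extracts a query-relevant snippet from each document and compares only those snippets.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_snippet(document, query, window=3):
    """Return the tokens within `window` words of any query term.

    Hypothetical snippet extractor: the window-based rule is an assumption,
    not something specified in the abstract.
    """
    tokens = tokenize(document)
    query_terms = set(tokenize(query))
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in query_terms:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            keep.update(range(start, end))
    return [tokens[i] for i in sorted(keep)]


def jaccard(a, b):
    """Jaccard similarity of two token lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # assumption: two empty snippets are not called duplicates
    return len(set_a & set_b) / len(set_a | set_b)


def are_query_duplicates(doc_a, doc_b, query, threshold=0.8, window=3):
    """Compare only the query-relevant snippets, not the full documents."""
    snippet_a = extract_snippet(doc_a, query, window)
    snippet_b = extract_snippet(doc_b, query, window)
    return jaccard(snippet_a, snippet_b) >= threshold


if __name__ == "__main__":
    doc_a = ("Site A header news weather sports. The quick brown fox jumps "
             "over the lazy dog today. Footer A contact us.")
    doc_b = ("Totally different navigation menu links. The quick brown fox "
             "jumps over the lazy dog today. Copyright B all rights reserved.")
    print(are_query_duplicates(doc_a, doc_b, "fox"))
    print(jaccard(tokenize(doc_a), tokenize(doc_b)))
```

In this example the two pages share a passage but differ in their surrounding boilerplate, so a whole-document comparison scores them as dissimilar while the snippet comparison for the query "fox" treats them as duplicates. For a query that matches only one of the pages, the snippets differ and the pair is not flagged.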