Invention Grant
- Patent Title: Duplicate entry detection system and method
- Patent Title (中): 重复条目检测系统和方法
- Application No.: US11754237
- Application Date: 2007-05-25
- Publication No.: US08046372B1
- Publication Date: 2011-10-25
- Inventor: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
- Applicant: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
- Applicant Address: US NV Reno
- Assignee: Amazon Technologies, Inc.
- Current Assignee: Amazon Technologies, Inc.
- Current Assignee Address: US NV Reno
- Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
- Agent: Robert C. Kowert
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After receiving a first document, a set of tokens for the first document is generated. A non-fielded relevance search on a token index is executed. The relevance search returns a set of candidate duplicate documents with scores corresponding to each candidate document. For each candidate document with a score above a threshold, filtering is performed on each candidate document to determine whether each candidate document is a true duplicate of the first document. A set of candidate documents with a score above the threshold that were not disqualified as candidate documents is then provided.
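
The flow described in the abstract (tokenize, search a token index, apply a score threshold, then filter the surviving candidates) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `TokenIndex` class, the overlap-based score (fraction of distinct query tokens found in a document), the default threshold of 0.6, and the `same_numeric_tokens` filter, which disqualifies candidates whose digit-bearing tokens differ.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenIndex:
    """Hypothetical inverted index: token -> ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, tokens):
        """Non-fielded search: a token may match anywhere in a document.

        Assumed score: fraction of distinct query tokens present in the document.
        """
        query = set(tokens)
        hits = defaultdict(int)
        for token in query:
            for doc_id in self.postings.get(token, ()):
                hits[doc_id] += 1
        return {doc_id: n / len(query) for doc_id, n in hits.items()}


def same_numeric_tokens(text_a, text_b):
    """Hypothetical filter: tokens containing digits must agree exactly."""
    def numeric(text):
        return {t for t in tokenize(text) if any(c.isdigit() for c in t)}
    return numeric(text_a) == numeric(text_b)


def find_duplicates(index, text, threshold=0.6, filters=(same_numeric_tokens,)):
    """Return (doc_id, score) pairs scoring above the threshold and passing every filter."""
    scores = index.search(tokenize(text))
    kept = []
    for doc_id, score in scores.items():
        if score <= threshold:
            continue  # not a candidate: score is not above the threshold
        if all(f(text, index.docs[doc_id]) for f in filters):
            kept.append((doc_id, score))
    return sorted(kept, key=lambda pair: -pair[1])


index = TokenIndex()
index.add("a", "Acme USB flash drive 16GB black")
index.add("b", "Acme USB flash drive 32GB black")
index.add("c", "Garden hose 50 ft green")

print(find_duplicates(index, "Acme flash drive USB 16GB (black)"))
# [('a', 1.0)]
```

In this example document "b" scores 5/6 and clears the threshold, but the filter disqualifies it because its capacity token (`32gb`) differs from the query's (`16gb`). This shows why a score threshold alone may not be enough to separate true duplicates from near matches.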