Invention Grant
- Patent Title: Document similarity detection
- Application No.: US10462690
- Application Date: 2003-06-17
- Publication No.: US07734627B1
- Publication Date: 2010-06-08
- Inventor: Simon Tong
- Applicant: Simon Tong
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Harrity & Harrity, LLP
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A similarity detector detects similar or near-duplicate occurrences of a document. The similarity detector determines similarity of documents by characterizing the documents as clusters, each made up of a set of term entries, such as pairs of terms. A pair of terms, for example, indicates that the first term of the pair occurs before the second term of the pair in the underlying document. Another document that has a threshold level of term entries in common with a cluster is considered similar to the document characterized by the cluster.
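
The pairing idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tokenizer, the choice to keep every ordered pair of distinct terms, the function names `term_pairs` and `is_similar`, and the 0.7 threshold are all assumptions made for illustration.

```python
import re
from itertools import combinations


def term_pairs(text):
    """Return the set of ordered pairs (a, b) where term a occurs before term b.

    Assumption: terms are lowercase alphanumeric runs, and every ordered
    pair of distinct terms is kept. The abstract does not fix either choice.
    """
    terms = re.findall(r"[a-z0-9]+", text.lower())
    # combinations() preserves positional order, so a always precedes b.
    return {(a, b) for a, b in combinations(terms, 2) if a != b}


def is_similar(cluster, text, threshold=0.7):
    """Report whether `text` shares at least `threshold` of the cluster's pairs.

    Assumption: the "threshold level" is a fraction of the cluster's entries;
    0.7 is an arbitrary illustrative value.
    """
    if not cluster:
        return False
    shared = len(cluster & term_pairs(text))
    return shared / len(cluster) >= threshold


# The cluster characterizes one document; other documents are tested against it.
cluster = term_pairs("the quick brown fox jumps over the lazy dog")
print(is_similar(cluster, "the quick brown fox jumps over the lazy cat"))
print(is_similar(cluster, "stock markets fell sharply on monday"))
```

Because the pairs are ordered, a document containing the same words in a different sequence shares fewer entries with the cluster than a plain bag-of-words comparison would suggest.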