Invention Grant
- Patent Title: Document similarity detection
- Patent Title (Chinese): 文件相似检测 (Document similarity detection)
- Application No.: US12764293
- Application Date: 2010-04-21
- Publication No.: US08209339B1
- Publication Date: 2012-06-26
- Inventor: Simon Tong
- Applicant: Simon Tong
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Harrity & Harrity, LLP
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A similarity detector detects similar or near duplicate occurrences of a document. The similarity detector determines similarity of documents by characterizing the documents as clusters each made up of a set of term entries, such as pairs of terms. A pair of terms, for example, indicates that the first term of the pair occurs before the second term of the pair in the underlying document. Another document that has a threshold level of term entries in common with a cluster is considered similar to the document characterized by the cluster.
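
The idea in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the tokenizer, the function names `term_pairs` and `is_similar`, and the 0.5 threshold are all assumptions chosen for illustration. The sketch characterizes a document as a set of ordered term pairs, where the first term occurs before the second, and treats another document as similar when it shares at least a threshold fraction of those pairs.

```python
import re


def term_pairs(text):
    """Return the set of ordered pairs (a, b) where term a occurs
    somewhere before term b in the text. Tokenization on lowercase
    alphanumeric runs is an assumption, not taken from the patent."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    pairs = set()
    for i, first in enumerate(terms):
        for second in terms[i + 1:]:
            if first != second:
                pairs.add((first, second))
    return pairs


def is_similar(cluster, candidate_text, threshold=0.5):
    """Report whether the candidate shares at least `threshold` of the
    cluster's term pairs. The fractional threshold and its default
    value are hypothetical."""
    if not cluster:
        return False
    shared = cluster & term_pairs(candidate_text)
    return len(shared) / len(cluster) >= threshold


# The cluster characterizes one reference document.
cluster = term_pairs("the quick brown fox jumps over the lazy dog")

near_duplicate = "the quick brown fox leaps over the lazy dog"
unrelated = "stock markets fell sharply on monday"

print(is_similar(cluster, near_duplicate))  # True
print(is_similar(cluster, unrelated))       # False
```

Enumerating every ordered pair is quadratic in document length, so a practical system would presumably restrict or sample the pairs; the abstract does not say how, and the sketch keeps all of them for clarity.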