Invention Grant
US08594239B2 Estimating document similarity using bit-strings (status: in force)

Estimating document similarity using bit-strings
Abstract:
Each of a plurality of documents is divided into samples. Small bit-strings are generated for selected samples from each of the documents and used to create a sketch for each document. Because the bit-strings are small (e.g., only one, two, or three bits in length), the generated sketches are smaller than those produced by previous sketching methods, and therefore use less storage space. The generated sketches are compared to determine which documents are near-duplicates of one another.
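
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how samples are formed or selected, so this example assumes word shingles as samples, min-hash selection under seeded hash functions, and truncation of each selected hash to its lowest `bits` bits. The names `shingles`, `sketch`, and `estimate_similarity`, and all default parameters, are hypothetical. The correction applied in `estimate_similarity` assumes that unrelated samples agree on a `bits`-wide string with probability of roughly 1 / 2^bits.

```python
import hashlib


def shingles(text, k=4):
    """Split a document into overlapping k-word samples (assumed sampling scheme)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(sample, seed):
    """Seeded 64-bit hash of one sample, built on BLAKE2b from the standard library."""
    digest = hashlib.blake2b(
        sample.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "big")


def sketch(text, num_hashes=256, bits=1, k=4):
    """Build a sketch: for each seed, keep only the low `bits` bits of the minimum hash."""
    samples = shingles(text, k)
    mask = (1 << bits) - 1
    return [min(_hash(s, seed) for s in samples) & mask for seed in range(num_hashes)]


def estimate_similarity(sketch_a, sketch_b, bits=1):
    """Estimate resemblance from the fraction of matching bit-strings.

    Short bit-strings match by chance about 1 / 2**bits of the time (an
    assumption of this sketch), so that rate is subtracted and the result rescaled.
    """
    match = sum(x == y for x, y in zip(sketch_a, sketch_b)) / len(sketch_a)
    chance = 1 / (1 << bits)
    return max(0.0, (match - chance) / (1 - chance))
```

With `bits=1` and `num_hashes=256`, each sketch occupies 256 bits, whereas storing full 64-bit minimum hashes would take 16,384 bits. The trade-off is a noisier estimate per position, which is usually offset by using more hash functions. Two sketches whose estimate exceeds a chosen threshold would be treated as near-duplicates.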