Invention Grant
US08554561B2 Efficient indexing of documents with similar content (Active)

Abstract:
A computer system, comprising one or more processors and memory, groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents, and a respective cluster of the plurality of clusters includes respective cluster data corresponding to a plurality of documents, including a first document and a second document. The computer system determines that the second document includes duplicate data that duplicates corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
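The flow described in the abstract (cluster, detect duplicate data, index only the reduced subset) can be illustrated with a minimal sketch. The abstract does not specify the clustering criterion, the duplicate-detection method, or the index structure, so everything below is an assumption made for illustration: documents are grouped by a hypothetical `thread` field, duplicates are detected by exact match on whitespace-normalized, lowercased lines, and the index is a plain in-memory inverted index keyed by term.

```python
from collections import defaultdict


def group_into_clusters(documents, key):
    """Group documents into clusters using a caller-supplied key (assumed criterion)."""
    clusters = defaultdict(list)
    for doc in documents:
        clusters[key(doc)].append(doc)
    return dict(clusters)


def deduplicated_cluster_data(cluster_docs):
    """Return the cluster's lines, skipping any line already seen in an earlier document."""
    seen = set()
    subset = []
    for doc in cluster_docs:
        for line in doc["text"].split("\n"):
            # Assumed duplicate test: exact match after lowercasing and collapsing whitespace.
            normalized = " ".join(line.lower().split())
            if normalized and normalized not in seen:
                seen.add(normalized)
                subset.append(normalized)
    return subset


def build_index(clusters):
    """Map each term to the ids of clusters whose deduplicated data contains it."""
    index = defaultdict(set)
    for cluster_id, docs in clusters.items():
        for line in deduplicated_cluster_data(docs):
            for term in line.split():
                index[term].add(cluster_id)
    return dict(index)


# Hypothetical sample data: d2 quotes the full text of d1.
documents = [
    {"id": "d1", "thread": "t1", "text": "Lunch on Friday?"},
    {"id": "d2", "thread": "t1", "text": "Sounds good.\nLunch on Friday?"},
    {"id": "d3", "thread": "t2", "text": "Quarterly report attached."},
]
clusters = group_into_clusters(documents, key=lambda d: d["thread"])
index = build_index(clusters)
```

In this sketch the quoted line in `d2` is processed only once for cluster `t1`, so the indexed data for the cluster is smaller than the concatenation of its documents while every distinct term remains searchable.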