Invention Grant
US07836054B2 System and method for processing a message store for near duplicate messages 有权
用于处理近乎重复的消息的消息存储的系统和方法

System and method for processing a message store for near duplicate messages
Abstract:
A system and method for processing a message store for near duplicate messages is provided. Metadata, content, and each attachment associated with messages are extracted. Near duplicate messages in the message store are identified. Compound digests taken of the metadata for, of the content contained in, and of the each attachment associated with each of the messages in the message store are compared. Each message having a compound digest not matching the compound digest of any other message is marked as unique and each message having a compound digest matching the compound digest of at least one other message is marked as an exact duplicate. Messages remaining unmarked and having similar content are grouped into sets that each includes one or more near duplicate messages. One of the near duplicate messages is designated as unique and each remaining near duplicate message in the set is designated as a near duplicate.
Information query
Patent Agency Ranking
0/0