Invention Grant
US08626767B2 Computer-implemented system and method for identifying near duplicate messages (in force)

Computer-implemented system and method for identifying near duplicate messages
Abstract:
A computer-implemented system and method for identifying near duplicate messages is provided. Messages each including a content body are grouped by conversation thread. One or more of the messages also includes an attachment. The messages for each conversation thread are sorted in order of message length. At least one of the messages is selected from one of the threads and the body of the selected message is compared with the body of a shorter message in that thread. A determination is made that the body of the shorter message is included in the body of the selected message. Hash codes of the attachments for the selected message and the shorter message are compared. The shorter message is marked as a near duplicate message of the selected message when the hash codes of the attachments match.
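
The steps named in the abstract (group by thread, sort by length, test body containment, compare attachment hashes) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Message` class, the `find_near_duplicates` function, the use of SHA-256 as the hash code, plain substring matching for "included in", and treating two messages without attachments as matching are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical record type; field names are not taken from the patent.
    msg_id: str
    thread_id: str
    body: str
    attachments: list = field(default_factory=list)  # raw bytes per attachment


def attachment_hashes(msg):
    # Assumption: SHA-256 stands in for the unspecified "hash codes".
    return frozenset(hashlib.sha256(a).hexdigest() for a in msg.attachments)


def find_near_duplicates(messages):
    """Return a dict mapping each near-duplicate msg_id to the msg_id that contains it."""
    # Group messages by conversation thread.
    threads = defaultdict(list)
    for m in messages:
        threads[m.thread_id].append(m)

    near_dupes = {}
    for thread in threads.values():
        # Sort each thread by body length, longest first.
        thread.sort(key=lambda m: len(m.body), reverse=True)
        for i, selected in enumerate(thread):
            selected_hashes = attachment_hashes(selected)
            for shorter in thread[i + 1:]:
                if shorter.msg_id in near_dupes:
                    continue  # already marked against a longer message
                if len(shorter.body) >= len(selected.body):
                    continue  # only strictly shorter bodies are considered
                # Assumption: "included in" is modeled as substring containment.
                if shorter.body not in selected.body:
                    continue
                # Mark only when the attachment hash sets match
                # (two empty sets also match, which is an assumption).
                if attachment_hashes(shorter) == selected_hashes:
                    near_dupes[shorter.msg_id] = selected.msg_id
    return near_dupes
```

Calling `find_near_duplicates` on a list of `Message` objects returns the shorter messages paired with the longer message that subsumes them; messages in different threads are never compared with each other.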