Invention Grant
US08577155B2 System and method for duplicate text recognition (Active)

System and method for duplicate text recognition
Abstract:
A system for duplicate text recognition includes a first means for dividing an electronic text into a plurality of phrase segments; a second means for converting each of the phrase segments into a unique and fixed-length bit string; a third means for storing a plurality of groups of the bit strings, each group of bit strings (string group) including a plurality of bit strings respectively corresponding to the phrase segments in a particular electronic text; and a fourth means for determining whether a predefined similarity between any two string groups in the third means reaches a first threshold, and for determining the two electronic texts corresponding to the two string groups are duplicate texts if the predefined similarity between the two string groups reaches the first threshold.
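The four means described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not say how phrases are delimited, which conversion produces the bit strings, how similarity is defined, or what the first threshold is. Here phrases are split at punctuation, SHA-256 stands in for the conversion (fixed-length, and unique only in the sense that collisions are extremely unlikely), similarity is the Jaccard index, and the threshold of 0.8 is arbitrary. All function names are hypothetical.

```python
import hashlib
import re
from itertools import combinations


def split_into_phrases(text):
    """First means: divide a text into phrase segments.
    Assumption: phrases end at punctuation marks or line breaks."""
    parts = re.split(r"[,.;:!?\n]+", text)
    # Normalize case and whitespace so trivial differences do not matter.
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def phrase_to_bits(phrase):
    """Second means: convert a phrase into a fixed-length bit string.
    Assumption: SHA-256 is used, giving 256 bits per phrase."""
    digest = hashlib.sha256(phrase.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def string_group(text):
    """Build the group of bit strings (string group) for one text."""
    return frozenset(phrase_to_bits(p) for p in split_into_phrases(text))


def similarity(group_a, group_b):
    """Assumed similarity measure: Jaccard index of two string groups."""
    if not group_a and not group_b:
        return 0.0
    return len(group_a & group_b) / len(group_a | group_b)


def find_duplicates(texts, threshold=0.8):
    """Third and fourth means: store one string group per text, then
    report every pair whose similarity reaches the threshold."""
    groups = {name: string_group(body) for name, body in texts.items()}
    return [
        (a, b)
        for a, b in combinations(groups, 2)
        if similarity(groups[a], groups[b]) >= threshold
    ]


texts = {
    "a": "Hello world, this is a test. Goodbye now.",
    "b": "hello world, this is a test; goodbye now!",
    "c": "Something else entirely.",
}
duplicates = find_duplicates(texts)
```

In this sketch, texts "a" and "b" differ only in case and punctuation, so they yield the same string group and are reported as duplicates, while "c" shares no phrase with either. Comparing every pair is quadratic in the number of stored texts; a real system would likely need an index over the bit strings, which the abstract does not describe.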