Invention Grant
- Patent Title: Phrase matching for document classification
- Patent Title (Chinese): 短语匹配文件分类 (Phrase matching for document classification)
- Application No.: US12075339
- Application Date: 2008-03-11
- Publication No.: US08401842B1
- Publication Date: 2013-03-19
- Inventor: Ilan Ginzburg, Bruno Roustant
- Applicant: Ilan Ginzburg, Bruno Roustant
- Applicant Address: US MA Hopkinton
- Assignee: EMC Corporation
- Current Assignee: EMC Corporation
- Current Assignee Address: US MA Hopkinton
- Agent: Barry N. Young
- Main IPC: G06F17/21
- IPC: G06F17/21

Abstract:
Phrase matching processes for matching phrases comprising a plurality of keywords in document text construct hit lists of the keywords in a document text, and operate on the keywords in either phrase order or without regard to the order of occurrence of the keywords in the phrase. The processes form sorted sets of all keywords, and compare occurrences of the keywords in the sorted sets to a predefined proximity constraint. For unordered phrases, the proximity constraint defines a maximum span between keywords in the highest and lowest positions in the sorted set as MaxSpan=p(k−1), where p is a proximity and k is the number of keywords in the phrase. For ordered phrases, the distances between successive phrase keywords in phrase order must be less than or equal to the proximity p.
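
The two proximity tests described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names (`build_hit_lists`, `match_unordered`, `match_ordered`), the whitespace tokenization, the use of token indices as positions, and the assumption that the phrase keywords are distinct are all hypothetical choices made for illustration. The ordered test additionally assumes that each keyword must occur strictly after the previous one.

```python
def build_hit_lists(tokens, keywords):
    """Map each phrase keyword to the sorted list of token positions where it occurs."""
    hits = {kw: [] for kw in keywords}
    for pos, tok in enumerate(tokens):
        if tok in hits:
            hits[tok].append(pos)
    return hits


def match_unordered(hits, keywords, p):
    """True if all keywords occur within MaxSpan = p * (k - 1), in any order.

    Assumes the keywords are distinct (an assumption of this sketch).
    """
    k = len(keywords)
    max_span = p * (k - 1)
    # Sorted set of all keyword occurrences, ordered by position.
    merged = sorted((pos, kw) for kw in keywords for pos in hits.get(kw, []))
    counts = {}
    left = 0
    for right_pos, right_kw in merged:
        counts[right_kw] = counts.get(right_kw, 0) + 1
        # Drop occurrences that lie more than max_span before the current one.
        while right_pos - merged[left][0] > max_span:
            left_kw = merged[left][1]
            counts[left_kw] -= 1
            if counts[left_kw] == 0:
                del counts[left_kw]
            left += 1
        if len(counts) == k:
            return True
    return False


def match_ordered(hits, keywords, p):
    """True if the keywords occur in phrase order with each successive gap <= p."""
    # Positions of the current keyword that can end a valid partial match.
    reachable = list(hits.get(keywords[0], []))
    for kw in keywords[1:]:
        reachable = [
            pos
            for pos in hits.get(kw, [])
            if any(0 < pos - prev <= p for prev in reachable)
        ]
        if not reachable:
            return False
    return bool(reachable)


tokens = "the quick brown fox jumps over the lazy dog".split()
phrase = ["quick", "fox", "dog"]
hits = build_hit_lists(tokens, phrase)
print(match_unordered(hits, phrase, 4), match_ordered(hits, phrase, 4))
```

In this example the keywords sit at positions 1, 3, and 8. With p = 4 the unordered test allows a span of up to 4 × (3 − 1) = 8, so the observed span of 7 satisfies it, while the ordered test fails because the gap between "fox" and "dog" is 5, which exceeds p.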