Invention Grant
US08606795B2 Frequency based keyword extraction method and system using a statistical measure (Lapsed)

Abstract:
A frequency-based keyword extraction method and system using a statistical measure is disclosed. It generates keywords within a page and/or document that distinguish the document from an average document. A simple frequency threshold parameter can be used to identify common stop words: a word in the document is treated as a stop word if its frequency in a corpus exceeds the threshold parameter. A statistical confidence interval of the word's frequency in the document can be compared against the confidence interval of the word's frequency in the corpus. An extracted keyword has a greater intra-document frequency confidence interval than the frequency confidence interval of the word within the corpus. A statistical hypothesis test can also be used to determine the keyword, by calculating a test statistic and testing whether the test statistic is greater than some threshold.
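
The three steps named in the abstract (stop-word threshold, confidence-interval comparison, hypothesis test) can be illustrated with a minimal Python sketch. The abstract does not specify the interval type, the test statistic, or any parameter values, so everything concrete here is an assumption: the Wald (normal-approximation) interval, the reading of "greater interval" as the document interval lying entirely above the corpus interval, the one-sample z statistic, and the hypothetical names `proportion_interval`, `extract_keywords`, `z_statistic`, and `stop_threshold`.

```python
import math
from collections import Counter


def proportion_interval(count, total, z=1.96):
    """Wald interval for a relative frequency (assumed interval type)."""
    p = count / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def extract_keywords(doc_tokens, corpus_counts, corpus_total,
                     stop_threshold=0.01, z=1.96):
    """Return words whose document interval lies above their corpus interval.

    stop_threshold is a hypothetical value; the abstract only says that
    a threshold parameter exists.
    """
    doc_counts = Counter(doc_tokens)
    doc_total = len(doc_tokens)
    keywords = []
    for word, count in doc_counts.items():
        corpus_count = corpus_counts.get(word, 0)
        # Step 1: a word that is too frequent in the corpus is a stop word.
        if corpus_count / corpus_total > stop_threshold:
            continue
        # Step 2: compare the document interval with the corpus interval.
        doc_low, _ = proportion_interval(count, doc_total, z)
        _, corpus_high = proportion_interval(corpus_count, corpus_total, z)
        if doc_low > corpus_high:
            keywords.append(word)
    return sorted(keywords)


def z_statistic(count, doc_total, corpus_count, corpus_total):
    """One-sample z statistic against the corpus frequency (assumed test)."""
    p0 = corpus_count / corpus_total
    if p0 <= 0.0 or p0 >= 1.0:
        return math.inf  # degenerate corpus frequency, no finite statistic
    p = count / doc_total
    return (p - p0) / math.sqrt(p0 * (1.0 - p0) / doc_total)


# Toy data, invented purely for illustration.
corpus_counts = {"the": 50000, "of": 30000, "is": 20000,
                 "method": 500, "frequency": 50, "keyword": 20}
corpus_total = 1_000_000
doc_tokens = (["the"] * 40 + ["of"] * 20 + ["is"] * 14 +
              ["keyword"] * 15 + ["frequency"] * 10 + ["method"])

keywords = extract_keywords(doc_tokens, corpus_counts, corpus_total)
```

In this toy data, "the", "of", and "is" are dropped by the stop-word threshold, and "method" occurs too rarely in the document for its interval to clear the corpus interval. A hypothesis-test variant would instead keep a word when `z_statistic(...)` exceeds a chosen critical value, such as 1.645 for a one-sided test at the 5% level.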