Invention Grant
US08606795B2 Frequency based keyword extraction method and system using a statistical measure
Expired
- Patent Title: Frequency based keyword extraction method and system using a statistical measure
- Patent Title (Chinese, translated): Frequency-based keyword extraction method and system using statistical methods
- Application No.: US12165962
- Application Date: 2008-07-01
- Publication No.: US08606795B2
- Publication Date: 2013-12-10
- Inventor: Stephen C. Morgana, John C. Handley
- Applicant: Stephen C. Morgana, John C. Handley
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Ortiz & Lopez, PLLC
- Agent: Kermit D. Lopez; Luis M. Ortiz
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A frequency-based keyword extraction method and system utilizing a statistical measure is disclosed, which generates keywords within a page and/or document that can distinguish the document from an average document. A simple frequency threshold parameter can be utilized to identify common stop words: a word in the document is treated as a stop word if its frequency in a corpus exceeds the threshold parameter. A statistical confidence interval of the word's frequency in the document can be compared against the frequency confidence interval of the word in the corpus. An extracted keyword possesses an intra-document frequency confidence interval greater than the frequency confidence interval of the word within the corpus. A statistical hypothesis test can also be utilized to determine a keyword by calculating a test statistic and testing whether the test statistic is greater than a threshold.
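
The steps named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not specify the interval type, so a normal-approximation (Wald) interval for a proportion is assumed; "greater" is read as the document interval's lower bound exceeding the corpus interval's upper bound; the hypothesis test is assumed to be a one-sided z-test for a proportion. The function names, the `stop_threshold` default, and the sample counts are all hypothetical.

```python
import math
from collections import Counter


def proportion_interval(count, total, z=1.96):
    """Wald confidence interval for a relative frequency (assumed interval type)."""
    p = count / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def z_test_statistic(count, doc_total, corpus_freq):
    """One-sided z statistic: is the document frequency above the corpus frequency?"""
    if not 0.0 < corpus_freq < 1.0:
        raise ValueError("corpus_freq must lie strictly between 0 and 1")
    p_hat = count / doc_total
    return (p_hat - corpus_freq) / math.sqrt(corpus_freq * (1 - corpus_freq) / doc_total)


def extract_keywords(doc_words, corpus_counts, corpus_total, stop_threshold=0.01, z=1.96):
    """Return words whose document interval lies entirely above their corpus interval."""
    doc_counts = Counter(doc_words)
    doc_total = len(doc_words)
    keywords = []
    for word, count in doc_counts.items():
        corpus_count = corpus_counts.get(word, 0)
        # Stop-word filter: skip words that are very frequent in the corpus.
        if corpus_count / corpus_total > stop_threshold:
            continue
        doc_low, _ = proportion_interval(count, doc_total, z)
        _, corpus_high = proportion_interval(corpus_count, corpus_total, z)
        # Keyword criterion (assumed reading of "greater confidence interval").
        if doc_low > corpus_high:
            keywords.append(word)
    return sorted(keywords)


# Hypothetical corpus statistics and a toy document.
corpus_counts = {"the": 6000, "of": 3000, "printer": 20, "toner": 5, "page": 100}
corpus_total = 100_000
doc_words = ["the"] * 30 + ["of"] * 20 + ["printer"] * 25 + ["toner"] * 20 + ["page"]

print(extract_keywords(doc_words, corpus_counts, corpus_total))
# prints ['printer', 'toner']
```

In this toy input, "the" and "of" are dropped by the stop-word threshold, and "page" is rejected because a single occurrence gives a document interval whose lower bound is clipped to zero. A Wald interval behaves poorly for very small counts, so a practical version might substitute a Wilson or exact interval; that choice is outside what the abstract states.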
Public/Granted literature
- US20100005083A1 FREQUENCY BASED KEYWORD EXTRACTION METHOD AND SYSTEM USING A STATISTICAL MEASURE Public/Granted day: 2010-01-07