Invention Grant
US07711737B2 Multi-document keyphrase extraction using partial mutual information
Status: Expired
- Patent Title: Multi-document keyphrase extraction using partial mutual information
- Application No.: US11224195
- Application Date: 2005-09-12
- Publication No.: US07711737B2
- Publication Date: 2010-05-04
- Inventor: Arungunram C. Surendran
- Applicant: Arungunram C. Surendran
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agency: Lee & Hayes, PLLC
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A keyphrase extraction system and method are provided. The system and method can be employed to create an automatic summary of a subset of document(s). The system can automatically extract a list of keyword(s) that can operate on multiple documents, and across many different domains. The system is unsupervised and requires no prior learning. A term identifier identifies candidate terms (e.g., words and/or phrases) in the document subset which are used to form a document-term matrix. A probability computation component calculates probability values of: (1) the joint probability of a word (e.g., term) and a document, (2) the marginal probability of the word (e.g., term), and (3) the marginal probability of the document. Based on the probability values, a partial mutual information metric can be calculated for each candidate term. Based on the partial mutual information metric, one or more of the terms can be identified as summary keyphrases.
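The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not give the formula for the partial mutual information metric, so the sketch assumes one plausible reading: the usual mutual information sum between terms and documents, restricted to the documents in the subset being summarized. The function names (partial_mutual_information, top_keyphrases) are hypothetical, candidate terms are single tokens only, and all probabilities are simple count ratios taken from the document-term counts.

```python
import math
from collections import Counter


def partial_mutual_information(docs, subset):
    """Score each candidate term that occurs in the chosen document subset.

    docs   -- list of documents, each a list of tokens (candidate terms)
    subset -- indices of the documents to summarize

    Assumed metric (not taken from the patent text):
        score(w) = sum over d in subset of p(w, d) * log(p(w, d) / (p(w) * p(d)))
    """
    counts = [Counter(doc) for doc in docs]      # rows of the document-term matrix
    total = sum(sum(c.values()) for c in counts)  # all term occurrences in the corpus

    # Marginal probability of each document.
    p_doc = [sum(c.values()) / total for c in counts]

    # Corpus-wide term counts, used for the marginal probability of each term.
    term_total = Counter()
    for c in counts:
        term_total.update(c)

    scores = {}
    for i in subset:
        for term, n in counts[i].items():
            p_joint = n / total                   # joint probability p(w, d)
            p_term = term_total[term] / total     # marginal probability p(w)
            contribution = p_joint * math.log(p_joint / (p_term * p_doc[i]))
            scores[term] = scores.get(term, 0.0) + contribution
    return scores


def top_keyphrases(docs, subset, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = partial_mutual_information(docs, subset)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this assumed metric, a term concentrated in the subset scores high, while a term spread evenly across the whole corpus (such as a stop word) scores near or below zero, which fits the abstract's claim that no prior training is needed.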
Public/Granted literature
- US20070061320A1 Multi-document keyphrase exctraction using partial mutual information Public/Granted day: 2007-03-15