Invention Grant
- Patent Title: Phrase-based document clustering with automatic phrase extraction
- Patent Title (Chinese): 基于短语的文本聚类与自动短语提取 (Phrase-based text clustering with automatic phrase extraction)
- Application No.: US12785105
- Application Date: 2010-05-21
- Publication No.: US08392175B2
- Publication Date: 2013-03-05
- Inventor: Joy Thomas, Karthik Ramachandran
- Applicant: Joy Thomas, Karthik Ramachandran
- Applicant Address: US CA Mountain View
- Assignee: Stratify, Inc.
- Current Assignee: Stratify, Inc.
- Current Assignee Address: US CA Mountain View
- Main IPC: G06F17/20
- IPC: G06F17/20; G06F17/27; G06F17/21

Abstract:
Meaningful phrases are distinguished statistically from word sequences that co-occur by chance: a large number of documents is analyzed, and a statistical metric, such as a mutual information metric, separates meaningful phrases from chance groupings of words. In some embodiments, multiple lists of candidate phrases are maintained to reduce the storage required by the phrase-identification algorithm. After phrases are identified, a combination of words and meaningful phrases can be used to cluster the documents.
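The abstract names mutual information as an example metric but does not give a formula. One common choice is pointwise mutual information (PMI) for a pair of adjacent words, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), which is high when two words appear together far more often than their individual frequencies predict. The following is a minimal sketch under stated assumptions, not the patented method: it considers only two-word candidates, uses naive whitespace tokenization, keeps a single in-memory candidate table (not the multiple lists described above), and the function name `extract_phrases` and its `min_count` and `min_pmi` thresholds are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def extract_phrases(
    docs: Iterable[str], min_count: int = 2, min_pmi: float = 1.0
) -> dict[tuple[str, str], float]:
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical parameters: min_count drops rare pairs whose statistics are
    unreliable; min_pmi is the cutoff separating "meaningful" phrases from
    pairs that co-occur by chance.
    """
    unigrams: Counter[str] = Counter()
    bigrams: Counter[tuple[str, str]] = Counter()
    for doc in docs:
        words = doc.lower().split()  # naive whitespace tokenization (assumption)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # adjacent pairs within one document

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    phrases: dict[tuple[str, str], float] = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi          # joint probability of the pair
        p_x = unigrams[w1] / n_uni   # marginal probability of the first word
        p_y = unigrams[w2] / n_uni   # marginal probability of the second word
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            phrases[(w1, w2)] = pmi
    return phrases


if __name__ == "__main__":
    docs = [
        "mutual information is useful",
        "we compute mutual information here",
        "the cat and the dog",
        "the dog and the cat",
    ]
    # With this cutoff only ('mutual', 'information') survives, PMI ≈ 3.59.
    print(extract_phrases(docs, min_count=2, min_pmi=3.0))
```

Pairs such as "the cat" also recur here, but because "the" is frequent on its own their PMI is lower, so a stricter cutoff filters them out. Extending the sketch to longer phrases, or splitting candidates across several lists to bound memory as the abstract describes, would be further design choices not shown here.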
Published/Granted literature:
- US20110191098A1 PHRASE-BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION (publication date: 2011-08-04)