Invention Grant
- Patent Title: Cluster labeling system for documents comprising unstructured text data
- Application No.: US14501431
- Application Date: 2014-09-30
- Publication No.: US09672279B1
- Publication Date: 2017-06-06
- Inventor: Raphael Cohen, Alon Grubshtein, Aisling J. Crowley, Peter R. Elliot
- Applicant: EMC Corporation
- Applicant Address: US MA Hopkinton
- Assignee: EMC IP Holding Company LLC
- Current Assignee: EMC IP Holding Company LLC
- Current Assignee Address: US MA Hopkinton
- Agency: Ryan, Mason & Lewis, LLP
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/00; G06F17/30; G06N7/00

Abstract:
An apparatus comprises a processing platform configured to implement a cluster labeling system for documents comprising unstructured text data. The cluster labeling system comprises a clustering module and a visualization module. The clustering module implements a topic model generator and is configured to assign each of the documents to one or more of a plurality of clusters based at least in part on one or more topics identified from the unstructured text data using at least one topic model provided by the topic model generator. The visualization module comprises multiple view generators configured to generate respective distinct visualizations of a selected one of the clusters. The multiple view generators include at least a bigram view generator configured to provide a visualization of a plurality of term pairs from the selected cluster, and a summarization view generator configured to provide a visualization of representative term sequences from the selected cluster.
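The flow described in the abstract (topic-based cluster assignment, then a bigram view and a summarization view of a selected cluster) can be illustrated with a minimal sketch. This is not the patented implementation. The per-document topic weights are assumed to come from some topic model that the abstract does not name. The function names, the 0.3 threshold, the sample documents, and the frequency-based sentence scoring are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def assign_clusters(doc_topic_weights, threshold=0.3):
    """Assign each document to every topic whose weight meets the threshold.

    doc_topic_weights is assumed to be the output of a topic model; the
    threshold value is a hypothetical choice. A document may land in more
    than one cluster, matching "one or more of a plurality of clusters".
    """
    clusters = {}
    for doc_id, weights in enumerate(doc_topic_weights):
        for topic_id, weight in enumerate(weights):
            if weight >= threshold:
                clusters.setdefault(topic_id, []).append(doc_id)
    return clusters


def bigram_view(cluster_docs, top_n=2):
    """Return the most frequent adjacent term pairs in the selected cluster."""
    counts = Counter()
    for text in cluster_docs:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(top_n)


def summarization_view(cluster_docs, top_n=1):
    """Return sentences ranked by average in-cluster term frequency.

    This scoring is an assumed stand-in for whatever method selects
    "representative term sequences" in the actual system.
    """
    sentences = [
        s.strip()
        for text in cluster_docs
        for s in re.split(r"[.!?]+", text)
        if s.strip()
    ]
    term_freq = Counter(t for s in sentences for t in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(term_freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    return sorted(sentences, key=score, reverse=True)[:top_n]


# Hypothetical sample data: three short documents and made-up topic weights.
documents = [
    "Disk failure on storage array. Replaced disk drive.",
    "Storage array reported disk failure after upgrade.",
    "Password reset request for user account.",
]
topic_weights = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

clusters = assign_clusters(topic_weights)
selected_docs = [documents[i] for i in clusters[0]]
top_bigrams = bigram_view(selected_docs)
summary = summarization_view(selected_docs)

print(clusters)
print(top_bigrams)
print(summary)
```

Keeping the two views as separate functions over the same selected cluster mirrors the abstract's description of multiple view generators producing distinct visualizations of one cluster.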