Automatic topic discovery in streams of unstructured data
Abstract:
A method is provided for automatically discovering topics in electronic posts, such as social media posts. The method includes receiving a corpus that includes a plurality of electronic posts. The method further includes identifying a plurality of candidate terms within the corpus and selecting, as a trimmed lexicon, a subset of the plurality of candidate terms using predefined criteria. The method further includes clustering at least a subset of the plurality of electronic posts according to a plurality of clusters using the lexicon to produce a plurality of statistical topic models. The method further includes storing information corresponding to the statistical topic models.
Public/Granted literature
Information query
Patent Agency Ranking
0/0