Invention Grant
- Patent Title: Method and apparatus for characterizing documents based on clusters of related words
- Patent Title (中): 基于相关单词集合来表征文档的方法和装置
- Application No.: US12131637
- Application Date: 2008-06-02
- Publication No.: US08688720B1
- Publication Date: 2014-04-01
- Inventor: Georges Harik, Noam M. Shazeer
- Applicant: Georges Harik, Noam M. Shazeer
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Fish & Richardson P.C.
- Main IPC: G06F7/00
- IPC: G06F7/00

Abstract:
One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects “candidate clusters” of conceptually related words that are related to the set of words. These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words. Next, the system constructs a set of components to characterize the document, wherein the set of components includes components for candidate clusters. Each component in the set of components indicates a degree to which a corresponding candidate cluster is related to the set of words.
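
The two steps in the abstract (selecting candidate clusters, then building one component per candidate) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not specify the generative model, so a simple weighted word-overlap score stands in for it here. The cluster names, word weights, and the functions `select_candidate_clusters` and `characterize` are all hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy clusters: each maps a word to an assumed association weight.
# A real system would learn these from data; these values are invented.
CLUSTERS = {
    "cooking": {"recipe": 0.9, "oven": 0.7, "flour": 0.6},
    "travel": {"flight": 0.9, "hotel": 0.8, "oven": 0.1},
    "finance": {"stock": 0.9, "bond": 0.8},
}


def select_candidate_clusters(words, clusters):
    """Return names of clusters that share at least one word with the document."""
    vocab = set(words)
    return [
        name
        for name, weights in clusters.items()
        if any(word in weights for word in vocab)
    ]


def characterize(words, clusters):
    """Build one component per candidate cluster.

    Each component is the cluster's share of the total weighted overlap,
    so the components of a document sum to 1 (assumed normalization).
    """
    counts = Counter(words)
    raw = {
        name: sum(clusters[name].get(word, 0.0) * n for word, n in counts.items())
        for name in select_candidate_clusters(words, clusters)
    }
    total = sum(raw.values())
    return {name: score / total for name, score in raw.items()} if total else {}


document = ["recipe", "oven", "flour", "oven"]
components = characterize(document, CLUSTERS)
```

In this toy run, "finance" is never selected as a candidate because it shares no word with the document, and "cooking" receives a much larger component than "travel", which matches only through the weakly associated word "oven".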