-
Publication number: US08554543B2
Publication date: 2013-10-08
Application number: US13710055
Filing date: 2012-12-10
Applicant: Google Inc.
Inventor: Evgeny A. Cherepanov, Oleksandr Grushetskyy, Dmitry N. Orlov
IPC: G06F17/27
CPC classification number: G06F17/2872, G06F17/2755
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating suffix rewriting rules. A method includes obtaining a plurality of canonical suffix-rewriting rules each associated with one or more words, generating a suffix tree from the words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of final suffix-rewriting rules from the nodes in the minimum colored subset. Another method includes receiving applicable and non-applicable words for a suffix-rewriting rule, generating a suffix tree from the applicable words and the non-applicable words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status.
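The second method in this abstract (applicable and non-applicable words for one rule) can be illustrated with a small sketch. It is only one plausible reading, not the patented procedure: the suffix tree is approximated by a trie over reversed words, and the "minimum colored subset" is approximated by the shallowest nodes whose subtrees contain applicable words only. The function names `build_suffix_trie` and `select_valid_suffixes` and the example rule are assumptions made for illustration.

```python
def _new_node():
    # "good" counts applicable words passing through the node, "bad" non-applicable ones.
    return {"children": {}, "good": 0, "bad": 0}


def build_suffix_trie(applicable, non_applicable):
    """Build a trie over reversed words so that each node stands for one suffix."""
    root = _new_node()
    for words, key in ((applicable, "good"), (non_applicable, "bad")):
        for word in words:
            node = root
            node[key] += 1
            for ch in reversed(word):
                node = node["children"].setdefault(ch, _new_node())
                node[key] += 1
    return root


def select_valid_suffixes(root):
    """Return the shortest suffixes that match applicable words only.

    Assumption: a node is "valid" when no non-applicable word ends with its suffix.
    An applicable word that is itself a suffix of a non-applicable word cannot be
    separated by suffix alone and is left uncovered.
    """
    selected = []
    stack = [(root, "")]
    while stack:
        node, suffix = stack.pop()
        if node["good"] == 0:
            continue  # no applicable word below this node
        if node["bad"] == 0:
            selected.append(suffix)  # pure node: one rule covers the whole subtree
            continue
        for ch, child in node["children"].items():
            stack.append((child, ch + suffix))  # mixed node: refine with longer suffixes
    return sorted(selected)


# Hypothetical rule "y -> ies": applies to the first list, not to the second.
trie = build_suffix_trie(["city", "party", "berry"], ["boy", "day", "key"])
suffixes = select_valid_suffixes(trie)
```

Under these assumptions the single rule for "y" is split into narrower rules for the suffixes in `suffixes`, each of which matches none of the non-applicable words.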
-
Publication number: US09697259B1
Publication date: 2017-07-04
Application number: US13953162
Filing date: 2013-07-29
Applicant: Google Inc.
Inventor: Hyung-Jin Kim, Oleksandr Grushetskyy, Andrei Lopatenko
CPC classification number: G06F17/3053, G06F17/30424, G06F17/30867
Abstract: A computer-implemented method for processing query information includes receiving data representative of a search query from a user search session. The method also includes identifying a plurality of search results based upon the search query. Each search result is associated with a plurality of user characteristics and data that represents requestor behavior relative to previously submitted queries associated with the respective search result. The method also includes ordering the plurality of user characteristics based upon the data that represents requestor behavior relative to previously submitted queries and the respective search result. The method also includes adjusting the ordered plurality of user characteristics based upon at least one predefined compatibility associated with the user characteristics. The method also includes ranking the search results based upon the adjusted plurality of user characteristics.
-
Publication number: US08762370B1
Publication date: 2014-06-24
Application number: US13763471
Filing date: 2013-02-08
Applicant: Google Inc.
Inventor: Oleksandr Grushetskyy, Steven D. Baker
CPC classification number: G06F17/30867, G06F17/2795
Abstract: One embodiment of the present invention provides a system that automatically generates synonyms for words from documents. During operation, this system determines co-occurrence frequencies for pairs of words in the documents. The system also determines closeness scores for pairs of words in the documents, wherein a closeness score indicates whether a pair of words are located so close to each other that the words are likely to occur in the same sentence or phrase. Finally, the system determines whether pairs of words are synonyms based on the determined co-occurrence frequencies and the determined closeness scores. While making this determination, the system can additionally consider correlations between words in a title or an anchor of a document and words in the document as well as word-form scores for pairs of words in the documents.
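The two signals named in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the patented system: co-occurrence is counted per document, closeness is the share of those documents in which the two words appear within a fixed token window, and a pair is treated as a synonym candidate when it co-occurs often but is rarely close. The decision direction, the thresholds, and the names `pair_statistics` and `synonym_candidates` are all assumptions; the abstract does not specify them. Title, anchor, and word-form signals are omitted.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(documents, window=2):
    """Count, per word pair, the documents where both occur and where they occur close together."""
    cooccur = Counter()
    close = Counter()
    for tokens in documents:
        positions = {}
        for i, tok in enumerate(tokens):
            positions.setdefault(tok, []).append(i)
        for a, b in combinations(sorted(positions), 2):
            cooccur[(a, b)] += 1
            nearest = min(abs(i - j) for i in positions[a] for j in positions[b])
            if nearest <= window:
                close[(a, b)] += 1
    return cooccur, close


def synonym_candidates(documents, min_cooccur=2, max_closeness=0.5, window=2):
    """Keep pairs that share documents often but rarely sit in the same phrase (assumed rule)."""
    cooccur, close = pair_statistics(documents, window)
    return sorted(
        pair
        for pair, n in cooccur.items()
        if n >= min_cooccur and close[pair] / n <= max_closeness
    )


# Toy corpus: "car" and "auto" share both documents but never appear near each other.
docs = [
    "car dealer offers cheap prices on every auto".split(),
    "auto repair shop fixes any broken car".split(),
]
candidates = synonym_candidates(docs)
```

The intuition behind the assumed rule is that words in the same phrase ("new york") tend to be collocations rather than alternatives, so a high closeness score counts against synonymy.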
-
-