Search index utilizing clusters of semantically similar phrases
Abstract:
The subject technology provides a search index that maps clusters of semantically similar phrases to documents that contain any one of the phrases of the respective cluster. The subject technology may identify the phrases from a set of documents, such as a document corpus, where each of the documents is associated with a document identifier. The subject technology may generate the clusters of semantically similar phrases from the identified phrases, where each of the generated clusters is assigned a cluster identifier. The subject technology generates an index that stores each respective cluster identifier of each respective cluster in association with each document identifier of each of the documents that includes at least one of the phrases contained in the respective cluster. Further, the subject technology stores the index in a memory such that the index may be subsequently utilized to identify documents that match a search query.
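The indexing flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The abstract does not say how clusters are generated, so the sketch takes a hand-written phrase-to-cluster mapping as given. Phrase detection is simplified to case-insensitive substring matching. The names `build_cluster_index`, `search`, `documents`, and `phrase_to_cluster` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_cluster_index(documents, phrase_to_cluster):
    """Map each cluster identifier to the set of document identifiers
    whose text contains at least one phrase of that cluster."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for phrase, cluster_id in phrase_to_cluster.items():
            # Assumption: naive substring matching stands in for phrase identification.
            if phrase in lowered:
                index[cluster_id].add(doc_id)
    return dict(index)


def search(index, phrase_to_cluster, query):
    """Resolve a query phrase to its cluster, then return the matching documents."""
    cluster_id = phrase_to_cluster.get(query.lower())
    if cluster_id is None:
        return set()
    return index.get(cluster_id, set())


# Hypothetical corpus: document identifier -> document text.
documents = {
    "d1": "How to buy a car cheaply",
    "d2": "Tips to purchase an automobile",
    "d3": "Growing tomatoes at home",
}

# Assumed output of a clustering step: phrase -> cluster identifier.
phrase_to_cluster = {
    "buy a car": "c1",
    "purchase an automobile": "c1",
    "growing tomatoes": "c2",
}

index = build_cluster_index(documents, phrase_to_cluster)
results = search(index, phrase_to_cluster, "buy a car")
```

In this sketch a query for "buy a car" also retrieves the document that contains only "purchase an automobile", because both phrases share a cluster identifier and the index is keyed by cluster, not by individual phrase.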