-
公开(公告)号:CA2804514A1
公开(公告)日:2012-02-09
申请号:CA2804514
申请日:2011-07-11
Applicant: IBM
Inventor: UNNO YUYA , TSUBOI YUTA
Abstract: Provided is a technique whereby it is possible to suitably summarize and display within a limited range the peripheral context for the results of a search. For all contexts of a character string C = {c1,..., cn}, the surface area covered by a character string s is defined by the number of c that prefixes s and the area of the length of s. Further, for a set of all contexts, of the set of character strings with a maximum of K characters and a length of L or less, the set of character strings that maximizes the overall surface area covered is obtained under the condition that a partial character string belonging to another character string is not selected. According to the present invention this problem can be efficiently solved by the dynamic programming of a frequency ordered context tree created from a TRIE of all contexts. According to another finding of the present invention, when obtaining the maximum surface with dynamic programming, by estimating the upper limit of the surface area obtainable by a search, a significant number of items can be pruned from the search and thereby it is possible to speed up processing. Furthermore, by creating a frequency ordered suffix tree whereby child nodes of a suffix tree for a text are arranged by the frequency of appearance, it becomes possible to speed up both the search and the obtaining of the maximum surface area.