Invention Grant
US07941418B2 Dynamic corpus generation (in force)

Abstract:
A computer-implemented method of generating a dynamic corpus includes generating web threads, based upon corresponding sets of words dequeued from a word queue, to obtain the URLs resulting from those web threads. These resulting URLs are enqueued in a URL queue. Multiple text extraction threads are generated, based upon documents downloaded using URLs dequeued from the URL queue, to obtain text files. New words are randomly obtained from the text files and enqueued in the word queue. This process is performed iteratively, resulting in a dynamic corpus.
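The loop described in the abstract (word queue, then web threads, then URL queue, then text extraction threads, then random words fed back into the word queue) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract names no search engine or downloader, so the hypothetical `search` and `fetch` functions are passed in. The parameters `words_per_query`, `new_words_per_doc`, and `iterations` are also hypothetical. Several other choices are assumptions as well: each stage is run to completion before the next starts, already-seen URLs are skipped, and an in-memory list stands in for the text files.

```python
import queue
import random
import re
import threading
from typing import Callable, Iterable, List

# Hypothetical interfaces: the abstract does not name a search engine or
# downloader, so both are supplied by the caller as plain functions.
SearchFn = Callable[[List[str]], List[str]]  # set of words -> result URLs
FetchFn = Callable[[str], str]               # URL -> raw HTML document


def run_all(threads: List[threading.Thread]) -> None:
    # Start every thread of one stage, then wait for all of them to finish.
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def build_corpus(seed_words: Iterable[str], search: SearchFn, fetch: FetchFn,
                 iterations: int = 3, words_per_query: int = 2,
                 new_words_per_doc: int = 2, seed: int = 0) -> List[str]:
    word_q: "queue.Queue[str]" = queue.Queue()
    url_q: "queue.Queue[str]" = queue.Queue()
    rng, rng_lock = random.Random(seed), threading.Lock()
    corpus: List[str] = []        # assumption: stands in for text files
    corpus_lock = threading.Lock()
    seen_urls = set()             # assumption: skip already-downloaded URLs
    for w in seed_words:
        word_q.put(w)

    def web_thread(words: List[str]) -> None:
        # Query with one dequeued word set; enqueue the resulting URLs.
        for url in search(words):
            url_q.put(url)

    def extraction_thread(url: str) -> None:
        # Download the document, strip tags, keep alphabetic tokens as text.
        tokens = re.findall(r"[A-Za-z]+", re.sub(r"<[^>]+>", " ", fetch(url)))
        if not tokens:
            return
        with corpus_lock:
            corpus.append(" ".join(tokens))
        # Randomly pick new words from the extracted text and feed them back.
        with rng_lock:
            picks = [rng.choice(tokens) for _ in range(new_words_per_doc)]
        for w in picks:
            word_q.put(w)

    for _ in range(iterations):
        # Stage 1: dequeue all pending words, one web thread per word set.
        words = []
        while not word_q.empty():
            words.append(word_q.get())
        if not words:
            break
        run_all([threading.Thread(target=web_thread,
                                  args=(words[i:i + words_per_query],))
                 for i in range(0, len(words), words_per_query)])
        # Stage 2: dequeue URLs, one text extraction thread per unseen URL.
        urls = []
        while not url_q.empty():
            url = url_q.get()
            if url not in seen_urls:
                seen_urls.add(url)
                urls.append(url)
        run_all([threading.Thread(target=extraction_thread, args=(u,))
                 for u in urls])
    return corpus


# Toy, offline stand-ins for the web (hypothetical data for illustration).
pages = {"http://example.com/a": "<p>alpha beta</p>",
         "http://example.com/b": "<p>beta gamma</p>"}
index = {"alpha": ["http://example.com/a"],
         "beta": ["http://example.com/a", "http://example.com/b"],
         "gamma": ["http://example.com/b"]}
demo = build_corpus(["alpha"],
                    lambda ws: [u for w in ws for u in index.get(w, [])],
                    pages.__getitem__)
print(demo)
```

Joining all threads at each stage keeps the queue hand-off easy to reason about. A production crawler would more likely run long-lived worker threads that block on `queue.Queue.get`, so that searching and downloading overlap. Because the words fed back are chosen at random and the threads are scheduled nondeterministically, the corpus can differ from run to run. That is the "dynamic" property the abstract describes.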