Invention Grant
US07805288B2 Corpus expansion system and method thereof (in force)

Abstract:
A system and method are provided for automatically expanding corpora by generating new sample seeds, where sample seeds are used to collect the corpus. New sample seeds are generated from the existing sample seeds and the corpora collected so far. A corpus expansion strategy is then determined from all the sample seeds used so far together with the new sample seeds. The new sample seeds are refined according to this strategy, and the refined seeds are used to collect further corpus. These steps are repeated until a predefined condition is satisfied. According to the invention, corpora can be expanded automatically from the web or other resources conveniently and at low cost, improving corpus coverage.
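The abstract describes an iterative loop: collect, generate new seeds, determine a strategy, refine, collect again, stop on a condition. The sketch below illustrates that loop under loudly stated assumptions. It is not the patented implementation, which the abstract does not specify. All function names are hypothetical. An in-memory list of strings stands in for the web or other resources. Candidate seeds are scored by document frequency. The "strategy" is a toy rule that raises the minimum document count as the seed set grows and caps new seeds at two per round. The stop condition is "no candidate survives refinement" or an iteration limit.

```python
from collections import Counter


def collect_corpus(seeds, resource):
    """Hypothetical collector: return documents in `resource` containing any seed word."""
    return [doc for doc in resource if seeds & set(doc.split())]


def generate_new_seeds(used_seeds, corpus):
    """Assumed generator: non-seed words in the collected corpus, scored by
    document frequency (number of documents that contain the word)."""
    return Counter(
        word
        for doc in corpus
        for word in set(doc.split())
        if word not in used_seeds
    )


def determine_strategy(used_seeds, candidates):
    """Toy strategy (an assumption): demand more supporting documents as the
    seed set grows, to limit topic drift, and accept at most two new seeds per
    round. This rule ignores `candidates`; it is passed only because the
    abstract says the strategy depends on both used and new seeds."""
    return {"min_docs": 1 + len(used_seeds) // 2, "max_new": 2}


def refine_seeds(candidates, strategy):
    """Keep the top candidates that meet the threshold; ties break alphabetically."""
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [word for word, docs in ranked if docs >= strategy["min_docs"]]
    return set(kept[: strategy["max_new"]])


def expand_corpus(initial_seeds, resource, max_iterations=10):
    used_seeds = set(initial_seeds)
    corpus = collect_corpus(used_seeds, resource)
    for _ in range(max_iterations):
        candidates = generate_new_seeds(used_seeds, corpus)
        strategy = determine_strategy(used_seeds, candidates)
        new_seeds = refine_seeds(candidates, strategy)
        if not new_seeds:  # assumed stop condition: nothing survives refinement
            break
        used_seeds |= new_seeds
        # Use only the refined new seeds to collect further documents.
        for doc in collect_corpus(new_seeds, resource):
            if doc not in corpus:
                corpus.append(doc)
    return used_seeds, corpus


resource = [
    "python code tutorial",
    "python code review",
    "code review checklist",
    "review checklist template",
    "cooking pasta recipe",
]
seeds, corpus = expand_corpus({"python"}, resource)
print(sorted(seeds))  # ['checklist', 'code', 'python', 'review']
print(corpus)         # the four documents reachable from "python"; the pasta one is not
```

Starting from the single seed `python`, the first round adds `code` and `review`, which pull in two more documents. The second round requires two supporting documents and adds only `checklist`. The third round requires three, so no candidate survives and the loop stops. The rising threshold is one simple way to keep the expansion from drifting off topic. A real system would also need stopword filtering and a much richer notion of relevance.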