Invention Grant
US09588966B2 Data sorting for language processing such as POS tagging (in force)

Data sorting for language processing such as POS tagging
Abstract:
Technology is disclosed that improves language coverage by selecting sentences to be used as training data for a language processing engine. The technology accomplishes the selection of a number of sentences by obtaining a group of sentences, computing a score for each sentence, sorting the sentences based on their scores, and selecting a number of sentences with the highest scores. The scores can be computed by dividing a sum of frequency values of unseen words (or n-grams) in the sentence by a length of the sentence. The frequency values can be based on posts in one or more particular domains, such as the public domain, the private domain, or other specialized domains.
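The scoring and selection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `score_sentence` and `select_sentences`, the whitespace tokenization, the `seen_words` set, and the `frequency` dictionary are all assumptions made for the example. The abstract does not say whether a repeated unseen word is counted once or once per occurrence; this sketch counts every occurrence.

```python
def score_sentence(tokens, seen_words, frequency):
    """Sum the frequency values of unseen tokens, divided by sentence length.

    Assumption: each occurrence of an unseen token contributes its
    frequency value; tokens missing from `frequency` contribute 0.
    """
    if not tokens:
        return 0.0
    unseen_total = sum(
        frequency.get(token, 0) for token in tokens if token not in seen_words
    )
    return unseen_total / len(tokens)


def select_sentences(sentences, seen_words, frequency, k):
    """Score every sentence, sort by score (highest first), keep the top k."""
    scored = [
        # Assumption: lowercase whitespace tokenization stands in for a real tokenizer.
        (score_sentence(sentence.lower().split(), seen_words, frequency), sentence)
        for sentence in sentences
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]


# Hypothetical frequency values, e.g. word counts gathered from posts in one domain.
frequency = {"lol": 50, "selfie": 30, "the": 1000, "cat": 20}
# Hypothetical set of words already covered by existing training data.
seen_words = {"the", "cat", "sat"}

sentences = ["the cat sat", "lol selfie", "the selfie cat"]
print(select_sentences(sentences, seen_words, frequency, k=2))
```

Dividing by sentence length keeps long sentences from winning simply because they contain more words, so a short sentence dense with frequent unseen words ranks above a long one with only a few.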