Invention Grant
- Patent Title: Data sorting for language processing such as POS tagging
- Patent Title (Chinese, translated): Data sorting for language processing, such as POS tagging
- Application No.: US14804802
- Application Date: 2015-07-21
- Publication No.: US09588966B2
- Publication Date: 2017-03-07
- Inventor: Matthias Gerhard Eck
- Applicant: Facebook, Inc.
- Applicant Address: US CA Menlo Park
- Assignee: Facebook, Inc.
- Current Assignee: Facebook, Inc.
- Current Assignee Address: US CA Menlo Park
- Agency: Perkins Coie LLP
- Main IPC: G06F17/27
- IPC: G06F17/27 ; G06F17/28 ; G06F17/20 ; G06N5/02

Abstract:
Technology is disclosed that improves language coverage by selecting sentences to be used as training data for a language processing engine. The technology accomplishes the selection of a number of sentences by obtaining a group of sentences, computing a score for each sentence, sorting the sentences based on their scores, and selecting a number of sentences with the highest scores. The scores can be computed by dividing a sum of frequency values of unseen words (or n-grams) in the sentence by a length of the sentence. The frequency values can be based on posts in one or more particular domains, such as the public domain, the private domain, or other specialized domains.
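
The selection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; it only follows the steps named above (score, sort, select). The function names `score_sentence` and `select_sentences`, the whitespace tokenization, the toy frequency table, and the choice to count each distinct unseen word once are all assumptions made for illustration.

```python
from collections import Counter


def score_sentence(tokens, freq, seen):
    """Score = sum of frequency values of unseen words / sentence length.

    Assumption: each distinct unseen word contributes its frequency once.
    """
    if not tokens:
        return 0.0
    unseen = {t for t in tokens if t not in seen}
    return sum(freq.get(t, 0) for t in unseen) / len(tokens)


def select_sentences(sentences, freq, seen, n):
    """Score every sentence, sort by score (highest first), keep the top n."""
    scored = [(score_sentence(s.lower().split(), freq, seen), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:n]]


# Hypothetical domain posts used to derive word frequency values.
domain_posts = ["the cat sat", "the dog ran", "a cat ran fast"]
freq = Counter(word for post in domain_posts for word in post.split())

# Hypothetical set of words already covered by existing training data.
seen = {"the", "a"}

candidates = ["the cat ran", "a dog sat fast", "the a the"]
selected = select_sentences(candidates, freq, seen, n=2)
print(selected)
```

Dividing by sentence length keeps long sentences from winning merely because they contain more words. The same sketch would extend to n-grams by replacing the token list with a list of n-gram tuples.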
Public/Granted literature
- US20170024376A1 DATA SORTING FOR LANGUAGE PROCESSING SUCH AS POS TAGGING, Public/Granted day: 2017-01-26