Invention Grant
- Patent Title: Language segmentation of multilingual texts
- Patent Title (中): 多语言文本语言分割
- Application No.: US14073036
- Application Date: 2013-11-06
- Publication No.: US09400787B2
- Publication Date: 2016-07-26
- Inventor: Anthony Aue
- Applicant: Anthony Aue
- Applicant Address: US WA Redmond
- Assignee: Microsoft Technology Licensing, LLC
- Current Assignee: Microsoft Technology Licensing, LLC
- Current Assignee Address: US WA Redmond
- Agent: Steve Wight; Sandy Swain; Micky Minhas
- Main IPC: G06F17/28
- IPC: G06F17/28; G06F17/27

Abstract:
The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
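The decoding step described in the abstract, combining per-sentence language probabilities with language-transition probabilities to find the highest-probability label sequence, resembles Viterbi decoding over a hidden Markov model. The following is a minimal sketch under that assumption, not the patented implementation. The function name `viterbi_language_sequence`, the input layout, and all probability values are hypothetical, and the transition probabilities are supplied directly here rather than learned from the initial distribution as the abstract describes.

```python
import math


def viterbi_language_sequence(sentence_probs, transition_probs, languages):
    """Return the most probable language label for each sentence.

    sentence_probs:   list of dicts, one per sentence, language -> probability
                      (stands in for the initial model's distribution)
    transition_probs: dict mapping (previous_language, next_language) -> probability
    languages:        list of candidate language codes
    """
    if not sentence_probs:
        return []

    tiny = 1e-12  # illustrative floor so that log(0) is never evaluated

    def log(p):
        return math.log(max(p, tiny))

    # best[i][lang]: log-score of the best path that ends in `lang` at sentence i
    best = [{lang: log(sentence_probs[0].get(lang, 0.0)) for lang in languages}]
    back = []  # back[i][lang]: best predecessor of `lang` at sentence i + 1

    for probs in sentence_probs[1:]:
        scores, pointers = {}, {}
        for lang in languages:
            prev = max(
                languages,
                key=lambda p: best[-1][p] + log(transition_probs.get((p, lang), 0.0)),
            )
            scores[lang] = (
                best[-1][prev]
                + log(transition_probs.get((prev, lang), 0.0))
                + log(probs.get(lang, 0.0))
            )
            pointers[lang] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final state to recover the path.
    path = [max(languages, key=lambda lang: best[-1][lang])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: five sentences, two candidate languages.
languages = ["en", "fr"]
sentence_probs = [
    {"en": 0.80, "fr": 0.20},
    {"en": 0.45, "fr": 0.55},  # ambiguous sentence, slightly favours "fr" in isolation
    {"en": 0.90, "fr": 0.10},
    {"en": 0.05, "fr": 0.95},
    {"en": 0.20, "fr": 0.80},
]
transition_probs = {
    ("en", "en"): 0.9, ("en", "fr"): 0.1,
    ("fr", "fr"): 0.9, ("fr", "en"): 0.1,
}
labels = viterbi_language_sequence(sentence_probs, transition_probs, languages)
print(labels)  # ['en', 'en', 'en', 'fr', 'fr']
```

In this example the second sentence is labeled `en` even though its standalone distribution slightly favours `fr`, because switching languages twice costs more than the small per-sentence gain. That is the intended effect of combining transition probabilities with the per-sentence distribution.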
Public/Granted literature
- US20140067365A1 LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS Public/Granted day: 2014-03-06