Invention Grant
- Patent Title: Automated identification of documents as not belonging to any language
- Application No.: US12275027
- Application Date: 2008-11-20
- Publication No.: US08224642B2
- Publication Date: 2012-07-17
- Inventor: Sauraj Goswami
- Applicant: Sauraj Goswami
- Applicant Address: US CA Mountain View
- Assignee: Stratify, Inc.
- Current Assignee: Stratify, Inc.
- Current Assignee Address: US CA Mountain View
- Main IPC: G06F17/20
- IPC: G06F17/20

Abstract:
An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
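The abstract does not say which language model or which statistical test is used. The sketch below is one plausible reading under stated assumptions, not the patented implementation. Each language is assumed to have a character-trigram model whose score is the average log-probability per trigram. The impostor profile for a language is assumed to be the mean and standard deviation of every other model's score over documents known to be in that language. A test document is first assigned its highest-scoring language. It is then labeled "no language" (None) when its impostor scores deviate from that profile by more than an assumed average z-score threshold. All names (NgramModel, build_impostor_profile, identify_language, threshold, min_std) are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def char_ngrams(text, n=3):
    """Split lowercased, space-padded text into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramModel:
    """Hypothetical per-language model: character n-gram counts, add-one smoothing."""

    def __init__(self, training_text, n=3):
        self.n = n
        self.counts = Counter(char_ngrams(training_text, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def score(self, text):
        """Average log-probability per n-gram (higher means a better fit)."""
        grams = char_ngrams(text, self.n)
        if not grams:
            return float("-inf")
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)


def build_impostor_profile(models, lang, in_language_docs, min_std=0.1):
    """For documents known to be in `lang`, record the mean and standard deviation
    of every other ("impostor") model's score. `min_std` is an assumed floor that
    keeps the later z-scores finite when the sample scores barely vary."""
    profile = {}
    for impostor, model in models.items():
        if impostor == lang:
            continue
        scores = [model.score(doc) for doc in in_language_docs]
        profile[impostor] = (mean(scores), max(pstdev(scores), min_std))
    return profile


def identify_language(text, models, profiles, threshold=3.0):
    """Return the most likely language, or None ("no language") when the document's
    impostor scores sit far from what that language's impostor profile predicts."""
    scores = {lang: model.score(text) for lang, model in models.items()}
    best = max(scores, key=scores.get)
    # Assumed test: average absolute z-score of the impostor scores vs. the profile.
    deviations = [
        abs(scores[impostor] - mu) / sd
        for impostor, (mu, sd) in profiles[best].items()
    ]
    if deviations and mean(deviations) > threshold:
        return None
    return best
```

In use, one would train one NgramModel per language, call build_impostor_profile once per language on documents known to be in that language, and pass the resulting dict of profiles (keyed by language) to identify_language. Averaging absolute z-scores is only one way to combine the impostor evidence; taking the maximum deviation, or using a likelihood-ratio test, would be equally plausible readings of the abstract.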
Pre-grant publication:
- US20100125448A1 AUTOMATED IDENTIFICATION OF DOCUMENTS AS NOT BELONGING TO ANY LANGUAGE, published 2010-05-20