Invention Grant
US08224642B2 Automated identification of documents as not belonging to any language (in force)

  • Patent Title: Automated identification of documents as not belonging to any language
  • Application No.: US12275027
    Application Date: 2008-11-20
  • Publication No.: US08224642B2
    Publication Date: 2012-07-17
  • Inventor: Sauraj Goswami
  • Applicant: Sauraj Goswami
  • Applicant Address: US CA Mountain View
  • Assignee: Stratify, Inc.
  • Current Assignee: Stratify, Inc.
  • Current Assignee Address: US CA Mountain View
  • Main IPC: G06F17/20
  • IPC: G06F17/20
Automated identification of documents as not belonging to any language
Abstract:
An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
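The decision flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say which statistics the impostor profile stores or how the final decision is made, so the mean and standard deviation per impostor language, the z-score test, the threshold `z_max`, the toy scores, and all function names here are assumptions.

```python
from statistics import mean, stdev

def build_impostor_profile(training_scores):
    """Summarize how impostor-language models score documents known to be
    in one given language.

    training_scores: list of dicts {impostor_lang: score}, one dict per
    training document (assumed input format).
    Returns {impostor_lang: (mean, standard deviation)}.
    """
    profile = {}
    for lang in training_scores[0]:
        values = [doc[lang] for doc in training_scores]
        profile[lang] = (mean(values), stdev(values))
    return profile

def classify(test_scores, profiles, z_max=3.0):
    """Return the most likely language, or None for "no language".

    test_scores: {lang: score} for the test document under every model.
    profiles: {lang: impostor profile for that language}.
    z_max: assumed cutoff on how far an impostor score may deviate.
    """
    # Step 1: pick the most likely language (highest score).
    best = max(test_scores, key=test_scores.get)
    # Step 2: check the impostor scores against that language's profile.
    for lang, (mu, sigma) in profiles[best].items():
        z = abs(test_scores[lang] - mu) / max(sigma, 1e-9)
        if z > z_max:
            return None  # impostor scores look unlike real documents in `best`
    return best

# Toy scores (assumed to be average log-likelihood per character) that
# French and German models assign to known English training documents.
english_training = [
    {"fr": -3.0, "de": -3.2},
    {"fr": -3.1, "de": -3.0},
    {"fr": -2.9, "de": -3.1},
]
profiles = {"en": build_impostor_profile(english_training)}

real_doc = {"en": -2.0, "fr": -3.05, "de": -3.1}  # impostor scores match the profile
junk_doc = {"en": -4.0, "fr": -4.1, "de": -4.2}   # "en" wins, but impostor scores are off

real_label = classify(real_doc, profiles)
junk_label = classify(junk_doc, profiles)
```

In this sketch both documents have English as the top-scoring language, but only the first has impostor scores close to what real English documents produce, so only the first keeps the English label and the second is reported as belonging to no language.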