Invention Grant
US09043926B2 Identifying primarily monosemous keywords to include in keyword lists for detection of domain-specific language (Active)

  • Patent Title: Identifying primarily monosemous keywords to include in keyword lists for detection of domain-specific language
  • Application No.: US13722682
    Application Date: 2012-12-20
  • Publication No.: US09043926B2
    Publication Date: 2015-05-26
  • Inventor: Michael Hart
  • Applicant: Symantec Corporation
  • Applicant Address: US CA Mountain View
  • Assignee: Symantec Corporation
  • Current Assignee: Symantec Corporation
  • Current Assignee Address: US CA Mountain View
  • Agency: Patterson & Sheridan LLP
  • Main IPC: G06F21/60
  • IPC: G06F21/60
Abstract:
Techniques are described for generating a monosemous (i.e., single sense) keyword list associated with a particular domain (e.g., a medical or financial domain) for document classification. An input term frequency dictionary, a candidate keyword list, and a document corpus may be used to generate the keyword list. A collection of documents is divided into two sets, one related to a target domain and one not. A statistical approach may be used to evaluate each term in the candidate list to determine a measure of how monosemous each remaining candidate term is, i.e., how strongly the term (or short phrase) identifies with a single sense. Terms with a primarily single sense related to the target domain are added to the monosemous keyword list. The keyword list may be used to identify documents associated with the domain, allowing the appropriate protections to be applied to those documents (e.g., blocking transmission outside an enterprise boundary or disallowing copying).
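The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which statistic is used, so everything here is an assumption for illustration: the score is simply the fraction of documents containing a term that belong to the target-domain set, and the names `tokenize`, `monosemous_keywords`, `threshold`, and `min_docs` are hypothetical. The sketch handles single-word terms only and omits the input term frequency dictionary.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation (a simplification)."""
    words = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [w for w in words if w]


def monosemous_keywords(candidates, in_domain_docs, out_domain_docs,
                        threshold=0.9, min_docs=2):
    """Return (term, score) pairs for candidates that occur mostly in-domain.

    The score is an assumed stand-in for the patent's unspecified statistic:
    in-domain document count divided by total document count for the term.
    """
    in_df = Counter()   # documents per term in the target-domain set
    out_df = Counter()  # documents per term in the other set
    for doc in in_domain_docs:
        in_df.update(set(tokenize(doc)))
    for doc in out_domain_docs:
        out_df.update(set(tokenize(doc)))

    selected = []
    for term in candidates:
        key = term.lower()
        total = in_df[key] + out_df[key]
        if total < min_docs:
            continue  # too little evidence to judge the term
        score = in_df[key] / total
        if score >= threshold:
            selected.append((term, score))
    return selected


# Toy corpus, invented for illustration only.
medical_docs = [
    "The patient received chemotherapy.",
    "Chemotherapy dosage and discharge notes.",
    "Patient discharge summary.",
]
other_docs = [
    "Battery discharge rates were measured.",
    "The river discharge increased.",
    "Quarterly sales report.",
]

keywords = monosemous_keywords(
    ["chemotherapy", "discharge", "patient"], medical_docs, other_docs
)
print(keywords)
```

In this toy corpus "discharge" appears equally often in both sets, so it scores 0.5 and is rejected as polysemous, while "chemotherapy" and "patient" appear only in the medical set and are kept.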