Invention Grant
- Patent Title: Domain specific natural language normalization
- Application No.: US13414687
- Application Date: 2012-03-07
- Publication No.: US09122673B2
- Publication Date: 2015-09-01
- Inventor: Shareef Alshinnawi, Gary D. Cudak, Edward S. Suffern, John M. Weber
- Applicant: Shareef Alshinnawi, Gary D. Cudak, Edward S. Suffern, John M. Weber
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: CRGO Law
- Agent: Steven M. Greenberg, Esq.
- Main IPC: G06F17/27
- IPC: G06F17/27

Abstract:
Embodiments of the present invention provide a method, system and computer program product for the domain-specific normalization of a corpus of text. In an embodiment of the invention, a method is provided for normalizing a corpus of text for a specific domain, such as an industrial, organizational, demographic or geographic domain. The method includes loading a corpus of text into memory of a computer and determining a domain for the corpus of text. The method also includes retrieving a lexicon of replacement words for the determined domain. Finally, the method includes simplifying the corpus of text using the retrieved lexicon. In one aspect of the embodiment, the domain is determined through inference based upon words already present in the corpus of text. In another aspect of the embodiment, the domain is determined based upon meta-data provided with the corpus of text.
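
The steps named in the abstract (load a corpus, determine its domain from meta-data or by inference, retrieve a lexicon, replace words) can be illustrated with a minimal Python sketch. This is not the patented implementation: the `LEXICONS` table, its example entries, the function names, and the simple term-count heuristic used for inference are all assumptions made for illustration only.

```python
import re

# Hypothetical lexicons (an assumption, not from the patent record):
# domain -> {domain-specific term: plainer replacement}
LEXICONS = {
    "medical": {
        "myocardial infarction": "heart attack",
        "hypertension": "high blood pressure",
    },
    "legal": {
        "lessee": "tenant",
        "lessor": "landlord",
    },
}


def determine_domain(text, metadata=None):
    """Pick a domain from meta-data if given, else infer it from the text."""
    # Meta-data path: trust an explicit, known domain label.
    if metadata and metadata.get("domain") in LEXICONS:
        return metadata["domain"]
    # Inference path (assumed heuristic): count how many lexicon terms
    # of each domain already occur in the text and take the best score.
    lowered = text.lower()
    scores = {
        domain: sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in lexicon
        )
        for domain, lexicon in LEXICONS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def normalize(text, metadata=None):
    """Replace domain-specific terms using the lexicon of the determined domain."""
    domain = determine_domain(text, metadata)
    if domain is None:
        return text  # no domain found, leave the corpus unchanged
    lexicon = LEXICONS[domain]
    # Try longer terms first so multi-word terms are matched whole.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], text)


if __name__ == "__main__":
    print(normalize("The patient has hypertension."))
    print(normalize("The lessee signed.", {"domain": "legal"}))
```

In this sketch, meta-data takes precedence over inference, so a corpus labeled with a known domain is normalized with that domain's lexicon even if its wording suggests another one. A real system would need a far richer lexicon and a more robust domain classifier than a term count.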
Public/Granted literature
- US20130238313A1 DOMAIN SPECIFIC NATURAL LANGUAGE NORMALIZATION Public/Granted day: 2013-09-12