Invention Grant
- Patent Title: Category-based lemmatizing of a phrase in a document
- Application No.: US14820601
- Application Date: 2015-08-07
- Publication No.: US09672278B2
- Publication Date: 2017-06-06
- Inventor: James E. Bostick, John M. Ganci, Jr., John P. Kaemmerer, Craig M. Trim
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Law Office of Jim Boice
- Agent: John R. Pivnichny
- Main IPC: G06F17/30
- IPC: G06F17/30 ; G06F17/27

Abstract:
A processor receives a string of binary data that represents an initial phrase that includes multiple words and is associated with a specific category. The processor removes one or more letters from an end of a word in the initial phrase to form an initial truncated version of the phrase. The processor runs a TF-IDF algorithm on the initial truncated version of the phrase, and lemmatizes subsequent truncated versions of the initial phrase by recursively removing remaining letters from the end of the word. The processor runs the TF-IDF algorithm on subsequent truncated versions of the initial truncated version of the initial phrase until a highest TF-IDF value is identified. The processor defines a breadth of a lemma for a lexeme based on the specific category of the phrase, and assigns the specific truncated version having the highest TF-IDF value to the specific category.
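The truncate-and-score loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`tf_idf`, `best_truncation`), the `min_len` cutoff, the substring-based term counting, the smoothed IDF formula, and the sample corpus are all assumptions made for illustration. The sketch also scores every truncation and keeps the maximum, which is only one possible reading of "until a highest TF-IDF value is identified".

```python
import math


def tf_idf(term, doc, corpus):
    """Score `term` in `doc` using substring counts and a smoothed IDF (assumed formula)."""
    term = term.lower()
    tf = doc.lower().count(term)                       # occurrences in the target document
    df = sum(1 for d in corpus if term in d.lower())   # documents containing the term
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1   # smoothing avoids division by zero
    return tf * idf


def best_truncation(phrase, doc, corpus, min_len=3):
    """Strip letters from the end of the last word and keep the highest-scoring version.

    `min_len` (hypothetical parameter) is the shortest stem of the last word to try.
    Ties are resolved in favor of the longer candidate, which is scored first.
    """
    words = phrase.split()
    head, last = words[:-1], words[-1]
    best, best_score = phrase, tf_idf(phrase, doc, corpus)
    for end in range(len(last) - 1, min_len - 1, -1):
        candidate = " ".join(head + [last[:end]])
        score = tf_idf(candidate, doc, corpus)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Made-up sample data: one document in a hypothetical "cardiology" category plus two others.
doc = ("The patient was treated for heart failure. "
       "Heart failing was noted earlier; heart fails often recur.")
corpus = [doc, "The engine failure was logged.", "Stock markets rose today."]

best, score = best_truncation("heart failure", doc, corpus)
lemma_by_category = {"cardiology": best}  # assign the winning truncation to the category
# best == "heart fail"
```

Here "heart fail" wins because it matches "heart failure", "heart failing", and "heart fails" in the category document while remaining absent from the other documents. A shorter stem such as "heart fai" scores the same but is not preferred because of the tie rule.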
Public/Granted literature
- US20150347575A1 CATEGORY-BASED LEMMATIZING OF A PHRASE IN A DOCUMENT Public/Granted day: 2015-12-03