Invention Grant
US07937263B2 System and method for tokenization of text using classifier models
In force
- Patent Title: System and method for tokenization of text using classifier models
- Application No.: US11001654
- Application Date: 2004-12-01
- Publication No.: US07937263B2
- Publication Date: 2011-05-03
- Inventor: Jill Carrier, Alwin B. Carus, William F. Cote, John Dowd, Kathryn Del La Femina, Alan Frankel, Wensheng (Vincent) Han, Larissa Lapshina, Bernardo Rechea, Ana Santisteban, Amy J. Uhrbach
- Applicant: Jill Carrier, Alwin B. Carus, William F. Cote, John Dowd, Kathryn Del La Femina, Alan Frankel, Wensheng (Vincent) Han, Larissa Lapshina, Bernardo Rechea, Ana Santisteban, Amy J. Uhrbach
- Applicant Address: Stratford, CT, US
- Assignee: Dictaphone Corporation
- Current Assignee: Dictaphone Corporation
- Current Assignee Address: Stratford, CT, US
- Agency: Wolf, Greenfield & Sacks, P.C.
- Main IPC: G06F17/27
- IPC: G06F17/27; G06F17/20

Abstract:
The present invention pertains to a system and method for the tokenization of text. The tokenizer may include a featurizer configured to receive input text and convert it into tokens. According to one aspect of the invention, each token may include only one type of character, selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier configured to receive the tokens from the featurizer. The classifier may use a preclassifier to determine whether each token may be input into a predetermined classification model. If a token passes the preclassifier, it is classified using the predetermined classification model. Additionally, according to a first aspect of the invention, the tokenizer may also include a finalizer. The finalizer may be configured to receive the tokens and produce a final output.
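
The three stages named in the abstract (featurizer, classifier with preclassifier, finalizer) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `JOIN`/`SPLIT` labels, the choice of which tokens pass the preclassifier, and the rule-based `toy_model` standing in for the predetermined classification model are all assumptions made for illustration. The character classes are also simplified to ASCII.

```python
import re
from typing import Callable, List, NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # offset of first character in the input
    end: int    # offset one past the last character


# Featurizer rule (assumed): each token is a run of exactly one character
# type -- ASCII letters, digits, or punctuation. Whitespace is dropped.
_RUN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]+")


def featurize(text: str) -> List[Token]:
    """Split input text into single-character-type tokens with offsets."""
    return [Token(m.group(), m.start(), m.end()) for m in _RUN.finditer(text)]


def preclassify(tok: Token) -> bool:
    """Hypothetical gate: only ambiguous separators are sent to the model."""
    return tok.text in {".", ",", "-"}


def toy_model(tokens: List[Token], i: int) -> str:
    """Rule-based stand-in for a trained classification model (assumption).

    Returns "JOIN" when the separator sits directly between two digit
    tokens with no whitespace (e.g. the "." in "3.5"), else "SPLIT".
    """
    if 0 < i < len(tokens) - 1:
        prev, cur, nxt = tokens[i - 1], tokens[i], tokens[i + 1]
        touching = prev.end == cur.start and cur.end == nxt.start
        if touching and prev.text.isdigit() and nxt.text.isdigit():
            return "JOIN"
    return "SPLIT"


def classify(tokens: List[Token],
             model: Callable[[List[Token], int], str] = toy_model) -> List[str]:
    """Label each token; tokens failing the preclassifier default to SPLIT."""
    return [model(tokens, i) if preclassify(t) else "SPLIT"
            for i, t in enumerate(tokens)]


def finalize(tokens: List[Token], labels: List[str]) -> List[str]:
    """Merge JOIN-labelled tokens with both neighbours to build the output."""
    out: List[str] = []
    glue_next = False  # True when the previous token was labelled JOIN
    for tok, label in zip(tokens, labels):
        if out and (label == "JOIN" or glue_next):
            out[-1] += tok.text
        else:
            out.append(tok.text)
        glue_next = label == "JOIN"
    return out


def tokenize(text: str) -> List[str]:
    tokens = featurize(text)
    return finalize(tokens, classify(tokens))


print(tokenize("Take 3.5 mg, twice."))
```

In this sketch the preclassifier acts as a cheap filter so that the model is consulted only for tokens whose handling is ambiguous. A trained classifier could replace `toy_model` by passing a different callable to `classify`.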
Public/Granted literature
- US20060116862A1 System and method for tokenization of text (Public/Granted day: 2006-06-01)