Invention Grant
- Patent Title: Method of feature extraction from noisy documents
- Application No.: US12336872
- Application Date: 2008-12-17
- Publication No.: US08655803B2
- Publication Date: 2014-02-18
- Inventor: Loic Lecerf, Boris Chidlovskii
- Applicant: Loic Lecerf, Boris Chidlovskii
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F15/18
- IPC: G06F15/18

Abstract:
Aspects of the exemplary embodiment relate to a method and apparatus for automatically identifying features that are suitable for use by a classifier in assigning class labels to text sequences extracted from noisy documents. The exemplary method includes receiving a dataset of text sequences, automatically identifying a set of patterns in the text sequences, and filtering the patterns to generate a set of features. The filtering includes at least one of filtering out redundant patterns and filtering out irrelevant patterns. The method further includes outputting at least some of the features in the set of features, optionally after fusing features that are determined not to affect the classifier's accuracy if they are merged.
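The pipeline described in the abstract (identify patterns, drop redundant ones, drop irrelevant ones) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: patterns are taken to be character n-grams, two patterns are treated as redundant when they occur in exactly the same sequences, and a pattern is treated as irrelevant when its information gain with respect to the class labels falls below a threshold. The function names and the `min_support` and `min_gain` parameters are hypothetical, and the optional feature-fusion step is not shown.

```python
from collections import defaultdict
from math import log2


def ngrams(text, n_min=2, n_max=3):
    """Yield every character n-gram of length n_min..n_max (assumed pattern type)."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    return -sum(c / total * log2(c / total) for c in counts.values())


def info_gain(covered, labels):
    """Information gain of splitting the dataset on 'pattern occurs in sequence'."""
    inside = [y for i, y in enumerate(labels) if i in covered]
    outside = [y for i, y in enumerate(labels) if i not in covered]
    n = len(labels)
    split = len(inside) / n * entropy(inside) + len(outside) / n * entropy(outside)
    return entropy(labels) - split


def extract_features(sequences, labels, min_support=2, min_gain=0.1):
    # Step 1: identify patterns and record which sequences contain each one.
    coverage = defaultdict(set)
    for idx, text in enumerate(sequences):
        for gram in ngrams(text):
            coverage[gram].add(idx)
    frequent = {p: c for p, c in coverage.items() if len(c) >= min_support}

    # Step 2: filter redundant patterns. Patterns with identical coverage are
    # collapsed to one representative (longest first, then alphabetical).
    by_coverage = {}
    for p in sorted(frequent, key=lambda p: (-len(p), p)):
        by_coverage.setdefault(frozenset(frequent[p]), p)

    # Step 3: filter irrelevant patterns, i.e. those with low information gain.
    return sorted(p for cov, p in by_coverage.items()
                  if info_gain(cov, labels) >= min_gain)


sequences = ["tel: 555-1234", "tel: 555-9876",
             "john@example.com", "mary@example.org"]
labels = ["phone", "phone", "email", "email"]
features = extract_features(sequences, labels)
print(features)
```

On this toy dataset, the many n-grams shared by the two phone strings collapse to a single representative, as do those shared by the two e-mail strings, so only two features survive. A real system would likely use richer pattern types and a statistically grounded relevance test.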
Public/Granted literature
- US20100150448A1, METHOD OF FEATURE EXTRACTION FROM NOISY DOCUMENTS, Public/Granted day: 2010-06-17