Invention Grant
US08503797B2 Automatic document classification using lexical and physical features
Active
- Patent Title: Automatic document classification using lexical and physical features
- Application No.: US12205613
- Application Date: 2008-09-05
- Publication No.: US08503797B2
- Publication Date: 2013-08-06
- Inventor: Adam Turkelson, Huanfeng Ma
- Applicant: Adam Turkelson, Huanfeng Ma
- Applicant Address: US PA Philadelphia
- Assignee: The Neat Company, Inc.
- Current Assignee: The Neat Company, Inc.
- Current Assignee Address: US PA Philadelphia
- Agency: Woodcock Washburn LLP
- Main IPC: G06K15/00
- IPC: G06K15/00

Abstract:
An automatic document classification system is described that uses lexical and physical features to assign a class c_i ∈ C = {c_1, c_2, ..., c_i} to a document d. The primary lexical features are selected with a feature selection method known as Orthogonal Centroid Feature Selection (OCFS). Additional information may be gathered on the frequencies of character types (digits, letters, and symbols) within d. Physical information is assembled through image analysis to yield physical attributes such as document dimensionality, text alignment, and color distribution. The resulting lexical and physical information is combined into an input vector X, which is used to train a supervised neural network to perform the classification.
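The feature pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy term counts, and the physical values (page width and height) are hypothetical, and the OCFS score used here is the commonly published form, in which each feature is scored by the class-weighted squared distance between the class centroids and the global centroid. Image analysis and neural network training are left out.

```python
from collections import Counter


def char_type_frequencies(text):
    """Fractions of digits, letters, and symbols among non-whitespace characters."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return [0.0, 0.0, 0.0]
    digits = sum(ch.isdigit() for ch in chars)
    letters = sum(ch.isalpha() for ch in chars)
    symbols = len(chars) - digits - letters
    n = len(chars)
    return [digits / n, letters / n, symbols / n]


def ocfs_scores(rows, labels):
    """Score each feature k as sum_j (n_j / n) * (m_j[k] - m[k]) ** 2.

    rows   -- one term-count vector per training document (assumed layout)
    labels -- the class label of each document
    """
    n = len(rows)
    dim = len(rows[0])
    global_centroid = [sum(r[k] for r in rows) / n for k in range(dim)]
    scores = [0.0] * dim
    for label, n_j in Counter(labels).items():
        members = [r for r, y in zip(rows, labels) if y == label]
        for k in range(dim):
            class_centroid_k = sum(r[k] for r in members) / n_j
            scores[k] += (n_j / n) * (class_centroid_k - global_centroid[k]) ** 2
    return scores


def select_top_features(scores, p):
    """Indices of the p highest-scoring features, returned in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:p])


def build_input_vector(term_row, selected, text, physical):
    """Concatenate selected lexical features, character-type frequencies,
    and physical attributes into one input vector X."""
    lexical = [term_row[k] for k in selected]
    return lexical + char_type_frequencies(text) + list(physical)


# Toy data (hypothetical): three terms, two classes.
rows = [[1, 0, 5], [1, 0, 4], [0, 1, 5], [0, 1, 4]]
labels = ["receipt", "receipt", "card", "card"]
selected = select_top_features(ocfs_scores(rows, labels), p=2)
X = build_input_vector(rows[0], selected, "ab12$", physical=[8.5, 11.0])
```

In this toy set the third term has the same mean in both classes, so it scores zero and is dropped. The resulting X holds two lexical values, three character-type frequencies, and two physical values, and a vector of this form would be the input to the supervised classifier.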
Public/Granted literature
- US20090067729A1 Automatic document classification using lexical and physical features (Public/Granted day: 2009-03-12)