Invention Grant
US08503797B2 Automatic document classification using lexical and physical features (In force)

Automatic document classification using lexical and physical features
Abstract:
An automatic document classification system is described that uses lexical and physical features to assign a class c_i ∈ C = {c_1, c_2, ..., c_i} to a document d. The primary lexical features are the result of a feature selection method known as Orthogonal Centroid Feature Selection (OCFS). Additional information may be gathered on character type frequencies (digits, letters, and symbols) within d. Physical information is assembled through image analysis to yield physical attributes such as document dimensionality, text alignment, and color distribution. The resulting lexical and physical information is combined into an input vector X and is used to train a supervised neural network to perform the classification.
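The feature assembly described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`char_type_frequencies`, `ocfs_scores`, `select_top_k`, `build_input_vector`) are hypothetical, the OCFS score follows the commonly published form (a class-size-weighted squared distance between each class centroid and the overall centroid, per feature), "symbol" is assumed to mean any non-whitespace character that is neither a digit nor a letter, and the physical attributes are taken as precomputed numbers because the abstract does not say how image analysis produces them.

```python
from collections import Counter


def char_type_frequencies(text):
    """Return relative frequencies [digit, letter, symbol] for a text.

    Whitespace is ignored. Treating every other non-digit, non-letter
    character as a "symbol" is an assumption of this sketch.
    """
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        if ch.isdigit():
            counts["digit"] += 1
        elif ch.isalpha():
            counts["letter"] += 1
        else:
            counts["symbol"] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[k] / total for k in ("digit", "letter", "symbol")]


def ocfs_scores(rows, labels):
    """Score feature i as sum over classes j of (n_j / n) * (m_j[i] - m[i])**2.

    rows is a list of equal-length term vectors, labels the class of each row.
    m_j is the centroid of class j and m the centroid of all rows.
    """
    n = len(rows)
    dim = len(rows[0])
    overall = [sum(r[i] for r in rows) / n for i in range(dim)]
    scores = [0.0] * dim
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        n_j = len(members)
        centroid = [sum(r[i] for r in members) / n_j for i in range(dim)]
        for i in range(dim):
            scores[i] += (n_j / n) * (centroid[i] - overall[i]) ** 2
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, ascending."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def build_input_vector(term_vector, selected, text, physical):
    """Concatenate selected lexical features, character stats, and physical attributes."""
    lexical = [term_vector[i] for i in selected]
    return lexical + char_type_frequencies(text) + list(physical)
```

In this sketch the vector returned by `build_input_vector` plays the role of the input vector X; training the supervised neural network on such vectors is left out because the abstract gives no details of the network.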