Invention Grant
- Patent Title: Handwritten document categorizer and method of training
- Patent Title (Chinese): Handwritten document categorizer and training method
- Application No.: US12567920
- Application Date: 2009-09-28
- Publication No.: US08566349B2
- Publication Date: 2013-10-22
- Inventor: Francois Ragnet, Florent C. Perronnin, Thierry Lehoux
- Applicant: Francois Ragnet, Florent C. Perronnin, Thierry Lehoux
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
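The first and last stages of the training flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The scoring rule (a smoothed log-ratio of in-category to out-of-category word frequency), the function names `discriminative_words` and `keyword_statistics`, and the toy data are all assumptions made for illustration. The font-based sample synthesis and the keyword models are not implemented here; their output is stood in for by a plain list of spotted words.

```python
from collections import Counter
import math


def discriminative_words(docs, labels, top_k=2):
    """Rank words per category by a smoothed log-ratio of in-category
    to out-of-category frequency (an assumed scoring rule)."""
    categories = sorted(set(labels))
    counts = {c: Counter() for c in categories}
    for text, label in zip(docs, labels):
        counts[label].update(text.lower().split())
    vocab = set().union(*counts.values())

    result = {}
    for c in categories:
        in_total = sum(counts[c].values())
        out_counts = Counter()
        for other in categories:
            if other != c:
                out_counts.update(counts[other])
        out_total = sum(out_counts.values())

        def score(word):
            # Add-one smoothing keeps unseen words from dividing by zero.
            p_in = (counts[c][word] + 1) / (in_total + len(vocab))
            p_out = (out_counts[word] + 1) / (out_total + len(vocab))
            return math.log(p_in / p_out)

        # Ties are broken alphabetically so the ranking is deterministic.
        ranked = sorted(vocab, key=lambda w: (-score(w), w))
        result[c] = ranked[:top_k]
    return result


def keyword_statistics(spotted_words, keywords):
    """Turn the words spotted in one handwritten document into a
    fixed-length vector of per-keyword counts."""
    counts = Counter(w for w in spotted_words if w in keywords)
    return [counts[k] for k in keywords]


# Toy OCR output of typed training documents, labeled by category.
typed_docs = [
    "invoice payment due invoice",
    "payment invoice amount",
    "complaint refund broken",
    "refund complaint service",
]
typed_labels = ["billing", "billing", "complaint", "complaint"]

per_category = discriminative_words(typed_docs, typed_labels)
keyword_group = [w for c in sorted(per_category) for w in per_category[c]]

# Hypothetical keyword-spotting result for one scanned handwritten document.
spotted = ["refund", "please", "refund", "invoice"]
features = keyword_statistics(spotted, keyword_group)
```

The resulting `features` vector, paired with the handwritten document's category label, is the kind of input on which any standard classifier could then be trained.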
Public/Granted literature
- US20110078191A1 HANDWRITTEN DOCUMENT CATEGORIZER AND METHOD OF TRAINING (Public/Granted day: 2011-03-31)