Invention Grant
US08208765B2 Search and retrieval of documents indexed by optical character recognition
Legal status: In force
- Patent Title: Search and retrieval of documents indexed by optical character recognition
- Application No.: US11972446
- Application Date: 2008-01-10
- Publication No.: US08208765B2
- Publication Date: 2012-06-26
- Inventor: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia
- Applicant: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia
- Applicant Address: JP Osaka
- Assignee: Sharp Kabushiki Kaisha
- Current Assignee: Sharp Kabushiki Kaisha
- Current Assignee Address: JP Osaka
- Agency: Birch, Stewart, Kolasch & Birch, LLP
- Priority: CN200710129606, 2007-07-23
- Main IPC: G06K9/00
- IPC: G06K9/00

Abstract:
An image of a character string composed of M characters is clipped from a document image and divided into individual character images. Image features are extracted from each character image. Based on these features, N (N > 1, an integer) candidate characters, in descending order of similarity, are selected from a character image feature dictionary that stores image features on a per-character basis, and a first index matrix of M×N cells is prepared. A candidate character string composed of the candidate characters in the first column of the first index matrix is subjected to lexical analysis according to a language model, whereby a second index matrix containing a character string that makes sense is prepared. Statistics are first collected for the language model, and the lexical analysis is then performed.
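The abstract describes two stages: an OCR stage that fills an M×N matrix with the N best-matching dictionary characters for each of the M character images, and a language-model stage that turns those candidates into a meaningful string. The Python sketch below illustrates that flow under loudly stated assumptions. The abstract does not say how features are computed, how similarity is measured, how the language model is structured, or exactly how the second index matrix is derived. The 2-D feature vectors, Euclidean-distance similarity, add-one-smoothed bigram model, `rank_penalty` weight, Viterbi search across all candidates (rather than analysis of the first-column string alone), and the rule of moving the chosen candidate to column 0 are all hypothetical stand-ins, not the patented method.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature dictionary: character -> feature vector.
FeatureDict = Dict[str, Sequence[float]]


def build_first_index_matrix(
    char_features: List[Sequence[float]], dictionary: FeatureDict, n: int
) -> List[List[str]]:
    """M x N matrix: row i lists the N dictionary characters most similar to
    the i-th character image, most similar first. Similarity is taken to be
    smaller Euclidean distance (an assumption)."""
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    matrix = []
    for feat in char_features:
        # Ascending distance == descending similarity.
        ranked = sorted(dictionary, key=lambda ch: math.dist(feat, dictionary[ch]))
        matrix.append(ranked[:n])
    return matrix


def train_bigrams(corpus: List[str]) -> Tuple[Counter, Counter]:
    """Collect the toy language model's statistics: bigram counts and how
    often each character starts a bigram."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    for text in corpus:
        for a, b in zip(text, text[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts


def best_path(
    matrix: List[List[str]],
    bigrams: Counter,
    contexts: Counter,
    vocab_size: int,
    rank_penalty: float = 0.5,
) -> List[int]:
    """Viterbi search for the column to use in each row. A path's score is the
    sum of add-one-smoothed bigram log-probabilities minus rank_penalty times
    each chosen column index (so OCR ranking still matters)."""

    def lp(a: str, b: str) -> float:
        return math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))

    scores = [-rank_penalty * j for j in range(len(matrix[0]))]
    back: List[List[int]] = []
    for i in range(1, len(matrix)):
        prev_row, new_scores, pointers = matrix[i - 1], [], []
        for j, cur in enumerate(matrix[i]):
            # Best predecessor column p for candidate `cur` in this row.
            p = max(range(len(prev_row)), key=lambda q: scores[q] + lp(prev_row[q], cur))
            new_scores.append(scores[p] + lp(prev_row[p], cur) - rank_penalty * j)
            pointers.append(p)
        scores = new_scores
        back.append(pointers)
    # Backtrack from the best-scoring candidate in the last row.
    j = max(range(len(scores)), key=scores.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]


def build_second_index_matrix(matrix: List[List[str]], path: List[int]) -> List[List[str]]:
    """Move the chosen candidate of each row to column 0, keeping the rest in order."""
    return [[row[j]] + row[:j] + row[j + 1:] for row, j in zip(matrix, path)]


# Toy data (hypothetical): 'e'/'c', 'o'/'a' and 't'/'l' are visually close.
dictionary = {"c": (0.0, 0.0), "e": (0.1, 0.0), "a": (1.0, 0.0),
              "o": (1.1, 0.0), "t": (2.0, 0.0), "l": (2.1, 0.0)}
images = [(0.08, 0.0), (1.06, 0.0), (2.0, 0.0)]  # M = 3 clipped character images

first = build_first_index_matrix(images, dictionary, n=2)
bigrams, contexts = train_bigrams(["cat", "cats", "scat"])
path = best_path(first, bigrams, contexts, vocab_size=7)
second = build_second_index_matrix(first, path)
print(first)   # [['e', 'c'], ['o', 'a'], ['t', 'l']]
print(second)  # [['c', 'e'], ['a', 'o'], ['t', 'l']]
```

In this toy run the first column of the first matrix reads "eot", which the bigram statistics find implausible. The search instead picks the second-ranked candidates in the first two rows, so the first column of the second matrix reads "cat". The rank penalty keeps the language model from overriding the visual ranking when the statistics give no strong preference.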