Abstract:
PROBLEM TO BE SOLVED: To reduce the operator's burden and improve the efficiency of confirming and correcting recognition results by displaying character images of similar letter shape and typeface together on a confirmation screen on which character images of the same category are arranged. SOLUTION: The output mechanism of a character recognition device comprises: a category classifying part 20 that classifies the image data of characters subjected to character recognition processing by the character (category) recognized for each; a clustering processing part 30 that calculates feature values related to the shapes of the characters in the image data of each category classified by the category classifying part 20 and classifies the image data into one or more clusters based on those feature values; and a picture generating part 50 that generates and displays a confirmation screen on which the image data are displayed cluster by cluster, as classified by the clustering processing part 30. COPYRIGHT: (C)2006,JPO&NCIPI
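The two-stage grouping described above (by category, then by shape within each category) can be illustrated with a minimal sketch. The abstract does not name the feature values or the clustering method, so everything here is an assumption for illustration: the feature vectors are plain numeric tuples, the greedy centroid-distance clustering and its `threshold` parameter are stand-ins, and the function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def classify_by_category(samples):
    """Group (category, feature_vector) samples by recognized character."""
    groups = defaultdict(list)
    for category, features in samples:
        groups[category].append(features)
    return dict(groups)


def cluster_features(vectors, threshold):
    """Greedy clustering (an assumed stand-in method): a vector joins the
    first cluster whose centroid lies within `threshold`, else starts a new one."""
    clusters = []  # each cluster is a list of feature vectors
    for vec in vectors:
        for members in clusters:
            centroid = [sum(col) / len(members) for col in zip(*members)]
            if dist(vec, centroid) <= threshold:
                members.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters


def build_confirmation_layout(samples, threshold=1.0):
    """Return {category: [cluster, ...]} so each cluster can be shown together."""
    by_category = classify_by_category(samples)
    return {cat: cluster_features(vecs, threshold) for cat, vecs in by_category.items()}


samples = [
    ("A", (0.0, 0.0)),
    ("A", (0.1, 0.0)),
    ("A", (5.0, 5.0)),  # same category, clearly different shape
    ("B", (1.0, 1.0)),
]
layout = build_confirmation_layout(samples, threshold=1.0)
```

A screen generator would then render each inner list of `layout` as one contiguous group, so the operator sees similarly shaped images of a category side by side.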
Abstract:
PROBLEM TO BE SOLVED: To provide a technique that discriminates documents reliably by generating a stable virtual page mark even for a non-OCR document. SOLUTION: To detect the virtual page mark, a key segment that can be detected stably in the document is defined in advance, and the virtual page mark (a circumscribed rectangle) is generated on the basis of that key segment. Redundancy is built into key segment detection: an alternative segment is also defined, so that if the key segment cannot be detected because of a stain, a missing portion, etc., the virtual page mark is generated on the basis of the alternative segment instead. The selected key segment satisfies the following conditions: (1) it is thick enough to tolerate faintness of the document; (2) it is far enough from the edges of the document to be stable against skew and a missing document edge; and (3) it does not overlap a fold in the original paper of the document. COPYRIGHT: (C)2003,JPO
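The fallback from a key segment to an alternative segment can be sketched as follows. This is only an illustration under stated assumptions: segments are modeled as `(x0, y0, x1, y1)` tuples, each "slot" lists a key segment name followed by its alternatives, and the virtual page mark is taken as the circumscribed rectangle of whichever segments were selected. The abstract does not specify how an alternative segment is mapped to the page mark, and all names below are hypothetical.

```python
def select_segments(key_slots, detected):
    """For each slot, pick the key segment if detected, else the first
    detected alternative. `key_slots` is a list of candidate-name lists,
    key segment first; `detected` maps names to (x0, y0, x1, y1)."""
    chosen = []
    for candidates in key_slots:
        for name in candidates:
            if name in detected:
                chosen.append(detected[name])
                break
        else:
            raise LookupError(f"no key or alternative segment found among {candidates}")
    return chosen


def circumscribed_rectangle(segments):
    """Return (left, top, right, bottom) enclosing every segment."""
    xs = [x for x0, _, x1, _ in segments for x in (x0, x1)]
    ys = [y for _, y0, _, y1 in segments for y in (y0, y1)]
    return (min(xs), min(ys), max(xs), max(ys))


def virtual_page_mark(key_slots, detected):
    """Generate the virtual page mark from the selected segments."""
    return circumscribed_rectangle(select_segments(key_slots, detected))


# "top_rule" was lost (e.g. to a stain), so its alternative is used instead.
key_slots = [["top_rule", "top_rule_alt"], ["bottom_rule"]]
detected = {
    "top_rule_alt": (10, 22, 200, 22),
    "bottom_rule": (10, 300, 180, 300),
}
mark = virtual_page_mark(key_slots, detected)
```

Listing candidates in priority order keeps the redundancy explicit: detection only fails when neither the key segment nor any of its alternatives is found.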
Abstract:
PROBLEM TO BE SOLVED: To make it possible to locate character frames and recognize characters on a business form that has no page mark or reference mark, even with a scanner that cannot detect the edge of the business form, and at the same time to speed up discrimination of a bitmap image by comparing images on the basis of a circumscribed rectangle, etc., formed from horizontal segments, which can be detected at high speed. SOLUTION: When discriminating a bitmap image that contains horizontal segments, such as the character frames 323, 327 and ruled lines 301, 303 on a business form (for example, when a business form 300 with black character frames and no page mark is discriminated and its characters are recognized by OCR), the horizontal segments are extracted as features of the business form, a circumscribed rectangle 350 is formed around the area spanned by the horizontal segments, and the circumscribed rectangle 350 is used both as a reference for estimating the positions of the character frames and as information for discriminating the type of the business form. Supplying this information to the OCR allows even a business form without a page mark or reference mark to be recognized. In addition, the business form is discriminated more accurately by comparing the extracted horizontal segments themselves with the horizontal segments of a pre-registered business form definition and evaluating their similarity.
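The steps above (extract horizontal segments, form their circumscribed rectangle, compare segments against a registered definition) can be sketched in a few functions. This is a simplified illustration, not the method of the abstract: the bitmap is assumed to be a list of rows of 0/1 pixels, a horizontal segment is assumed to be a run of black pixels of at least `min_length`, and the similarity measure with its `tolerance` parameter is a hypothetical choice.

```python
def extract_horizontal_segments(bitmap, min_length):
    """Return (row, x_start, x_end) for each run of black (1) pixels
    that is at least `min_length` pixels long."""
    segments = []
    for y, row in enumerate(bitmap):
        start = None
        # A trailing 0 acts as a sentinel that closes a run touching the right edge.
        for x, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_length:
                    segments.append((y, start, x - 1))
                start = None
    return segments


def circumscribed_rectangle(segments):
    """Return (left, top, right, bottom) enclosing all segments."""
    if not segments:
        raise ValueError("no horizontal segments to enclose")
    rows = [s[0] for s in segments]
    return (min(s[1] for s in segments), min(rows),
            max(s[2] for s in segments), max(rows))


def segment_similarity(extracted, registered, tolerance=2):
    """Fraction of registered segments that have an extracted segment
    within `tolerance` pixels in row and in both end points (assumed measure)."""
    if not registered:
        return 0.0
    matched = sum(
        any(abs(ry - y) <= tolerance and abs(rx0 - x0) <= tolerance
            and abs(rx1 - x1) <= tolerance for y, x0, x1 in extracted)
        for ry, rx0, rx1 in registered
    )
    return matched / len(registered)


bitmap = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],  # run too short to count as a segment
    [0, 0, 1, 1, 1, 1],
]
segments = extract_horizontal_segments(bitmap, min_length=3)
rectangle = circumscribed_rectangle(segments)
```

In this sketch the rectangle gives a coarse, fast comparison key, while `segment_similarity` corresponds to the finer segment-by-segment comparison against a registered form definition.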