-
Publication No.: CA1048155A
Publication date: 1979-02-06
Application No.: CA223701
Filing date: 1975-04-02
Applicant: IBM
Inventor: CHAIRES ANNE M , CICONTE JEAN M , ETT ALLEN H , HILLIARD JOHN J , ROSENBAUM WALTER S
Abstract: BINARY REFERENCE MATRIX FOR A CHARACTER RECOGNITION MACHINE. A binary reference matrix apparatus is disclosed for verifying input alpha words from a character recognition machine as valid linguistic expressions. The organization of the binary reference matrix is based upon the character transfer function of the character recognition machine. The alphabetic character stream for each word scanned by the character recognition machine is mapped into a vector representation by assigning a unique numeric value to each letter of the alphabet. The vector magnitude and angle so calculated constitute the address data for accessing the binary reference matrix. The point accessed in the matrix has a binary value of 1 if the scanned word is valid and a binary value of 0 if it is invalid. The organization of the binary reference matrix minimizes the size of the array needed for accurate verification by choosing numerical values for the alphabetic characters in inverse proportion to each character's read reliability in the character recognition machine, as determined by empirical measurement of the machine's character transfer function.
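The addressing scheme in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the letter values in `LETTER_VALUE` are arbitrary (the patent derives them from measured read reliability), the squared magnitude stands in for the magnitude so the address stays an integer, the angle is measured against an all-ones reference vector, and a Python set stands in for the matrix cells holding 1.

```python
import math

# Hypothetical letter values, chosen only for illustration (a=1 ... z=26).
LETTER_VALUE = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}

def word_address(word, angle_bins=64):
    """Map a word to a (squared magnitude, quantized angle) address."""
    values = [LETTER_VALUE[ch] for ch in word.lower()]
    magnitude_sq = sum(v * v for v in values)
    # Assumption: the angle is taken against the all-ones vector of equal length.
    cos_theta = sum(values) / (math.sqrt(magnitude_sq) * math.sqrt(len(values)))
    angle = math.acos(min(1.0, cos_theta))  # clamp guards against rounding
    angle_bin = min(angle_bins - 1, int(angle / (math.pi / 2) * angle_bins))
    return magnitude_sq, angle_bin

def build_matrix(dictionary):
    """Record the addresses whose matrix cell would hold a 1."""
    return {word_address(word) for word in dictionary}

def verify(word, matrix):
    """Return 1 if the addressed cell is set (valid word), else 0."""
    return 1 if word_address(word) in matrix else 0

matrix = build_matrix(["cat", "dog", "house"])
print(verify("cat", matrix), verify("czt", matrix))
```

This sketch ignores letter order, so anagrams share an address; a set is used instead of a dense array purely to keep the example short.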
-
Publication No.: CA1062811A
Publication date: 1979-09-18
Application No.: CA230886
Filing date: 1975-07-07
Applicant: IBM
Inventor: BOLLINGER ELLEN W , CHAIRES ANNE M , CICONTE JEAN M , ETT ALLEN H , HILLIARD JOHN J , KOCHER DONALD F , ROSENBAUM WALTER S
Abstract: A CLUSTER STORE APPARATUS FOR POST PROCESSING ERROR CORRECTION OF A CHARACTER RECOGNITION MACHINE. A cluster storage apparatus is disclosed for outputting groups of valid alpha words as potential candidates for the correct form of an alpha word misrecognized by a character recognition machine. Groups of alpha words are arranged in the cluster storage apparatus such that adjacent locations contain alpha words having similar character recognition misread propensities. Alpha words that have been determined to be misrecognized are input to the cluster storage apparatus. Numerical values assigned to the characters composing the input word are used to calculate the address of the group of valid alpha words having similar character recognition misread propensities. The cluster storage apparatus then outputs the accessed groups of alpha words for subsequent processing. The organization of the cluster storage apparatus minimizes the difference in address between alpha words with similar character recognition misread propensities by assigning high numeric values to highly reliable characters, as determined by measuring the character transfer function of the character recognition machine.
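A rough sketch of the cluster lookup described in this abstract follows. The letter values, the sum-of-values address function, and the `radius` parameter are all hypothetical choices for illustration; the abstract only states that addresses are calculated from per-character values and that reliable characters receive high values.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical values: reliable letters get large, well-separated values,
# while letters assumed to be confused with one another (c, e, o) get
# small, close values so a misread moves the address only slightly.
LETTER_VALUE = {ch: 20 + i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
LETTER_VALUE.update({"c": 1, "e": 2, "o": 3})

def cluster_address(word):
    """Assumed address function: the sum of the letter values."""
    return sum(LETTER_VALUE[ch] for ch in word.lower())

class ClusterStore:
    def __init__(self, dictionary):
        self.groups = defaultdict(list)
        for word in dictionary:
            self.groups[cluster_address(word)].append(word)
        self.addresses = sorted(self.groups)

    def candidates(self, misread, radius=2):
        """Return all stored words whose address lies within `radius`."""
        address = cluster_address(misread)
        lo = bisect_left(self.addresses, address - radius)
        hi = bisect_right(self.addresses, address + radius)
        return [w for a in self.addresses[lo:hi] for w in self.groups[a]]

store = ClusterStore(["cat", "eat", "dog", "tree"])
print(store.candidates("oat"))
```

Under these assumed values, a misread such as "oat" lands next to "cat" and "eat", which can then be passed to a later stage for ranking.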
-
Publication No.: CA1062810A
Publication date: 1979-09-18
Application No.: CA221755
Filing date: 1975-03-10
Applicant: IBM
Inventor: BOLLINGER ELLEN W , CHAIRES ANNE M , CICONTE JEAN M , ETT ALLEN H , HILLIARD JOHN J , ROSENBAUM WALTER S
Abstract: A data processing system is disclosed for selecting the correct form of a garbled input word that an optical character reader has misread in a way that changes the number of characters in the word, through character splitting or concatenation. Dictionary words are stored in the system, with characters flagged for their propensity to be misread by the OCR through segmentation or concatenation. The OCR word and a dictionary word are loaded into a pair of associated shift registers, with their letters aligned at one end. The dictionary word characters are inspected for error propensity flags. When a splitting propensity, for example, is found for a character, special conditional probability values are accessed from storage, and the probability is calculated that the first character of the dictionary word was split by the OCR into the first and second characters of the OCR word. This regional context probability is compared with the probability of a simple substitution error for the characters. If the probability of segmentation is larger, the OCR characters in the first shift register are shifted one space with respect to the dictionary word characters in the second shift register so that subsequent character pairs to be compared are properly matched. The greater calculated probability is combined in a running product. The dictionary word with the largest running product is output by the system as the most likely correct form of the garbled OCR input word. In addition to optical character recognition, the system disclosed may be applied to correcting segmentation errors in phoneme characters output from a speech analyzer.
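The comparison of splitting against substitution described in this abstract can be sketched as below. The probability tables are invented for illustration, an entry in `SPLIT` plays the role of the splitting propensity flag, and an index into the OCR string stands in for the shift register offset. Only splitting is modelled; concatenation is omitted.

```python
# Hypothetical probabilities, not taken from the patent.
MATCH = 0.9          # P(OCR reads the character correctly)
DEFAULT_SUB = 0.01   # P(any other single-character substitution)
SPLIT = {("m", "rn"): 0.3, ("w", "vv"): 0.3}  # P(OCR pair | dictionary char)

def substitution_prob(dict_char, ocr_char):
    return MATCH if dict_char == ocr_char else DEFAULT_SUB

def score(ocr_word, dict_word):
    """Running product of the larger of split and substitution probability."""
    i = 0            # position in the OCR word (the "shift" offset)
    product = 1.0
    for d in dict_word:
        if i >= len(ocr_word):
            return 0.0  # OCR word exhausted before the dictionary word
        p_sub = substitution_prob(d, ocr_word[i])
        p_split = SPLIT.get((d, ocr_word[i:i + 2]), 0.0)
        if p_split > p_sub:
            product *= p_split
            i += 2   # one dictionary char consumed two OCR chars
        else:
            product *= p_sub
            i += 1
    # Leftover OCR characters mean the alignment failed.
    return product if i == len(ocr_word) else 0.0

def best_match(ocr_word, dictionary):
    """Return the dictionary word with the largest running product."""
    return max(dictionary, key=lambda w: score(ocr_word, w))

print(best_match("rnodern", ["modern", "modem", "rodent"]))
```

With these assumed tables, "modem" also aligns with "rnodern" by treating the trailing "rn" as a split "m", but it scores lower than "modern" because it needs two split events rather than one.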
-
Publication No.: CA1050167A
Publication date: 1979-03-06
Application No.: CA209648
Filing date: 1974-09-19
Applicant: IBM
Inventor: CHAIRES ANNE M , CICONTE JEAN M , HILLIARD JOHN J , ROSENBAUM WALTER S
Abstract: An online numeric discriminator is disclosed which decides between strings of characters coming from a dual output optical character recognition system for use in text processing or mail processing applications. The dual output OCR uses separate recognition processes for alphabetic and numeric characters and attempts to recognize each character independently as both an alphabetic and a numeric character. The alphabetic interpretation of the scanned word is output as an alphabetic subfield on a first output line, and the numeric interpretation is output as a numeric subfield on a second output line from the OCR. The Bayesian online numeric discriminator then analyzes the two character streams by calculating a first conditional probability that the OCR perceived the alphabetic subfield given that a numeric subfield was actually scanned, and a second conditional probability that the OCR perceived the numeric subfield given that an alphabetic subfield was actually scanned. These first and second conditional probabilities are then compared. If the conditional probability that the OCR read the alphabetic subfield given that the numeric subfield was actually scanned is larger than the conditional probability that the OCR read the numeric subfield given that the alphabetic subfield was actually scanned, then the numeric subfield is selected by the discriminator as the most probable interpretation of the word scanned by the OCR.
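The decision rule in this abstract can be illustrated with a small sketch. The confusion probabilities below are made up, the per-character probabilities are assumed to be independent and multiplied together, and ties are resolved in favour of the alphabetic subfield; none of these details is stated in the abstract.

```python
# Hypothetical confusion tables, for illustration only.
# P(alpha channel outputs letter | a digit was actually scanned)
P_ALPHA_GIVEN_DIGIT = {("O", "0"): 0.6, ("I", "1"): 0.5,
                       ("S", "5"): 0.4, ("B", "8"): 0.3}
# P(numeric channel outputs digit | a letter was actually scanned)
P_DIGIT_GIVEN_ALPHA = {("0", "O"): 0.5, ("1", "I"): 0.4, ("1", "L"): 0.3,
                       ("5", "S"): 0.1, ("8", "B"): 0.1}
FLOOR = 0.01  # assumed probability for any pair missing from the tables

def discriminate(alpha_subfield, numeric_subfield):
    """Select the more probable interpretation of one scanned word."""
    if len(alpha_subfield) != len(numeric_subfield):
        raise ValueError("subfields must have the same length")
    p_alpha_given_numeric = 1.0
    p_numeric_given_alpha = 1.0
    for a, n in zip(alpha_subfield, numeric_subfield):
        p_alpha_given_numeric *= P_ALPHA_GIVEN_DIGIT.get((a, n), FLOOR)
        p_numeric_given_alpha *= P_DIGIT_GIVEN_ALPHA.get((n, a), FLOOR)
    # The numeric subfield wins when the alphabetic reading is better
    # explained as a misperception of digits than the reverse.
    if p_alpha_given_numeric > p_numeric_given_alpha:
        return numeric_subfield
    return alpha_subfield

print(discriminate("SOB", "508"), discriminate("OIL", "011"))
```

With these assumed tables, the pair ("SOB", "508") resolves to the numeric reading, while ("OIL", "011") resolves to the alphabetic reading because "L" is unlikely to be produced from a scanned "1".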
-