-
Publication number: US3398870A
Publication date: 1968-08-27
Application number: US61114967
Application date: 1967-01-23
Applicant: IBM
Inventor: MULLAN PHILIP J , SANDFORD PLATTER
CPC classification number: F16C33/107 , F16C17/026 , F16C2370/12 , G11B15/64 , G11B17/32
-
Publication number: FR2329023A1
Publication date: 1977-05-20
Application number: FR7629487
Application date: 1976-09-22
-
Publication number: CA1066418A
Publication date: 1979-11-13
Application number: CA268693
Application date: 1976-12-23
Applicant: IBM
Inventor: HILLIARD JOHN J , MULLAN PHILIP J , ROSENBAUM WALTER S
Abstract: ALPHABETIC CHARACTER WORD UPPER/LOWER CASE PRINT CONVENTION APPARATUS AND METHOD. The apparatus and method disclosed herein determine whether an alphabetic character field output from an optical character reader (OCR) is related to the OCR scan of an upper case or a lower case inscription on the document scanned. Each word output by the OCR corresponds to a field (i.e., word) of characters on the scanned document. The signals representative of the upper and lower case alphabetic characters and rejects, including conflicts, output from the OCR are applied to a character occurrence probability storage apparatus holding precomputed empirical probabilities that a given character recognition is the result of the scan of an upper or lower case character, as the case may be. The storage apparatus includes probability values for character conflicts and rejects. As the series of signals from the OCR output is applied to the character occurrence probability storage apparatus, a running sum of the respective probabilities for the upper case and lower case print conventions is developed so that, following the input of the final character, reject or conflict within a word, an appropriate upper or lower case determination can be made for all of the characters within the word. This determination corresponds with the print convention of the word inscribed on the scanned document. An upper or lower case flag is generated with the print convention determination for further text processing.
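The running-sum decision described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the probability values, the `REJECT` marker, the tie-breaking rule, and the names `UPPER_PROB`, `LOWER_PROB`, and `print_convention` are all hypothetical, since the abstract gives neither the stored values nor the hardware details.

```python
# Hypothetical probability tables. The patent stores precomputed empirical
# values; the numbers below are invented purely for illustration.
REJECT = "*"  # assumed marker for an OCR reject (not specified in the abstract)

# Probability that a given OCR output came from an upper case scan.
UPPER_PROB = {"A": 0.9, "a": 0.1, "C": 0.5, "c": 0.5, "T": 0.8, "t": 0.2, REJECT: 0.5}
# Probability that the same OCR output came from a lower case scan.
LOWER_PROB = {"A": 0.1, "a": 0.9, "C": 0.5, "c": 0.5, "T": 0.2, "t": 0.8, REJECT: 0.5}


def print_convention(word: str) -> str:
    """Return an "UPPER" or "LOWER" flag for one OCR output word."""
    upper_sum = 0.0
    lower_sum = 0.0
    # Accumulate both running sums character by character, rejects included.
    for symbol in word:
        upper_sum += UPPER_PROB[symbol]
        lower_sum += LOWER_PROB[symbol]
    # The decision is made only after the final symbol of the word.
    # Breaking ties in favor of upper case is an assumption of this sketch.
    return "UPPER" if upper_sum >= lower_sum else "LOWER"


print(print_convention("CAT"))   # shape-ambiguous "C" is outvoted by "A" and "T"
print(print_convention("Ca*t"))  # a reject contributes equally to both sums
```

Letters such as "C", whose upper and lower case shapes look alike to an OCR, carry little evidence on their own, which is why the decision is deferred to the whole word.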
-
Publication number: CA833417A
Publication date: 1970-02-03
Application number: CA833417D
Applicant: IBM
Inventor: MULLAN PHILIP J , PLATTER SANDFORD
-
Publication number: CA1061000A
Publication date: 1979-08-21
Application number: CA264592
Application date: 1976-10-25
Applicant: IBM
Inventor: MULLAN PHILIP J , ROSENBAUM WALTER S
Abstract: MULTI-CHANNEL RECOGNITION DISCRIMINATOR. A multi-channel multi-genre character recognition discriminator is disclosed which performs the decision making process between strings of characters coming from a multi-channel (i.e., three or more channels) alpha-numeric output optical character reader (OCR) system for use in such applications as, for example, text processing and mail processing. The multi-channel output OCR uses separate recognition processes for each genre or character set indicative of a distinct group with respect to style (i.e., font) or form, and attempts to recognize each character independently as belonging to each respective genre. For example, in a three-channel output OCR for reading mixed numeric, English and Russian Cyrillic character sets, the English alphabetic interpretation of a scanned word is output as an English alphabetic subfield on a first OCR output line, the Cyrillic interpretation of the scanned word is output as a Cyrillic subfield on a second OCR output line, and the numeric interpretation of the scanned word is output as a numeric subfield on a third OCR output line. A multi-channel multi-genre character recognition discriminator analyzes these three subfield character streams by calculating a first conditional probability that, given the OCR has scanned and recognized an English alphabetic character Ei, numeric NK and Cyrillic CJ characters were respectively misrecognized by their recognition channels; a second conditional probability that, given the OCR has scanned and recognized a Cyrillic character CJ, numeric NK and English Ei characters were respectively misrecognized by their recognition channels; and a third conditional probability that, given the OCR has scanned and recognized a numeric character NK, English Ei and Cyrillic CJ characters were respectively misrecognized by their recognition channels.
These conditional probabilities are developed character by character for each character within a string thereof or a word. A first product of all the first type conditional probabilities is calculated for all of the characters in a word (which may, of course, contain only a single character); similarly second and third products are calculated for the second and third conditional probabilities, respectively. The magnitudes of the products of these conditional probabilities are then compared in an N-channel comparator, and the highest probability subfield is selected as the most probable interpretation of the word scanned by the OCR.
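The product-and-compare step described above might look like the following Python sketch. It assumes the per-character conditional probabilities have already been looked up for each channel; the channel names, the numeric values, and the function name `select_subfield` are hypothetical and not taken from the patent.

```python
from math import prod


def select_subfield(channel_probs: dict[str, list[float]]) -> tuple[str, dict[str, float]]:
    """Pick the channel whose per-character conditional probabilities
    have the largest product over the word.

    channel_probs maps a channel name to one probability per character.
    """
    # One product per channel, taken over every character in the word.
    products = {channel: prod(probs) for channel, probs in channel_probs.items()}
    # N-channel comparison: the largest product wins.
    best = max(products, key=products.get)
    return best, products


# Invented values for a three-character word read on three channels.
example = {
    "english": [0.6, 0.7, 0.5],
    "cyrillic": [0.2, 0.3, 0.4],
    "numeric": [0.1, 0.1, 0.2],
}
best_channel, channel_products = select_subfield(example)
print(best_channel, channel_products)
```

For long words a practical implementation would likely sum logarithms instead of multiplying raw probabilities to avoid underflow; that substitution is a suggestion of this sketch, not something the abstract states.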
-
Publication number: FR2336743A1
Publication date: 1977-07-22
Application number: FR7636143
Application date: 1976-11-24
Applicant: IBM
Inventor: HILLIARD JOHN J , MULLAN PHILIP J , ROSENBAUM WALTER S
Abstract: The print convention apparatus and method disclosed herein effects a decision making process with respect to a determination as to whether an alphabetic character field output from an optical character reader (OCR) is related to the OCR scan of an upper case or a lower case inscription on the document scanned. The alphabetic character field (e.g., a word) is comprised of one or a series of alphabetic characters which represent the OCR's interpretation of characters printed on the scanned document. Each word output by the OCR corresponds to a field (i.e., word) of characters imprinted on the scanned document. The electrical signals representative of the upper and lower case alphabetic characters and rejects including conflicts outputted from the OCR are applied to a character occurrence probability storage apparatus which contains precomputed empirical probabilities therein that: (1) a given character recognition is the result of the scan of an upper case character; and (2) a given character recognition is the result of the scan of a lower case character. In addition, the storage apparatus includes probability values for character conflicts and rejects. As the series of alphabetic character signals from the OCR output are applied character-by-character to the character occurrence probability storage apparatus (e.g., a read-only store), a running sum of the respective probabilities for the upper case and lower case print conventions is developed so that, following the input of the final character, reject or conflict within a word to the aforesaid apparatus, an appropriate upper or lower case determination can be made for all of the characters within the word. This determination corresponds with the print convention of the word inscribed on the scanned document. 
An upper or lower case flag is generated with the print convention determination and associated with the alphabetic character word output from the print convention apparatus for further text processing. In one embodiment of the invention, the probability for each OCR output alphabetic character being an upper or lower case character is stored in respective upper and lower case character occurrence probability storage devices after having been precomputed as the product of two probability factors: (1) a first probability factor with respect to the likelihood that the OCR recognition resulted from the scan of an upper or lower case character, and (2) a second probability factor with respect to the likelihood of a given character occurring in a document in a specified language (e.g., English). In another embodiment of the invention, the character occurrence probability storage devices are functionally replaced by a read-only store having an address position for each upper and lower case alphabetic character output by the OCR, including conflicts and rejects, and a precomputed numerical probability value associated with each address position to represent the quotient of: (1) the probability that a given character is related to an upper case print convention; and (2) the probability that the same character is related to a lower case print convention.
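The two table-precomputation embodiments can be sketched in Python as below. This is one possible reading of the abstract, with loud assumptions: all probability values are invented, the same occurrence factor is applied to both the upper and lower case stores, and the names `build_product_stores` and `build_quotient_store` are hypothetical.

```python
def build_product_stores(p_from_upper, p_from_lower, p_occurrence):
    """First embodiment: each stored value is the product of two factors,
    (1) the likelihood that the recognition came from an upper (or lower)
    case scan and (2) the likelihood of the character in the language."""
    upper_store = {c: p_from_upper[c] * p_occurrence[c] for c in p_from_upper}
    lower_store = {c: p_from_lower[c] * p_occurrence[c] for c in p_from_lower}
    return upper_store, lower_store


def build_quotient_store(p_upper, p_lower):
    """Second embodiment: one read-only table entry per OCR output symbol,
    holding P(upper case convention) / P(lower case convention)."""
    # Assumes every P(lower) is nonzero; a real table would need a policy for zeros.
    return {c: p_upper[c] / p_lower[c] for c in p_upper}


# Invented example values for two OCR output symbols.
p_from_upper = {"A": 0.9, "a": 0.1}
p_from_lower = {"A": 0.1, "a": 0.9}
p_occurrence = {"A": 0.02, "a": 0.08}

upper_store, lower_store = build_product_stores(p_from_upper, p_from_lower, p_occurrence)
quotient_store = build_quotient_store(p_from_upper, p_from_lower)
print(upper_store, lower_store, quotient_store)
```

A quotient above 1 favors the upper case convention for that symbol and a quotient below 1 favors lower case. How the quotients are combined across a word is not stated in the abstract, so the sketch stops at building the table.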
-