-
Publication No.: CA2021664A1
Publication date: 1991-04-25
Application No.: CA2021664
Filing date: 1990-07-20
Applicant: IBM
Inventor: ROSENBAUM WALTER S, HILLIARD JOHN J
IPC: B07C3/14, B07C3/00, B07C3/20, G06K9/03, G06K9/72, B07C3/08, G06F7/08, G06K7/10, G06K13/02
Abstract: The invention is characterized as a data processing architecture and method for multi-stage processing of mail, using knowledge based techniques. The system includes OCR-scanning a multipart address field of a mail piece at a sending location, the address field including at least two portions, a first stage routing portion (destination city, state, country, zip code) and a second stage routing portion (destination street address, building floor, corporate addressee internal routing). At the sending location, the image of the entire address field is captured by an OCR head and stored in memory. A serial number is printed on the mail piece. The first routing portion is then converted into sorting signals to sort the mail piece to a truck at the sending location. While the mail piece is in transit on the truck, the knowledge processor completes its analysis and transmits to the destination location, over an electronic communications link, notice that the mail piece is on its way together with the second stage routing information needed to automatically sort and deliver the mail piece to its corporate addressee.
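The two-stage flow described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the field names (`city`, `zip`, `mail_stop`), the truck lookup by three-digit zip prefix, and the functions `capture`, `truck_for` and `advance_notice` are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass

# Hypothetical field names for the first stage routing portion.
FIRST_STAGE_KEYS = ("city", "state", "country", "zip")


@dataclass
class MailPiece:
    serial: int          # serial number printed on the mail piece
    first_stage: dict    # used immediately at the sending location
    second_stage: dict   # resolved later, sent ahead to the destination


def capture(serial: int, address: dict) -> MailPiece:
    """Split a recognized address field into its two routing portions."""
    first = {k: v for k, v in address.items() if k in FIRST_STAGE_KEYS}
    second = {k: v for k, v in address.items() if k not in FIRST_STAGE_KEYS}
    return MailPiece(serial, first, second)


def truck_for(piece: MailPiece, truck_by_zip_prefix: dict) -> str:
    """First stage: pick a truck from the zip code alone (assumed rule)."""
    prefix = piece.first_stage.get("zip", "")[:3]
    return truck_by_zip_prefix.get(prefix, "manual-sort")


def advance_notice(piece: MailPiece) -> dict:
    """Second stage: message sent to the destination while in transit."""
    return {"serial": piece.serial, **piece.second_stage}
```

The serial number is what lets the destination match the advance notice to the physical mail piece when it arrives.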
-
Publication No.: CA1062810A
Publication date: 1979-09-18
Application No.: CA221755
Filing date: 1975-03-10
Applicant: IBM
Inventor: BOLLINGER ELLEN W, CHAIRES ANNE M, CICONTE JEAN M, ETT ALLEN H, HILLIARD JOHN J, ROSENBAUM WALTER S
Abstract: A data processing system is disclosed for selecting the correct form of a garbled input word misread by an optical character reader so as to change the number of characters in the word by character splitting or concatenation. Dictionary words are stored in the system, having characters which are flagged for segmentation or concatenation OCR misread propensity. The OCR word and a dictionary word are loaded into a pair of associated shift registers, aligning their letters on one end. The dictionary word characters are inspected for error propensity flags. When a splitting propensity, for example, is found for a character, special conditional probability values are accessed from a storage and a calculation is performed of the probability that the first character of the dictionary word was split by the OCR into the first and second characters of the OCR word. This regional context probability is compared with the probability of a simple substitution error for the characters. If the probability of segmentation is larger, the OCR characters in the first shift register are shifted one space with respect to the dictionary word characters in the second shift register so that subsequent character pairs to be compared are properly matched. The greater calculated probability is combined in a running product. The dictionary word with the largest running product is output by the system as the most likely correct form of the garbled OCR input word. In addition to optical character recognition, the system disclosed may be applied to correcting segmentation errors in phoneme characters output from a speech analyzer.
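The comparison of split probability against substitution probability, accumulated in a running product, can be sketched in Python as follows. This is an illustrative software analogue of the shift-register scheme, covering only the splitting case. The probability tables, the default values 0.9 and 0.001, and the function names are assumptions, not values from the patent. A dictionary character counts as flagged for splitting when the hypothetical `split_p` table has an entry for it.

```python
def p_substitution(dict_ch: str, ocr_ch: str, sub_p: dict) -> float:
    """P(OCR read ocr_ch | dict_ch was printed); defaults are assumed."""
    return sub_p.get((dict_ch, ocr_ch), 0.9 if dict_ch == ocr_ch else 0.001)


def score(ocr: str, word: str, sub_p: dict, split_p: dict) -> float:
    """Running product of the greater of substitution and split probability."""
    product, i = 1.0, 0              # i indexes the OCR word
    for d in word:
        if i >= len(ocr):
            return 0.0               # OCR word ran out first
        best, step = p_substitution(d, ocr[i], sub_p), 1
        # Probability that d was split into the next two OCR characters.
        p_split = split_p.get((d, ocr[i:i + 2]), 0.0)
        if p_split > best:
            best, step = p_split, 2  # shift one extra place to realign
        product *= best
        i += step
    return product if i == len(ocr) else 0.0


def correct(ocr: str, dictionary: list, sub_p: dict, split_p: dict) -> str:
    """Return the dictionary word with the largest running product."""
    return max(dictionary, key=lambda w: score(ocr, w, sub_p, split_p))


# Example: "m" misread as the two characters "rn".
split_table = {("m", "rn"): 0.2}
best = correct("rnail", ["mail", "rail", "nail"], {}, split_table)
```

Concatenation errors (two printed characters read as one) would need the mirror-image step, advancing two dictionary characters per OCR character; that case is omitted here.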
-
Publication No.: CA1050167A
Publication date: 1979-03-06
Application No.: CA209648
Filing date: 1974-09-19
Applicant: IBM
Inventor: CHAIRES ANNE M, CICONTE JEAN M, HILLIARD JOHN J, ROSENBAUM WALTER S
Abstract: An online numeric discriminator is disclosed which performs the decision making process between strings of characters coming from a dual output optical character recognition system for use in text processing or mail processing applications. The dual output OCR uses separate recognition processes for alphabetic and numeric characters and attempts to recognize each character independently as both an alphabetic and a numeric character. The alphabetic interpretation of the scanned word is outputted as an alphabetic subfield on a first output line and the numeric interpretation of the scanned word is outputted as a numeric subfield on a second output line from the OCR. The Bayesian online numeric discriminator then analyzes the two character streams by calculating a first conditional probability that the OCR perceived the alphabetic subfield given that a numeric subfield was actually scanned and a second conditional probability that the OCR perceived the numeric subfield given that an alphabetic subfield was actually scanned. These first and second conditional probabilities are then compared. If the conditional probability that the OCR read the alphabetic subfield given that the numeric subfield was actually scanned is larger than the conditional probability that the OCR read the numeric subfield given that the alphabetic subfield was actually scanned, then the numeric subfield is selected by the discriminator as the most probable interpretation of the word scanned by the OCR.
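The decision rule in this abstract can be sketched in Python under one added assumption: that each subfield probability factors into independent per-character confusion probabilities. The confusion tables, the `floor` value for unseen pairs, and the function name `discriminate` are hypothetical and chosen only to make the example concrete.

```python
from math import prod


def discriminate(alpha: str, numeric: str,
                 p_alpha_given_num: dict, p_num_given_alpha: dict,
                 floor: float = 1e-3) -> str:
    """Choose between the two OCR output subfields for one scanned word.

    p_alpha_given_num[(a, n)]: P(OCR read letter a | digit n was scanned)
    p_num_given_alpha[(n, a)]: P(OCR read digit n | letter a was scanned)
    """
    pairs = list(zip(alpha, numeric))
    # First conditional probability: alphabetic subfield read, numeric scanned.
    p1 = prod(p_alpha_given_num.get((a, n), floor) for a, n in pairs)
    # Second conditional probability: numeric subfield read, alphabetic scanned.
    p2 = prod(p_num_given_alpha.get((n, a), floor) for a, n in pairs)
    return numeric if p1 > p2 else alpha


# Illustrative, made-up confusion probabilities.
A_GIVEN_N = {("l", "1"): 0.3, ("O", "0"): 0.4, ("B", "8"): 0.02}
N_GIVEN_A = {("1", "l"): 0.05, ("0", "O"): 0.1, ("8", "B"): 0.3}

zip_like = discriminate("lOO", "100", A_GIVEN_N, N_GIVEN_A)
name_like = discriminate("BOB", "808", A_GIVEN_N, N_GIVEN_A)
```

With these made-up tables, the first word resolves to the numeric subfield and the second to the alphabetic one, because a digit is assumed to be misread as `l` or `O` far more often than a letter `B` is misread as `8`.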
-
Publication No.: FR2336743A1
Publication date: 1977-07-22
Application No.: FR7636143
Filing date: 1976-11-24
Applicant: IBM
Inventor: HILLIARD JOHN J, MULLAN PHILIP J, ROSENBAUM WALTER S
Abstract: The print convention apparatus and method disclosed herein effects a decision making process with respect to a determination as to whether an alphabetic character field output from an optical character reader (OCR) is related to the OCR scan of an upper case or a lower case inscription on the document scanned. The alphabetic character field (e.g., a word) is comprised of one or a series of alphabetic characters which represent the OCR's interpretation of characters printed on the scanned document. Each word output by the OCR corresponds to a field (i.e., word) of characters imprinted on the scanned document. The electrical signals representative of the upper and lower case alphabetic characters and rejects including conflicts outputted from the OCR are applied to a character occurrence probability storage apparatus which contains precomputed empirical probabilities therein that: (1) a given character recognition is the result of the scan of an upper case character; and (2) a given character recognition is the result of the scan of a lower case character. In addition, the storage apparatus includes probability values for character conflicts and rejects. As the series of alphabetic character signals from the OCR output are applied character-by-character to the character occurrence probability storage apparatus (e.g., a read-only store), a running sum of the respective probabilities for the upper case and lower case print conventions is developed so that, following the input of the final character, reject or conflict within a word to the aforesaid apparatus, an appropriate upper or lower case determination can be made for all of the characters within the word. This determination corresponds with the print convention of the word inscribed on the scanned document. 
An upper or lower case flag is generated in accordance with the print convention determination and associated with the alphabetic character word output from the print convention apparatus for further text processing. In one embodiment of the invention the probability for each OCR output alphabetic character being an upper or lower case character is stored in respective upper and lower case character occurrence probability storage devices after having been precomputed as the product of two probability factors; i.e., (1) a first probability factor with respect to the likelihood that the OCR recognition resulted from the scan of an upper or lower case character, and (2) a second probability factor with respect to the likelihood of a given character occurring in a specified language (e.g., English) document. In another embodiment of the invention, the character occurrence probability storage devices are functionally replaced by a read-only store having an address position for each upper and lower case alphabetic character outputted by the OCR including conflicts and rejects, and a precomputed numerical probability value associated with each address position to represent the quotient of: (1) the probability that a given character is related to an upper case print convention; and (2) the probability that the same character is related to a lower case print convention.
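The running-sum determination described in this abstract can be sketched in Python as below. The sketch stands in for the read-only store with a lookup table. The table contents, the `*` reject symbol, the neutral default of (0.5, 0.5), and the tie-break toward upper case are all assumptions made for illustration.

```python
def print_convention(word: str, table: dict,
                     default: tuple = (0.5, 0.5)) -> str:
    """Return "upper" or "lower" for one OCR output word.

    table[ch] = (p_upper, p_lower): assumed precomputed probabilities that
    the OCR output ch came from an upper or lower case inscription.
    """
    upper_sum = lower_sum = 0.0
    for ch in word:                      # character-by-character lookup
        p_upper, p_lower = table.get(ch, default)
        upper_sum += p_upper             # running sum, upper case convention
        lower_sum += p_lower             # running sum, lower case convention
    # Decision is made only after the final character of the word.
    return "upper" if upper_sum >= lower_sum else "lower"


# Illustrative, made-up probabilities; "*" stands for an OCR reject.
TABLE = {
    "M": (0.9, 0.1),
    "A": (0.95, 0.05),
    "o": (0.4, 0.6),
    "l": (0.3, 0.7),
    "*": (0.5, 0.5),
}

flag_1 = print_convention("MA*l", TABLE)
flag_2 = print_convention("o*l", TABLE)
```

The quotient embodiment could be approximated by storing `p_upper / p_lower` per character instead of the pair; how the stored quotients are then combined is not spelled out in the abstract, so it is not sketched here.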