-
1.
Publication number: US3925761A
Publication date: 1975-12-09
Application number: US49425174
Filing date: 1974-08-02
Applicant: IBM
Inventor: CHAIRES ANNE MARIE , CICONTE JEAN MARIE , ETT ALLEN HAROLD , HILLIARD JOHN JOSEPH , ROSENBAUM WALTER STEVEN
CPC classification number: G06K9/72 , G06F17/28 , G06K2209/01 , G10L15/00
Abstract: A binary reference matrix apparatus is disclosed for verifying input alpha words from a character recognition machine as valid linguistic expressions. The organization of the binary reference matrix is based upon the character transfer function of the character recognition machine. The alphabetic character stream for each word scanned by the character recognition machine is mapped into a vector representation through the assignment of a unique numeric value for each letter in the alphabet. The vector magnitude and angle so calculated constitute the address data for accessing the binary reference matrix. The point accessed in the matrix will have a binary value of 1 if the scanned word is valid and will have a binary value of 0 if the scanned word is invalid. The organization of the binary reference matrix minimizes the size of the array needed for accurate verification by choosing numerical values for the alphabetic characters in inverse proportion to each character's read reliability in the character recognition machine, as determined by empirical measurement of the machine's character transfer function.
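The addressing idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the letter values (a=1 to z=26), the position-weighted reference vector used to obtain the angle, the angle quantization, and the names `word_address`, `build_matrix` and `is_valid` are all hypothetical. The patent derives its letter values from the OCR's measured read reliability, and those values are not given here.

```python
import math
import string

# Hypothetical letter values: a=1 ... z=26. The patent instead chooses values
# from the OCR's empirically measured read reliability (not given in the abstract).
LETTER_VALUE = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}


def word_address(word, angle_steps=1000):
    """Map a non-empty a-z word to a (magnitude, angle) pair used as a matrix address."""
    vec = [LETTER_VALUE[ch] for ch in word.lower()]
    # Assumed reference vector (1, 2, ..., n) so that letter order affects the angle.
    ref = list(range(1, len(vec) + 1))
    mag_sq = sum(v * v for v in vec)  # squared magnitude, kept as an exact integer
    dot = sum(v * r for v, r in zip(vec, ref))
    ref_norm = math.sqrt(sum(r * r for r in ref))
    cos_theta = dot / (math.sqrt(mag_sq) * ref_norm)
    angle = math.acos(max(-1.0, min(1.0, cos_theta)))
    return mag_sq, round(angle * angle_steps)  # quantized angle in milliradians


def build_matrix(valid_words):
    """Sparse stand-in for the binary matrix: the set of addresses that hold a 1."""
    return {word_address(w) for w in valid_words}


def is_valid(word, matrix):
    """A word is accepted when the cell at its address holds a 1."""
    return word_address(word) in matrix
```

For example, after `matrix = build_matrix(["cat", "dog", "bird"])`, the call `is_valid("cat", matrix)` succeeds while `is_valid("cot", matrix)` fails. Because the mapping is many-to-one, distinct words can share an address; the abstract's choice of letter values is aimed at keeping the array small while still verifying accurately.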
-
Publication number: DE3175443D1
Publication date: 1986-11-13
Application number: DE3175443
Filing date: 1981-02-05
Applicant: IBM
Inventor: CONVIS DANNY BRADLEY , GLICKMAN DAVID , ROSENBAUM WALTER STEVEN
Abstract: Method and system for reducing the computation required to match a misspelled word against various candidates from a dictionary to find one or more words that represent the best match to the misspelled word. The method consists of comparing (steps 20-24) bit masks whose bits are set to reflect the presence or absence of specific characters or character combinations, without regard to position, in the misspelled word and in each of the dictionary candidate words. Then (steps 25-27) a candidate word is dismissed from additional processing if there is not a predetermined percentage of bit mask match between the masks of the misspelled word and the candidate word.
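A minimal Python sketch of such a bit-mask prefilter follows. It rests on assumptions the abstract does not settle: one bit per letter a-z (character combinations are not modeled), shared bits divided by the union of set bits as the "percentage of match", and a threshold of 0.6. The function names are hypothetical.

```python
import string


def char_mask(word):
    """Set one bit per distinct letter a-z present in the word, ignoring position."""
    mask = 0
    for ch in word.lower():
        if ch in string.ascii_lowercase:
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def mask_match_ratio(mask_a, mask_b):
    """Assumed similarity measure: shared set bits divided by all set bits."""
    union = bin(mask_a | mask_b).count("1")
    if union == 0:
        return 1.0
    return bin(mask_a & mask_b).count("1") / union


def prefilter(misspelled, candidates, threshold=0.6):
    """Keep only candidates whose mask matches well enough for costlier comparison."""
    target = char_mask(misspelled)
    return [c for c in candidates
            if mask_match_ratio(target, char_mask(c)) >= threshold]
```

With these assumptions, `prefilter("recieve", ["receive", "deceive", "zebra"])` keeps `receive` and `deceive` and dismisses `zebra`, so only two candidates go on to the more expensive matching step.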
-
Publication number: DE2460757A1
Publication date: 1975-10-23
Application number: DE2460757
Filing date: 1974-12-21
Applicant: IBM
-
Publication number: CH634781A5
Publication date: 1983-02-28
Application number: CH672078
Filing date: 1978-06-20
Applicant: IBM
Inventor: KETTLER HOWARD GEORGE , KOLPEK ROBERT ADOLPH , ROSENBAUM WALTER STEVEN
-
Publication number: AU2952877A
Publication date: 1979-04-26
Application number: AU2952877
Filing date: 1977-10-10
Applicant: IBM
Inventor: ROSENBAUM WALTER STEVEN , TANNER HOWARD CARL
Abstract: Improved apparatus for automatically forming hyphenated words. Improved hyphenation apparatus is combined with word verification apparatus to automatically provide hyphenation points for input words from a keyboard or other input device. The spelling of each word input to the system is verified by the digital reference matrix section of the apparatus by calculating a vector magnitude and angle for the word, which is compared to the contents of a storage dictionary of words. Each cell of storage in the storage dictionary, in addition to containing a unique angle representation of the input word, contains a byte of data representing the valid hyphenation points for the input word. When an input word is verified to be correctly spelled, the hyphenation byte is read out of the dictionary and used by the hyphenation section to reassemble the word in hyphenated form. The hyphenated word is then displayed to the operator for appropriate action.
-
Publication number: DE2654815A1
Publication date: 1978-01-26
Application number: DE2654815
Filing date: 1976-12-03
Applicant: IBM
Inventor: HILLIARD JOHN JOSEPH , MULLAN PHILIP JOSEPH , ROSENBAUM WALTER STEVEN
Abstract: The print convention apparatus and method disclosed herein effects a decision making process with respect to a determination as to whether an alphabetic character field output from an optical character reader (OCR) is related to the OCR scan of an upper case or a lower case inscription on the document scanned. The alphabetic character field (e.g., a word) is comprised of one or a series of alphabetic characters which represent the OCR's interpretation of characters printed on the scanned document. Each word output by the OCR corresponds to a field (i.e., word) of characters imprinted on the scanned document. The electrical signals representative of the upper and lower case alphabetic characters and rejects including conflicts outputted from the OCR are applied to a character occurrence probability storage apparatus which contains precomputed empirical probabilities therein that: (1) a given character recognition is the result of the scan of an upper case character; and (2) a given character recognition is the result of the scan of a lower case character. In addition, the storage apparatus includes probability values for character conflicts and rejects. As the series of alphabetic character signals from the OCR output are applied character-by-character to the character occurrence probability storage apparatus (e.g., a read-only store), a running sum of the respective probabilities for the upper case and lower case print conventions is developed so that, following the input of the final character, reject or conflict within a word to the aforesaid apparatus, an appropriate upper or lower case determination can be made for all of the characters within the word. This determination corresponds with the print convention of the word inscribed on the scanned document. 
A corresponding upper or lower case flag is generated with the print convention determination and associated with the alphabetic character word output from the print convention apparatus for further text processing. In one embodiment of the invention the probability for each OCR output alphabetic character being an upper or lower case character is stored in respective upper and lower case character occurrence probability storage devices after having been precomputed as the product of two probability factors; i.e., (1) a first probability factor with respect to the likelihood that the OCR recognition resulted from the scan of an upper or lower case character, and (2) a second probability factor with respect to the likelihood of a given character occurring in a specified language (e.g., English) document. In another embodiment of the invention, the character occurrence probability storage devices are functionally replaced by a read-only store having an address position for each upper and lower case alphabetic character outputted by the OCR including conflicts and rejects, and a precomputed numerical probability value associated with each address position to represent the quotient of: (1) the probability that a given character is related to an upper case print convention; and (2) the probability that the same character is related to a lower case print convention.
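The running-sum decision described in this abstract can be sketched in Python as follows. Every probability value in the table is invented for illustration, `*` is an assumed symbol for an OCR reject, and `CASE_PROB`, `DEFAULT_PROB` and `print_convention` are hypothetical names. The patent stores precomputed empirical values that are not reproduced here.

```python
# Hypothetical read-only table: recognized symbol ->
# (probability of an upper case scan, probability of a lower case scan).
CASE_PROB = {
    "A": (0.95, 0.05), "a": (0.05, 0.95),
    "C": (0.55, 0.45), "c": (0.40, 0.60),
    "O": (0.60, 0.40), "o": (0.35, 0.65),
    "T": (0.90, 0.10), "t": (0.10, 0.90),
    "*": (0.50, 0.50),  # assumed symbol for a reject
}
DEFAULT_PROB = (0.5, 0.5)  # assumed fallback for symbols missing from the table


def print_convention(word):
    """Accumulate per-character probabilities, then pick one case for the whole word."""
    upper_sum = 0.0
    lower_sum = 0.0
    for ch in word:
        p_upper, p_lower = CASE_PROB.get(ch, DEFAULT_PROB)
        upper_sum += p_upper
        lower_sum += p_lower
    flag = "UPPER" if upper_sum > lower_sum else "LOWER"
    normalized = word.upper() if flag == "UPPER" else word.lower()
    return flag, normalized
```

With this table, an OCR output of `CoAT` is flagged `UPPER` and normalized to `COAT`, while `cOat` is flagged `LOWER` and normalized to `coat`. Shapes that look alike in both cases, such as `C` and `O`, carry little weight, so the decision is driven by the more distinctive characters.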
-
Publication number: DE2513566A1
Publication date: 1976-02-19
Application number: DE2513566
Filing date: 1975-03-27
Applicant: IBM
Inventor: CHAIRES ANNE MARIE , CICONTE JEAN MARIE , ETT ALLEN HAROLD , HILLIARD JOHN JOSEPH , ROSENBAUM WALTER STEVEN
Abstract: A binary reference matrix apparatus is disclosed for verifying input alpha words from a character recognition machine as valid linguistic expressions. The organization of the binary reference matrix is based upon the character transfer function of the character recognition machine. The alphabetic character stream for each word scanned by the character recognition machine is mapped into a vector representation through the assignment of a unique numeric value for each letter in the alphabet. The vector magnitude and angle so calculated constitute the address data for accessing the binary reference matrix. The point accessed in the matrix will have a binary value of 1 if the scanned word is valid and will have a binary value of 0 if the scanned word is invalid. The organization of the binary reference matrix minimizes the size of the array needed for accurate verification by choosing numerical values for the alphabetic characters in inverse proportion to each character's read reliability in the character recognition machine, as determined by empirical measurement of the machine's character transfer function.
-
Publication number: DE3377858D1
Publication date: 1988-09-29
Application number: DE3377858
Filing date: 1983-06-01
Applicant: IBM
Inventor: BRICKMAN NORMAN FREDERICK , ROSENBAUM WALTER STEVEN
Abstract: An improved system for compacting text data to be transmitted over communications lines, thereby reducing the data volume and transmission time. Transmitting and receiving text processing systems are provided with identical library memories containing words commonly used in correspondence. Each word in a document to be communicated is compared to the transmitting system's word library and, if found in the library, only the library address is transmitted. If the word is not found in the library, then it is added to the transmitting system's library, sent, and added to the receiving system's library. The receiving system reconstructs the document by using the received addresses to access the appropriate words from its library and place them in the document. The system combines this word match encoding with character match encoding and facsimile run length encoding for communicating words not found in the system library.
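The word-match part of this scheme can be sketched in Python. This is an illustration under assumptions, not the patented system: words are split on whitespace, tokens are `("addr", n)` or `("word", text)` tuples, and words missing from the library are sent as plain text rather than through the character-match and run-length encodings the abstract mentions. All names are hypothetical.

```python
class WordLibrary:
    """Word library held identically by the transmitting and receiving systems."""

    def __init__(self, words):
        self.words = list(words)
        self.index = {w: i for i, w in enumerate(self.words)}

    def add(self, word):
        self.index[word] = len(self.words)
        self.words.append(word)


def encode(text, library):
    """Replace known words by their library address; send new words once in full."""
    tokens = []
    for word in text.split():
        if word in library.index:
            tokens.append(("addr", library.index[word]))
        else:
            tokens.append(("word", word))
            library.add(word)  # the transmitter learns the word as it sends it
    return tokens


def decode(tokens, library):
    """Rebuild the text, adding each newly received word to the receiver's library."""
    words = []
    for kind, value in tokens:
        if kind == "addr":
            words.append(library.words[value])
        else:
            library.add(value)
            words.append(value)
    return " ".join(words)
```

Starting both sides from the same library, for example `["the", "of", "and", "is"]`, the message `the matrix is the matrix` sends `matrix` in full once and as an address the second time, and both libraries stay identical after the exchange.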
-
Publication number: CH604277A5
Publication date: 1978-08-31
Application number: CH1164176
Filing date: 1976-09-14
Applicant: IBM
Inventor: MULLAN PHILIP JOSEPH , ROSENBAUM WALTER STEVEN
Abstract: A multi-channel multi-genre character recognition discriminator is disclosed which performs the decision making process between strings of characters coming from a multi-channel (i.e., three or more channels) alpha-numeric output optical character reader (OCR) system for use in such applications as, for example, text processing and mail processing. The multi-channel output OCR uses separate recognition processes for each genre or character set indicative of a distinct group with respect to style (i.e., font) or form, and attempts to recognize each character independently as belonging to each respective genre. For example, in a three channel output OCR for reading mixed numeric, English and Russian Cyrillic character sets, the English alphabetic interpretation of a scanned word is outputted as an English alphabetic subfield on a first OCR output line, the Cyrillic interpretation of the scanned word is outputted as a Cyrillic subfield on a second OCR output line, and the numeric interpretation of the scanned word is outputted as a numeric subfield on a third OCR output line. A multi-channel multi-genre character recognition discriminator analyzes these three subfield character streams by calculating a first conditional probability that, given the OCR has scanned and recognized an English alphabetic character Ei, the numeric NK and Cyrillic CJ characters were misrecognized by their respective recognition channels; a second conditional probability that, given the OCR has scanned and recognized a Cyrillic character CJ, the numeric NK and English Ei characters were misrecognized by their respective recognition channels; and a third conditional probability that, given the OCR has scanned and recognized a numeric character NK, the English Ei and Cyrillic CJ characters were misrecognized by their respective recognition channels. 
These conditional probabilities are developed character by character for each character within a string thereof or a word. A first product of all the first type conditional probabilities is calculated for all of the characters in a word (which may, of course, contain only a single character); similarly second and third products are calculated for the second and third conditional probabilities, respectively. The magnitudes of the products of these conditional probabilities are then compared in an N-channel comparator, and the highest probability subfield is selected as the most probable interpretation of the word scanned by the OCR.
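The product-and-compare step at the end of this abstract can be sketched in Python. The sketch assumes the per-character conditional probabilities have already been looked up for each channel; the channel names, the sample values and the function `select_subfield` are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def select_subfield(subfields, char_probs):
    """Pick the channel whose per-character conditional probabilities multiply highest.

    subfields:  channel name -> that channel's reading of the scanned word
    char_probs: channel name -> list of per-character conditional probabilities
                (assumed to be precomputed from the OCR's confusion statistics)
    """
    scores = {name: math.prod(char_probs[name]) for name in subfields}
    best = max(scores, key=scores.get)  # stands in for the N-channel comparator
    return best, subfields[best], scores
```

For instance, if the English channel reads a field as `lOO`, the numeric channel reads it as `100`, and the numeric channel's per-character probabilities are consistently higher, the numeric product is the largest and `100` is selected as the most probable interpretation.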
-
Publication number: DE2755875A1
Publication date: 1978-06-29
Application number: DE2755875
Filing date: 1977-12-15
Applicant: IBM
Inventor: ROSENBAUM WALTER STEVEN , TANNER HOWARD CARL
Abstract: Improved apparatus for automatically forming hyphenated words. Improved hyphenation apparatus is combined with word verification apparatus to automatically provide hyphenation points for input words from a keyboard or other input device. The spelling of each word input to the system is verified by the digital reference matrix section of the apparatus by calculating a vector magnitude and angle for the word, which is compared to the contents of a storage dictionary of words. Each cell of storage in the storage dictionary, in addition to containing a unique angle representation of the input word, contains a byte of data representing the valid hyphenation points for the input word. When an input word is verified to be correctly spelled, the hyphenation byte is read out of the dictionary and used by the hyphenation section to reassemble the word in hyphenated form. The hyphenated word is then displayed to the operator for appropriate action.