Binary reference matrix for a character recognition machine
    Granted invention patent; status: lapsed

    Publication number: US3925761A

    Publication date: 1975-12-09

    Application number: US49425174

    Application date: 1974-08-02

    Applicant: IBM

    CPC classification number: G06K9/72 G06F17/28 G06K2209/01 G10L15/00

    Abstract: A binary reference matrix apparatus is disclosed for verifying input alpha words from a character recognition machine as valid linguistic expressions. The organization of the binary reference matrix is based upon the character transfer function of the character recognition machine. The alphabetic character stream for each word scanned by the character recognition machine is mapped into a vector representation by assigning a unique numeric value to each letter of the alphabet. The vector magnitude and angle so calculated constitute the address data for accessing the binary reference matrix. The point accessed in the matrix has a binary value of 1 if the scanned word is valid and a binary value of 0 if it is invalid. The binary reference matrix is organized to minimize the size of the array needed for accurate verification: the numeric values for the alphabetic characters are chosen in inverse proportion to each character's read reliability in the character recognition machine, as determined by empirical measurement of the machine's character transfer function.
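A minimal Python sketch of the lookup idea, under loudly stated assumptions: the abstract does not give the exact vector formula, so the hypothetical `word_vector` below treats a word's letter values as vector components, uses the Euclidean norm as the magnitude, and uses the angle to an all-ones reference vector as the angle. Both are quantized into the row and column of a bit array. `values_from_reliability` is likewise a hypothetical illustration of choosing letter values in inverse proportion to read reliability; real reliability figures would come from measuring a particular machine.

```python
import math
import string


def values_from_reliability(reliability, scale=100.0):
    """Hypothetical: give each letter a value inversely proportional to its read reliability."""
    return {ch: scale / r for ch, r in reliability.items()}


def word_vector(word, values):
    """Hypothetical mapping of a non-empty word to (magnitude, angle in degrees)."""
    comps = [values[ch] for ch in word.lower()]
    magnitude = math.sqrt(sum(v * v for v in comps))
    # Angle between the word vector and an all-ones reference vector of the same length.
    cos_theta = sum(comps) / (magnitude * math.sqrt(len(comps)))
    angle = math.degrees(math.acos(min(1.0, cos_theta)))
    return magnitude, angle


class BinaryReferenceMatrix:
    """Bit array addressed by quantized (magnitude, angle); a set bit marks a valid word."""

    def __init__(self, values, mag_bins=256, angle_bins=256, max_mag=200.0):
        self.values = values
        self.mag_bins = mag_bins
        self.angle_bins = angle_bins
        self.max_mag = max_mag
        self.bits = bytearray((mag_bins * angle_bins + 7) // 8)

    def _address(self, word):
        magnitude, angle = word_vector(word, self.values)
        row = min(int(magnitude / self.max_mag * self.mag_bins), self.mag_bins - 1)
        # Letter values are positive, so the angle lies in [0, 90) degrees.
        col = min(int(angle / 90.0 * self.angle_bins), self.angle_bins - 1)
        return row * self.angle_bins + col

    def add(self, word):
        addr = self._address(word)
        self.bits[addr // 8] |= 1 << (addr % 8)

    def is_valid(self, word):
        addr = self._address(word)
        return (self.bits[addr // 8] >> (addr % 8)) & 1 == 1


# Placeholder values (a=1 ... z=26); a reliability-based table could be used instead.
ALPHABET_VALUES = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}
```

Usage: build the matrix from a word list with `add`, then call `is_valid` on each OCR output word. Because this particular mapping ignores letter order, anagrams such as "cat" and "act" land in the same cell, and unrelated words can also collide after quantization; the patent's actual mapping and array sizing may differ.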


    ALPHA CONTENT PRESCAN METHOD FOR AUTOMATIC SPELLING ERROR CORRECTION

    Publication number: DE3175443D1

    Publication date: 1986-11-13

    Application number: DE3175443

    Application date: 1981-02-05

    Applicant: IBM

    Abstract: Method and system for reducing the computation required to match a misspelled word against various candidates from a dictionary in order to find one or more words that best match the misspelled word. In steps 20-24, the method compares bit masks computed for the misspelled word and for each dictionary candidate word; the bits of each mask are set to reflect the presence or absence of specific characters or character combinations, without regard to their position in the word. Then, in steps 25-27, a candidate word is dismissed from further processing if its mask does not match the mask of the misspelled word to at least a predetermined percentage.
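A minimal sketch of the prescan filter, assuming one mask bit per letter (the character combinations mentioned in the abstract are omitted here) and assuming the "percentage of bit mask match" is measured as shared set bits divided by bits set in either mask. The 0.6 threshold is an arbitrary placeholder, not a value from the patent.

```python
import string

LETTERS = string.ascii_lowercase


def char_mask(word):
    """Set bit k if letter k of the alphabet occurs anywhere in the word (position ignored)."""
    mask = 0
    for ch in word.lower():
        if ch in LETTERS:
            mask |= 1 << LETTERS.index(ch)
    return mask


def mask_match(a, b):
    """Hypothetical match measure: shared set bits / bits set in either mask."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0
    return bin(a & b).count("1") / union


def prescan(misspelled, candidates, threshold=0.6):
    """Keep only candidates whose mask match with the misspelled word reaches the threshold."""
    target = char_mask(misspelled)
    return [w for w in candidates if mask_match(target, char_mask(w)) >= threshold]
```

For example, "recieve" and "receive" share the letter set {r, e, c, i, v}, so "receive" survives the prescan, while "believe" (3 shared letters out of 7) is dismissed before any costlier position-sensitive comparison.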

    AUTOMATICALLY FORMING HYPHENATED WORDS

    Publication number: AU2952877A

    Publication date: 1979-04-26

    Application number: AU2952877

    Application date: 1977-10-10

    Applicant: IBM

    Abstract: Improved hyphenation apparatus is combined with word verification apparatus to automatically provide hyphenation points for words input from a keyboard or other input device. The spelling of each input word is verified by the digital reference matrix section of the apparatus, which calculates a vector magnitude and angle for the word and compares them against the contents of a storage dictionary of words. Each storage cell in the dictionary contains, in addition to a unique angle representation of the input word, a byte of data representing the valid hyphenation points for that word. When an input word is verified as correctly spelled, the hyphenation byte is read out of the dictionary and used by the hyphenation section to reassemble the word in hyphenated form. The hyphenated word is then displayed to the operator for appropriate action.
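The dictionary-cell idea can be sketched as follows, assuming a plain Python dict stands in for the angle-addressed storage (verification is reduced to a membership test) and assuming a hypothetical bit layout in which bit i of the stored byte marks a permissible hyphen after character i (0-based). With one byte per word, this layout can only describe break points within the first eight characters.

```python
def encode_hyphen_byte(syllables):
    """Hypothetical encoding: bit i set means a hyphen may follow character i (0-based)."""
    byte, pos = 0, 0
    for syllable in syllables[:-1]:
        pos += len(syllable)
        if pos > 8:
            raise ValueError("one byte covers break points after the first 8 characters only")
        byte |= 1 << (pos - 1)
    return byte


def hyphenate(word, dictionary):
    """Return the hyphenated form, or None if the word is not in the dictionary."""
    byte = dictionary.get(word)
    if byte is None:
        return None  # spelling not verified, so no hyphenation points are available
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if byte & (1 << i) and i < len(word) - 1:
            out.append("-")
    return "".join(out)
```

For example, storing `encode_hyphen_byte(["com", "put", "er"])` (binary 00100100) for "computer" makes `hyphenate("computer", d)` return "com-put-er", while a misspelling such as "compuetr" returns None.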

    Invention patent; status: unknown

    Publication number: DE2654815A1

    Publication date: 1978-01-26

    Application number: DE2654815

    Application date: 1976-12-03

    Applicant: IBM

    Abstract: The print convention apparatus and method disclosed herein decide whether an alphabetic character field output from an optical character reader (OCR) results from the OCR scan of an upper case or a lower case inscription on the scanned document. The alphabetic character field (e.g., a word) comprises one or more alphabetic characters representing the OCR's interpretation of characters printed on the scanned document. Each word output by the OCR corresponds to a field (i.e., word) of characters imprinted on the scanned document. The electrical signals representing the upper and lower case alphabetic characters, and the rejects (including conflicts), output from the OCR are applied to a character occurrence probability storage apparatus containing precomputed empirical probabilities that: (1) a given character recognition is the result of scanning an upper case character; and (2) a given character recognition is the result of scanning a lower case character. The storage apparatus also includes probability values for character conflicts and rejects. As the series of alphabetic character signals from the OCR output is applied character by character to the character occurrence probability storage apparatus (e.g., a read-only store), a running sum of the respective probabilities for the upper case and lower case print conventions is developed, so that after the final character, reject, or conflict of a word has been input, an upper or lower case determination can be made for all of the characters within the word. This determination corresponds to the print convention of the word inscribed on the scanned document.
A corresponding upper or lower case flag is generated with the print convention determination and associated with the alphabetic character word output from the print convention apparatus for further text processing. In one embodiment of the invention, the probability of each OCR output alphabetic character being an upper or lower case character is stored in respective upper and lower case character occurrence probability storage devices, having been precomputed as the product of two probability factors: (1) the likelihood that the OCR recognition resulted from scanning an upper or lower case character, and (2) the likelihood of the given character occurring in a document in a specified language (e.g., English). In another embodiment, the character occurrence probability storage devices are functionally replaced by a read-only store having an address position for each upper and lower case alphabetic character output by the OCR, including conflicts and rejects, with a precomputed numerical probability value at each address position representing the quotient of: (1) the probability that the character is related to an upper case print convention; and (2) the probability that the same character is related to a lower case print convention.
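A minimal sketch of the running-sum decision described for the first embodiment. The probability table, the `?` reject symbol, and the tie-breaking rule (ties resolve to lower case) are hypothetical placeholders, not values from the patent; a real table would be measured empirically for a particular OCR.

```python
REJECT = "?"  # hypothetical symbol the OCR emits for a rejected character

# Hypothetical table: OCR output symbol -> (P(upper case scan), P(lower case scan)).
CASE_PROBS = {
    "T": (0.95, 0.05),
    "H": (0.90, 0.10),
    "E": (0.90, 0.10),
    "t": (0.10, 0.90),
    "h": (0.05, 0.95),
    "e": (0.05, 0.95),
    "o": (0.45, 0.55),  # o/O shapes are easily confused
    REJECT: (0.50, 0.50),
}


def print_convention(word, table, default=(0.5, 0.5)):
    """Accumulate running sums of upper/lower probabilities and decide the word's case."""
    upper_sum = lower_sum = 0.0
    for symbol in word:
        p_upper, p_lower = table.get(symbol, default)
        upper_sum += p_upper
        lower_sum += p_lower
    return "upper" if upper_sum > lower_sum else "lower"
```

A reject contributes equally to both sums here, so a word such as "T?E" is still classified by its readable characters; the returned label plays the role of the upper or lower case flag attached to the word.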

    Invention patent; status: unknown

    Publication number: DE2513566A1

    Publication date: 1976-02-19

    Application number: DE2513566

    Application date: 1975-03-27

    Applicant: IBM

    Abstract: A binary reference matrix apparatus is disclosed for verifying input alpha words from a character recognition machine as valid linguistic expressions. The organization of the binary reference matrix is based upon the character transfer function of the character recognition machine. The alphabetic character stream for each word scanned by the character recognition machine is mapped into a vector representation by assigning a unique numeric value to each letter of the alphabet. The vector magnitude and angle so calculated constitute the address data for accessing the binary reference matrix. The point accessed in the matrix has a binary value of 1 if the scanned word is valid and a binary value of 0 if it is invalid. The binary reference matrix is organized to minimize the size of the array needed for accurate verification: the numeric values for the alphabetic characters are chosen in inverse proportion to each character's read reliability in the character recognition machine, as determined by empirical measurement of the machine's character transfer function.

    METHOD OF ENCODING AND TRANSMITTING DOCUMENTS FOR TEXT PROCESSING SYSTEMS

    Publication number: DE3377858D1

    Publication date: 1988-09-29

    Application number: DE3377858

    Application date: 1983-06-01

    Applicant: IBM

    Abstract: An improved system for compacting text data to be transmitted over communications lines, thereby reducing data volume and transmission time. The transmitting and receiving text processing systems are provided with identical library memories containing words commonly used in correspondence. Each word in a document to be communicated is compared against the transmitting system's word library and, if found there, only its library address is transmitted. If the word is not found, it is added to the transmitting system's library, sent, and added to the receiving system's library. The receiving system reconstructs the document by using the received addresses to retrieve the appropriate words from its library and place them in the document. For communicating words not found in the system library, the system combines this word match encoding with character match encoding and facsimile run length encoding.
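The word-match encoding can be sketched as below, assuming both libraries are Python lists that start out identical and that each transmitted item is tagged either as a library address or as a literal new word. The character match and facsimile run length fallbacks mentioned in the abstract are omitted; the literal word stands in for them.

```python
def encode(words, library):
    """Replace known words by their library address; send and learn unknown words."""
    index = {w: i for i, w in enumerate(library)}
    tokens = []
    for w in words:
        if w in index:
            tokens.append(("ADDR", index[w]))
        else:
            index[w] = len(library)
            library.append(w)  # transmitter's library grows
            tokens.append(("WORD", w))
    return tokens


def decode(tokens, library):
    """Rebuild the word sequence; literal words are appended to the receiver's library."""
    words = []
    for kind, value in tokens:
        if kind == "ADDR":
            words.append(library[value])
        else:
            library.append(value)  # receiver's library grows in the same order
            words.append(value)
    return words
```

Because both sides append new words in the same order, the libraries stay identical, so a word sent literally once (for example "cat" in "the cat and the cat") travels as a short address every time it recurs.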

    Invention patent; status: unknown

    Publication number: CH604277A5

    Publication date: 1978-08-31

    Application number: CH1164176

    Application date: 1976-09-14

    Applicant: IBM

    Abstract: A multi-channel, multi-genre character recognition discriminator is disclosed that decides between character strings coming from a multi-channel (i.e., three or more channels) alphanumeric-output optical character reader (OCR) system, for use in applications such as text processing and mail processing. The multi-channel output OCR uses a separate recognition process for each genre, i.e., each character set representing a distinct group with respect to style (font) or form, and attempts to recognize each character independently as belonging to each respective genre. For example, in a three-channel output OCR reading mixed numeric, English and Russian Cyrillic character sets, the English alphabetic interpretation of a scanned word is output as an English alphabetic subfield on a first OCR output line, the Cyrillic interpretation as a Cyrillic subfield on a second output line, and the numeric interpretation as a numeric subfield on a third output line. The discriminator analyzes these three subfield character streams by calculating three conditional probabilities: (1) given that the OCR has scanned and recognized an English alphabetic character E_i, the probability that the numeric character N_k and the Cyrillic character C_j were misrecognized by their respective recognition channels; (2) given that the OCR has scanned and recognized a Cyrillic character C_j, the probability that the numeric character N_k and the English character E_i were misrecognized by their respective channels; and (3) given that the OCR has scanned and recognized a numeric character N_k, the probability that the English character E_i and the Cyrillic character C_j were misrecognized by their respective channels.
These conditional probabilities are developed character by character for each character in a string or word. A first product of all the first-type conditional probabilities is calculated over all of the characters in the word (which may contain only a single character); second and third products are calculated similarly for the second and third conditional probabilities. The magnitudes of these products are then compared in an N-channel comparator, and the subfield with the highest probability is selected as the most probable interpretation of the word scanned by the OCR.
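A minimal sketch of the product-and-compare step for the three-channel example, assuming the three subfields are already aligned character by character and that each channel's conditional probabilities are supplied as a lookup table keyed by the aligned triple (E_i, N_k, C_j). The table contents and the fallback `default` are hypothetical, and Python's `max` plays the role of the N-channel comparator. `math.prod` requires Python 3.8 or later.

```python
import math


def discriminate(subfields, cond_probs, default=1e-3):
    """Pick the channel whose product of per-character conditional probabilities is largest.

    subfields:  {"english": str, "numeric": str, "cyrillic": str}, aligned character by character
    cond_probs: {channel: {(e_i, n_k, c_j): probability}} for the same three channels
    default:    hypothetical probability for triples missing from a channel's table
    """
    triples = list(zip(subfields["english"], subfields["numeric"], subfields["cyrillic"]))
    scores = {
        channel: math.prod(table.get(t, default) for t in triples)
        for channel, table in cond_probs.items()
    }
    best = max(scores, key=scores.get)  # the "N-channel comparator"
    return best, subfields[best]
```

For a scan read as "ZOS" (English), "305" (numeric) and "ЗОБ" (Cyrillic), tables giving the numeric channel probabilities 0.6, 0.3 and 0.7 versus smaller values for the other channels would select ("numeric", "305"). For long words, summing logarithms instead of multiplying avoids floating-point underflow.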

    Invention patent; status: unknown

    Publication number: DE2755875A1

    Publication date: 1978-06-29

    Application number: DE2755875

    Application date: 1977-12-15

    Applicant: IBM

    Abstract: IMPROVED APPARATUS FOR AUTOMATICALLY FORMING HYPHENATED WORDS. Improved hyphenation apparatus is combined with word verification apparatus to automatically provide hyphenation points for words input from a keyboard or other input device. The spelling of each input word is verified by the digital reference matrix section of the apparatus, which calculates a vector magnitude and angle for the word and compares them against the contents of a storage dictionary of words. Each storage cell in the dictionary contains, in addition to a unique angle representation of the input word, a byte of data representing the valid hyphenation points for that word. When an input word is verified as correctly spelled, the hyphenation byte is read out of the dictionary and used by the hyphenation section to reassemble the word in hyphenated form. The hyphenated word is then displayed to the operator for appropriate action.
